Open Microscopy Environment

by **a.herbert** » Fri Jul 29, 2011 10:36 am

Hi,

At our site we advise all of our users to archive their files when uploading to OMERO if they need to analyse the files in proprietary software (or redistribute them, etc). However the Drive Space section of the web admin client does not account for the archived files.

I have had a look at the code and see the functionality resides in:

lib/python/omeroweb/webadmin/controller/drivespace.py

The script queries the Pixels table and, for each image, multiplies the dimensions together (X, Y, Z, C and T) and then multiplies the result by the number of bytes per pixel for the image's pixel type. This seems perfectly valid given that the files are stored on the drive in the Pixels folder in raw uncompressed format. However, it does not account for the archived files.
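To make that calculation concrete, here is a minimal sketch of the size estimate on its own, without an OMERO connection. The pixel type names and byte sizes are the ones handled by the posted script; the function name `raw_pixels_size` and the example image dimensions are made up for illustration.

```python
# Bytes per pixel for the pixel types handled by the posted script.
BYTES_PER_PIXEL = {
    "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "float": 4,
    "double": 8,
}


def raw_pixels_size(size_x, size_y, size_z, size_c, size_t, pixel_type):
    """Estimate the raw, uncompressed size in bytes of one Pixels object."""
    if pixel_type not in BYTES_PER_PIXEL:
        raise ValueError("Unknown pixel type: %s" % pixel_type)
    n_pixels = size_x * size_y * size_z * size_c * size_t
    return n_pixels * BYTES_PER_PIXEL[pixel_type]


# Hypothetical image: 512 x 512, 10 Z-sections, 3 channels, 1 timepoint, uint16
print(raw_pixels_size(512, 512, 10, 3, 1, "uint16"))  # 15728640 bytes (15 MiB)
```

This only estimates the raw pixel data; archived original files are not included, which is the gap described above.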

I had a look into HQL and the object that gets returned and managed to extend the script so that each Pixels object returned has a reference to the parent Image object and its archived flag. If the flag is present and set to true then the size of the archived file can be obtained by a query on the PixelsOriginalFileMap object.

This approach works for me but I would like to determine if this is the best approach. On our server we have 3TB of data in 14580 Pixels objects and 11786 archived files (figures taken from the Postgres DB tables). When I run a helper script from the command line to collate drive space the original logic takes about 15 seconds. The modified logic takes 43 seconds. Thus the impact on the run-time is not insignificant.

Any advice on a better way to do this would be appreciated. For example, would it be a better approach to simply total up the pixels first and then total up the archived files in a second HQL query? My script currently queries the system for each archived file, which would be 11786 calls to the query service in the example above. I imagine speed would improve with a directed query using a larger page size directly on the PixelsOriginalFileMap table. The problem with this is that I could not find a link from PixelsOriginalFileMap to the Image object, which has the archived flag. I only found PixelsOriginalFileMap linked to OriginalFile and to Pixels. I suppose I could then go from Pixels to Image, but my limited HQL knowledge has prevented me from trying this so far. I thought a second opinion would help.

Thanks,

Alex

I tried to attach my modified drivespace.py script for reference but the forum refused all file extensions (and no extension) so here it is:

Code:

    # lib/python/omeroweb/webadmin/controller/drivespace.py
    import logging
    import traceback

    from webadmin.controller import BaseController
    import omero

    # logger and traceback are used by _bytes_per_pixel below
    logger = logging.getLogger(__name__)

    class BaseDriveSpace(BaseController):

        freeSpace = None
        usedSpace = None
        topTen = None

        def __init__(self, conn):
            BaseController.__init__(self, conn)
            self.freeSpace = self.conn.getFreeSpace()
            self.experimenters = list(self.conn.listExperimenters())

    def _bytes_per_pixel(pixel_type):
        if pixel_type == "int8" or pixel_type == "uint8":
            return 1
        elif pixel_type == "int16" or pixel_type == "uint16":
            return 2
        elif pixel_type == "int32" or pixel_type == "uint32" or pixel_type == "float":
            return 4
        elif pixel_type == "double":
            return 8
        else:
            logger.error("Error: Unknown pixel type: %s" % (pixel_type))
            logger.error(traceback.format_exc())
            raise AttributeError("Unknown pixel type: %s" % (pixel_type))

    def _usage_map_helper(conn,ctx,pixels_list,exps):
        tt = dict()
        for p in pixels_list:
            oid = p.details.owner.id.val
            p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
            p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
            if tt.has_key(oid):
                tt[oid]['data']+=p_size
            else:
                tt[oid] = dict()
                tt[oid]['label']=exps[oid]
                tt[oid]['data']=p_size
            if p.image.archived != None and p.image.archived.val:
                # Query the size of the original file
                param = omero.sys.ParametersI()
                param.add("id", p.id)
                results = conn.getQueryService().findAllByQuery(
                    "select m from PixelsOriginalFileMap as m join fetch m.parent " \
                    "where m.child.id = :id", param, ctx)
                for result in results:
                    oid = result.details.owner.id.val
                    p_size = result.parent.size.val
                    if tt.has_key(oid):
                        tt[oid]['data']+=p_size
                    else:
                        tt[oid] = dict()
                        tt[oid]['label']=exps[oid]
                        tt[oid]['data']=p_size
        return tt #sorted(tt.iteritems(), key=lambda (k,v):(v,k), reverse=True)

    def usersData(conn, offset=0):
        exps = dict()
        for e in list(conn.listExperimenters()):
            exps[e.id] = e.getFullName()
        PAGE_SIZE = 1000
        offset = long(offset)
        ctx = dict()
        if conn.isAdmin():
            ctx['omero.group'] = '-1'
        else:
            ctx['omero.group'] = str(conn.getEventContext().groupId)
        p = omero.sys.ParametersI()
        p.page(offset, PAGE_SIZE)
        pixels_list = conn.getQueryService().findAllByQuery(
            "select p from Pixels as p join fetch p.pixelsType join fetch p.image " \
            "order by p.id", p, ctx)
        usage_map = _usage_map_helper(conn,ctx,pixels_list,exps)
        count = len(pixels_list)
        offset += count
        if count != PAGE_SIZE:
            loading = False
        else:
            loading = True
        return {'loading':loading, 'offset':offset, 'usage':usage_map}

by **wmoore** » Fri Jul 29, 2011 11:47 am

Hi Alex,

As far as I can tell, the 'archived' flag on image is purely a convenience to know if the image has archived files without having to load pixels and PixelsOriginalFileMap. You can ignore this flag (don't need to load images at all).

Instead of making a call per-pixel object, you should simply be able to make a single query on PixelsOriginalFileMap using a list of the pixel ids:

Code:

    import omero
    from omero.rtypes import rlist, rlong

    # conn is an existing connection object, as in drivespace.py
    ids = [2351,2301,2256,2255,2254]
    idList = [rlong(i) for i in ids]
    pids = rlist(idList)
    param = omero.sys.ParametersI()
    param.add("pids", pids)
    results = conn.getQueryService().findAllByQuery(
        "select m from PixelsOriginalFileMap as m join fetch m.parent " \
        "where m.child.id in (:pids)", param)
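To give a rough idea of why batching the IDs should help, the sketch below counts calls to the query service for the two approaches, using the figures quoted earlier in the thread (14580 Pixels objects, 11786 archived files, page size 1000). It assumes one archived-file query per page of Pixels for the batched approach and one per archived file otherwise; the function name `query_counts` is made up for illustration, and the real numbers will depend on how the archived files are spread across images.

```python
def query_counts(n_pixels, n_archived, page_size):
    """Return (pages, per_file_calls, batched_calls) for the two approaches."""
    pages = -(-n_pixels // page_size)   # ceiling division: paged Pixels queries
    per_file_calls = pages + n_archived  # one extra query per archived file
    batched_calls = pages + pages        # one extra query per page of Pixels
    return pages, per_file_calls, batched_calls


pages, per_file, batched = query_counts(14580, 11786, 1000)
print(pages, per_file, batched)  # 15 11801 30
```

Fewer round trips does not translate directly into run time, since the server still has to load the same OriginalFile rows, so measuring is the only way to know the actual gain.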

Hope that helps.

It sounds like we should implement this in the webclient. I've created a ticket and we'll see if we can get this in the next release (no promises): http://trac.openmicroscopy.org.uk/ome/ticket/6359

Any feedback you can give us (about whether the code above is any faster) would be really handy!

Cheers,

Will.

by **a.herbert** » Fri Jul 29, 2011 2:26 pm

Hi Will,

I have adapted my script using the code that you gave me. Now my test script runs with the following times:

Ignoring archive files:

real 0m17.931s
user 0m15.317s
sys 0m0.149s

With archive files:

real 0m31.132s
user 0m26.277s
sys 0m0.118s

A definite improvement. This means that the extra query to the archived information is making the script approx. 2x slower instead of 3x slower for my original.
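As a quick check on those ratios, the figures can be worked through directly. This is only arithmetic on the wall-clock (`real`) times quoted in this thread; the function name `slowdown` is made up for illustration.

```python
def slowdown(with_archive_s, without_archive_s):
    """Ratio of run time with archived files to run time without them."""
    return with_archive_s / float(without_archive_s)


# Original per-file query version: about 43 s against about 15 s
original = slowdown(43, 15)
# Batched version: "real" times of 31.132 s against 17.931 s
batched = slowdown(31.132, 17.931)

print("original: %.2fx, batched: %.2fx" % (original, batched))
# original: 2.87x, batched: 1.74x
```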

Thanks for the help,

Alex

The rest of the code is the same, only the helper function has changed:

Code:

    def _usage_map_helper(conn,ctx,pixels_list,exps):
        tt = dict()
        idList = []
        for p in pixels_list:
            oid = p.details.owner.id.val
            p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
            p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
            if tt.has_key(oid):
                tt[oid]['data']+=p_size
            else:
                tt[oid] = dict()
                tt[oid]['label']=exps[oid]
                tt[oid]['data']=p_size
            if p.image.archived != None and p.image.archived.val:
                # Query the size of the original file
                param = omero.sys.ParametersI()
                param.add("id", p.id)
                results = conn.getQueryService().findAllByQuery(
                    "select m from PixelsOriginalFileMap as m join fetch m.parent " \
                    "where m.child.id = :id", param, ctx)
                for result in results:
                    oid = result.details.owner.id.val
                    p_size = result.parent.size.val
                    if tt.has_key(oid):
                        tt[oid]['data']+=p_size
                    else:
                        tt[oid] = dict()
                        tt[oid]['label']=exps[oid]
                        tt[oid]['data']=p_size
        pids = rlist(idList)
        param = omero.sys.ParametersI()
        param.add("pids", pids)
        results = conn.getQueryService().findAllByQuery(
            "select m from PixelsOriginalFileMap as m join fetch m.parent " \
            "where m.child.id in (:pids)", param, ctx)
        for result in results:
            oid = result.details.owner.id.val
            p_size = result.parent.size.val
            tt[oid]['data']+=p_size
        return tt

by **wmoore** » Fri Jul 29, 2011 3:11 pm

Did you forget to remove some code?

This is what I have - Does it work for you?

Code:

    def _usage_map_helper(conn,ctx,pixels_list,exps):
        tt = dict()
        idList = []
        for p in pixels_list:
            oid = p.details.owner.id.val
            p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
            p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
            if tt.has_key(oid):
                tt[oid]['data']+=p_size
            else:
                tt[oid] = dict()
                tt[oid]['label']=exps[oid]
                tt[oid]['data']=p_size
            idList.append(p.id)
        pids = rlist(idList)
        param = omero.sys.ParametersI()
        param.add("pids", pids)
        results = conn.getQueryService().findAllByQuery(
            "select m from PixelsOriginalFileMap as m join fetch m.parent " \
            "where m.child.id in (:pids)", param, ctx)
        for result in results:
            oid = result.details.owner.id.val
            p_size = result.parent.size.val
            if tt.has_key(oid):
                tt[oid]['data']+=p_size
            else:
                tt[oid] = dict()
                tt[oid]['label']=exps[oid]
                tt[oid]['data']=p_size
        return tt

by **a.herbert** » Mon Aug 01, 2011 9:16 am

Hi Will,

Sorry for the mistake. I pasted in partly modified code (since I have been cutting it from my larger admin script). My final code is the same as yours except that I have removed the check for the dictionary key in the second loop, on the assumption that the owner should already exist:

Code:

    def _usage_map_helper(conn,ctx,pixels_list,exps):
        tt = dict()
        idList = []
        for p in pixels_list:
            oid = p.details.owner.id.val
            p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
            p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
            if tt.has_key(oid):
                tt[oid]['data']+=p_size
            else:
                tt[oid] = dict()
                tt[oid]['label']=exps[oid]
                tt[oid]['data']=p_size
            idList.append(p.id)
        pids = rlist(idList)
        param = omero.sys.ParametersI()
        param.add("pids", pids)
        results = conn.getQueryService().findAllByQuery(
            "select m from PixelsOriginalFileMap as m join fetch m.parent " \
            "where m.child.id in (:pids)", param, ctx)
        for result in results:
            oid = result.details.owner.id.val
            p_size = result.parent.size.val
            # Owner should already exist
            tt[oid]['data']+=p_size
        return tt
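Dropping the key check relies on the owner of every PixelsOriginalFileMap link already having an entry from the first loop. Whether that always holds is not established in this thread, so here is a small sketch of a middle ground using `dict.setdefault`, which keeps the second loop short without risking a `KeyError`. It runs without OMERO: the `(owner_id, size)` pairs stand in for the query results, and the function name `add_archived_sizes` and the example labels are hypothetical.

```python
def add_archived_sizes(usage, archived, labels):
    """Add archived file sizes to a per-owner usage map.

    usage    -- {owner_id: {'label': str, 'data': int}} built from the Pixels
    archived -- iterable of (owner_id, size_in_bytes) pairs (stand-ins for
                the PixelsOriginalFileMap query results)
    labels   -- {owner_id: display name}
    """
    for oid, size in archived:
        # Create the entry only if this owner has not been seen yet
        entry = usage.setdefault(
            oid, {'label': labels.get(oid, str(oid)), 'data': 0})
        entry['data'] += size
    return usage


usage = {1: {'label': 'user-1', 'data': 100}}
add_archived_sizes(usage, [(1, 50), (2, 30)], {1: 'user-1', 2: 'user-2'})
print(usage[1]['data'], usage[2]['data'])  # 150 30
```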

Regards,

Alex

by **atarkowska** » Tue Aug 02, 2011 11:22 am

Hi Alex,

I've already fixed that issue. If you would like to test it, please roll back your changes and apply the patch at http://git.openmicroscopy.org/?p=ome.gi ... e3db1c18a5

Thanks
Ola

by **a.herbert** » Wed Aug 03, 2011 2:06 pm

Hi Ola,

Thanks for the patch. It took me a few attempts to apply the patch until I realised that it would not apply because I am using a detached repository (I need to build the source for v.4.3.0).

I have updated to the master branch, applied the patch, obtained the new file and tested it on my local installation. It all works as expected.

Regards,

Alex

WebAdmin Drive Space script