WebAdmin Drive Space script

Post by a.herbert » Fri Jul 29, 2011 10:36 am

Hi,

At our site we advise all of our users to archive their files when uploading to OMERO if they need to analyse the files in proprietary software (or redistribute them, etc). However the Drive Space section of the web admin client does not account for the archived files.

I have had a look at the code and see the functionality resides in:

lib/python/omeroweb/webadmin/controller/drivespace.py

The script queries the Pixels table and totals up the dimensions of each image and multiplies this by the size of the image byte type. This seems perfectly valid given that the files are stored on the drive in the Pixels folder in raw uncompressed format. However it does not account for the archived files.
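
As a rough illustration of the calculation described above, the sketch below computes the raw size of one Pixels object from its dimensions and pixel type. The names BYTES_PER_PIXEL and raw_pixels_size are illustrative and do not come from drivespace.py; the byte widths are the ones used in the script posted in this thread, and the sample dimensions are made up.

```python
# Bytes used by one pixel value of each type (widths as in the posted script).
BYTES_PER_PIXEL = {
    "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "float": 4,
    "double": 8,
}


def raw_pixels_size(size_x, size_y, size_z, size_c, size_t, pixel_type):
    """Return the uncompressed size in bytes of one Pixels object."""
    try:
        width = BYTES_PER_PIXEL[pixel_type]
    except KeyError:
        raise ValueError("Unknown pixel type: %s" % pixel_type)
    return size_x * size_y * size_z * size_c * size_t * width


# A made-up 512 x 512 image with 10 Z-sections, 3 channels and 1 timepoint.
print(raw_pixels_size(512, 512, 10, 3, 1, "uint16"))  # 15728640
```

Archived original files are not covered by this figure, which is the gap the rest of the thread addresses.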

I had a look into HQL and the object that gets returned, and managed to extend the script so that each Pixels object returned has a reference to the parent Image object and its archived flag. If the flag is present and set to true, then the size of the archived file can be obtained by querying the PixelsOriginalFileMap object.

This approach works for me but I would like to determine if this is the best approach. On our server we have 3TB of data in 14580 Pixels objects and 11786 archived files (figures taken from the Postgres DB tables). When I run a helper script from the command line to collate drive space the original logic takes about 15 seconds. The modified logic takes 43 seconds. Thus the impact on the run-time is not insignificant.

Any advice on a better way to do this would be appreciated. For example, would it be better to simply total up the pixels first and then total up the archived files in a second HQL query? My script currently queries the system for each archived file, which would be 11786 calls to the query service in the example above. I imagine speed would improve with a directed query using a larger page size directly on the PixelsOriginalFileMap table. The problem with this is that I could not find a link from PixelsOriginalFileMap to the Image object, which has the archived flag. I only found PixelsOriginalFileMap linked to OriginalFile and to Pixels. I suppose I could then go from Pixels to Image, but my limited HQL knowledge has prevented me from trying this so far. I thought a second opinion would help.

Thanks,

Alex

I tried to attach my modified drivespace.py script for reference but the forum refused all file extensions (and no extension) so here it is:

Code:
# lib/python/omeroweb/webadmin/controller/drivespace.py

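import logging
import traceback

import omero

from webadmin.controller import BaseController

# logger and traceback are used by _bytes_per_pixel below
logger = logging.getLogger(__name__)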

class BaseDriveSpace(BaseController):

    freeSpace = None
    usedSpace = None
    topTen = None

    def __init__(self, conn):
        BaseController.__init__(self, conn)
        self.freeSpace = self.conn.getFreeSpace()
        self.experimenters = list(self.conn.listExperimenters())

def _bytes_per_pixel(pixel_type):
    if pixel_type == "int8" or pixel_type == "uint8":
        return 1
    elif pixel_type == "int16" or pixel_type == "uint16":
        return 2
    elif pixel_type == "int32" or pixel_type == "uint32" or pixel_type == "float":
        return 4
    elif pixel_type == "double":
        return 8
    else:
        logger.error("Error: Unknown pixel type: %s" % (pixel_type))
        logger.error(traceback.format_exc())
        raise AttributeError("Unknown pixel type: %s" % (pixel_type))
   
def _usage_map_helper(conn,ctx,pixels_list,exps):
    tt = dict()
    for p in pixels_list:
        oid = p.details.owner.id.val
        p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
        p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
        if tt.has_key(oid):
            tt[oid]['data']+=p_size
        else:
            tt[oid] = dict()
            tt[oid]['label']=exps[oid]
            tt[oid]['data']=p_size
        if p.image.archived is not None and p.image.archived.val:
            # Query the size of the original file
            param = omero.sys.ParametersI()
            param.add("id", p.id)
            results = conn.getQueryService().findAllByQuery(
                    "select m from PixelsOriginalFileMap as m join fetch m.parent " \
                    "where m.child.id = :id", param, ctx)
            for result in results:
                oid = result.details.owner.id.val
                p_size = result.parent.size.val
                if tt.has_key(oid):
                    tt[oid]['data']+=p_size
                else:
                    tt[oid] = dict()
                    tt[oid]['label']=exps[oid]
                    tt[oid]['data']=p_size
       
    return tt #sorted(tt.iteritems(), key=lambda (k,v):(v,k), reverse=True)

def usersData(conn, offset=0):
    exps = dict()
    for e in list(conn.listExperimenters()):
        exps[e.id] = e.getFullName()
       
    PAGE_SIZE = 1000
    offset = long(offset)
   
    ctx = dict()
    if conn.isAdmin():
        ctx['omero.group'] = '-1'
    else:
        ctx['omero.group'] = str(conn.getEventContext().groupId)
       
    p = omero.sys.ParametersI()
    p.page(offset, PAGE_SIZE)
    pixels_list = conn.getQueryService().findAllByQuery(
            "select p from Pixels as p join fetch p.pixelsType join fetch p.image " \
            "order by p.id", p, ctx)
   
    usage_map = _usage_map_helper(conn,ctx,pixels_list,exps)
   
    count = len(pixels_list)
    offset += count
   
    # A full page means more results may remain
    loading = (count == PAGE_SIZE)
   
    return {'loading':loading, 'offset':offset, 'usage':usage_map}
Last edited by a.herbert on Mon Aug 01, 2011 8:54 am, edited 1 time in total.
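
The usersData function above returns a single page of results together with an offset and a loading flag, so a caller has to invoke it repeatedly and merge the per-owner totals. A minimal sketch of such a driver is shown below. The names collect_usage and fake_users_data are hypothetical, and the fake page function with made-up sizes stands in for usersData(conn, offset) so that the example runs without a server.

```python
def collect_usage(fetch_page):
    """Call fetch_page(offset) until it reports loading=False; merge totals.

    fetch_page is assumed to return a dict shaped like the usersData result:
    {'loading': bool, 'offset': int, 'usage': {owner_id: {'label', 'data'}}}.
    """
    totals = {}
    offset = 0
    while True:
        page = fetch_page(offset)
        for oid, entry in page['usage'].items():
            if oid in totals:
                totals[oid]['data'] += entry['data']
            else:
                totals[oid] = {'label': entry['label'], 'data': entry['data']}
        offset = page['offset']
        if not page['loading']:
            return totals


# Hypothetical stand-in for usersData: two pages of made-up sizes.
_PAGES = [
    {'loading': True, 'offset': 2,
     'usage': {1: {'label': 'A', 'data': 100}}},
    {'loading': False, 'offset': 3,
     'usage': {1: {'label': 'A', 'data': 50}, 2: {'label': 'B', 'data': 7}}},
]


def fake_users_data(offset):
    return _PAGES[0] if offset == 0 else _PAGES[1]


totals = collect_usage(fake_users_data)
print(totals[1]['data'], totals[2]['data'])  # 150 7
```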
a.herbert
 
Posts: 53
Joined: Tue Jan 11, 2011 1:35 pm

Re: WebAdmin Drive Space script

Post by wmoore » Fri Jul 29, 2011 11:47 am

Hi Alex,

As far as I can tell, the 'archived' flag on image is purely a convenience to know if the image has archived files without having to load pixels and PixelsOriginalFileMap. You can ignore this flag (don't need to load images at all).

Instead of making a call per-pixel object, you should simply be able to make a single query on PixelsOriginalFileMap using a list of the pixel ids:

Code:
import omero
from omero.rtypes import rlong, rlist

ids = [2351,2301,2256,2255,2254]
idList = [rlong(i) for i in ids]
pids = rlist(idList)
param = omero.sys.ParametersI()
param.add("pids", pids)
results = conn.getQueryService().findAllByQuery(
        "select m from PixelsOriginalFileMap as m join fetch m.parent " \
        "where m.child.id in (:pids)", param)


Hope that helps.
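
When the list of Pixels IDs is very long, one option is to split it into fixed-size batches and run the query above once per batch, rather than once per object or with one enormous "in" list. The sketch below shows only the batching step in plain Python. The helper name chunked and the batch size are illustrative assumptions, and each batch would still need wrapping with rlong and rlist before being passed as the pids parameter.

```python
def chunked(ids, size):
    """Yield consecutive slices of ids, each at most size items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


ids = [2351, 2301, 2256, 2255, 2254]
for batch in chunked(ids, 2):
    # Each batch would become one :pids parameter in the query above.
    print(batch)
```

Because the drive space script already pages through Pixels 1000 at a time, its ID lists are bounded by the page size, so extra batching may not be needed there.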

It sounds like we should implement this in the webclient. I've created a ticket and we'll see if we can get this in the next release (no promises): http://trac.openmicroscopy.org.uk/ome/ticket/6359

Any feedback you can give us (about whether the code above is any faster) would be really handy!

Cheers,

Will.
wmoore
Team Member
 
Posts: 674
Joined: Mon May 18, 2009 12:46 pm

Re: WebAdmin Drive Space script

Post by a.herbert » Fri Jul 29, 2011 2:26 pm

Hi Will,

I have adapted my script using the code that you gave me. Now my test script runs with the following times:

Ignoring archive files:

real 0m17.931s
user 0m15.317s
sys 0m0.149s

With archive files:

real 0m31.132s
user 0m26.277s
sys 0m0.118s

A definite improvement. This means that the extra query to the archived information is making the script approx. 2x slower instead of 3x slower for my original.
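
For reference, the slowdown factors can be worked out from the wall-clock times reported in this thread: about 15 s and 43 s for the original per-object queries, and 17.931 s and 31.132 s for the single batched query. The variable names below are only for illustration.

```python
# Wall-clock times in seconds, taken from the posts in this thread.
original_without, original_with = 15.0, 43.0
batched_without, batched_with = 17.931, 31.132

print("per-object queries:   %.1fx" % (original_with / original_without))
print("single batched query: %.1fx" % (batched_with / batched_without))
```

This gives roughly 2.9x for the per-object version and 1.7x for the batched version.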

Thanks for the help,

Alex

The rest of the code is the same, only the helper function has changed:

Code:
def _usage_map_helper(conn,ctx,pixels_list,exps):
    tt = dict()
    idList = []
    for p in pixels_list:
        oid = p.details.owner.id.val
        p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
        p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
        if tt.has_key(oid):
            tt[oid]['data']+=p_size
        else:
            tt[oid] = dict()
            tt[oid]['label']=exps[oid]
            tt[oid]['data']=p_size
        if p.image.archived != None and p.image.archived.val:
            # Query the size of the original file
            param = omero.sys.ParametersI()
            param.add("id", p.id)
            results = conn.getQueryService().findAllByQuery(
                    "select m from PixelsOriginalFileMap as m join fetch m.parent " \
                    "where m.child.id = :id", param, ctx)
            for result in results:
                oid = result.details.owner.id.val
                p_size = result.parent.size.val
                if tt.has_key(oid):
                    tt[oid]['data']+=p_size
                else:
                    tt[oid] = dict()
                    tt[oid]['label']=exps[oid]
                    tt[oid]['data']=p_size

    pids = rlist(idList)
    param = omero.sys.ParametersI()
    param.add("pids", pids)
    results = conn.getQueryService().findAllByQuery(
        "select m from PixelsOriginalFileMap as m join fetch m.parent " \
        "where m.child.id in (:pids)", param, ctx)
   
    for result in results:
        oid = result.details.owner.id.val
        p_size = result.parent.size.val
        tt[oid]['data']+=p_size
       
    return tt

Re: WebAdmin Drive Space script

Post by wmoore » Fri Jul 29, 2011 3:11 pm

Did you forget to remove some code?

This is what I have - Does it work for you?

Code:
def _usage_map_helper(conn,ctx,pixels_list,exps):
    tt = dict()
    idList = []
    for p in pixels_list:
        oid = p.details.owner.id.val
        p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
        p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
        if tt.has_key(oid):
            tt[oid]['data']+=p_size
        else:
            tt[oid] = dict()
            tt[oid]['label']=exps[oid]
            tt[oid]['data']=p_size
        idList.append(p.id)

    pids = rlist(idList)
    param = omero.sys.ParametersI()
    param.add("pids", pids)
    results = conn.getQueryService().findAllByQuery(
        "select m from PixelsOriginalFileMap as m join fetch m.parent " \
        "where m.child.id in (:pids)", param, ctx)
   
    for result in results:
        oid = result.details.owner.id.val
        p_size = result.parent.size.val
        if tt.has_key(oid):
            tt[oid]['data']+=p_size
        else:
            tt[oid] = dict()
            tt[oid]['label']=exps[oid]
            tt[oid]['data']=p_size
       
    return tt

Re: WebAdmin Drive Space script

Post by a.herbert » Mon Aug 01, 2011 9:16 am

Hi Will,

Sorry for the mistake. I pasted in partly modified code (since I have been cutting it from my larger admin script). My final code is the same as yours, except that I have removed the check for the dictionary key in the second loop on the assumption that the owner should already exist:

Code:
def _usage_map_helper(conn,ctx,pixels_list,exps):
    tt = dict()
    idList = []
    for p in pixels_list:
        oid = p.details.owner.id.val
        p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
        p_size = p_size*_bytes_per_pixel(p.pixelsType.value.val)
        if tt.has_key(oid):
            tt[oid]['data']+=p_size
        else:
            tt[oid] = dict()
            tt[oid]['label']=exps[oid]
            tt[oid]['data']=p_size
        idList.append(p.id)

    pids = rlist(idList)
    param = omero.sys.ParametersI()
    param.add("pids", pids)
    results = conn.getQueryService().findAllByQuery(
        "select m from PixelsOriginalFileMap as m join fetch m.parent " \
        "where m.child.id in (:pids)", param, ctx)
   
    for result in results:
        oid = result.details.owner.id.val
        p_size = result.parent.size.val
        # Owner should already exist
        tt[oid]['data']+=p_size
       
    return tt
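
If the assumption that every link owner already appears in the map does not hold (for instance, if a link were owned by a different user than the Pixels object), the last loop above would raise a KeyError. One way to keep the check without repeating it is a small helper. This is only a sketch in plain Python: add_usage is a hypothetical name, and the IDs, labels and sizes are made up.

```python
def add_usage(usage, owner_id, size, labels):
    """Add size bytes to owner_id's total, creating the entry if needed."""
    entry = usage.setdefault(
        owner_id, {'label': labels.get(owner_id, 'Unknown'), 'data': 0})
    entry['data'] += size
    return usage


labels = {1: 'Alex', 2: 'Will'}    # made-up experimenter names
usage = {}
add_usage(usage, 1, 1000, labels)  # raw pixel data
add_usage(usage, 1, 250, labels)   # archived original file
add_usage(usage, 2, 40, labels)    # owner seen only via an archived file
print(usage)
```

Both loops in the helper function could call the same routine, which keeps the two code paths consistent.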


Regards,

Alex

Re: WebAdmin Drive Space script

Post by atarkowska » Tue Aug 02, 2011 11:22 am

Hi Alex,

I've already fixed that issue. If you would like to test it, please roll back your changes and apply the patch http://git.openmicroscopy.org/?p=ome.gi ... e3db1c18a5

Thanks
Ola
atarkowska
 
Posts: 327
Joined: Mon May 18, 2009 12:44 pm

Re: WebAdmin Drive Space script

Post by a.herbert » Wed Aug 03, 2011 2:06 pm

Hi Ola,

Thanks for the patch. It took me a few attempts to apply the patch until I realised that it would not apply because I am using a detached repository (I need to build the source for v.4.3.0).

I have updated to the master branch, applied the patch, obtained the new file and tested it on my local installation. It all works as expected.

Regards,

Alex