At our site we advise all users who need to analyse their files in proprietary software (or redistribute them, etc.) to archive the files when uploading to OMERO. However, the Drive Space section of the web admin client does not account for archived files.
I have had a look at the code and see the functionality resides in:
lib/python/omeroweb/webadmin/controller/drivespace.py
The script queries the Pixels table, multiplies together the dimensions of each image, and multiplies the result by the byte size of the image's pixel type. This seems perfectly valid given that the files are stored on the drive in the Pixels folder in raw uncompressed format. However, it does not account for the archived files.
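To make the arithmetic concrete, here is a small stand-alone sketch of the per-image calculation described above. The name `raw_pixels_size` and the example dimensions are made up for illustration; the byte sizes mirror the pixel types handled in drivespace.py.

```python
# Bytes per sample for each pixel type handled by drivespace.py
BYTES_PER_PIXEL = {
    "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "float": 4,
    "double": 8,
}


def raw_pixels_size(size_x, size_y, size_z, size_c, size_t, pixel_type):
    """Estimate the raw, uncompressed size in bytes of one Pixels object."""
    samples = size_x * size_y * size_z * size_c * size_t
    return samples * BYTES_PER_PIXEL[pixel_type]


# Hypothetical image: 512 x 512, 10 Z-sections, 3 channels, 1 timepoint, uint16
# 512 * 512 * 10 * 3 * 1 * 2 = 15728640 bytes (15 MiB)
example_size = raw_pixels_size(512, 512, 10, 3, 1, "uint16")
```

The size of an archived original file does not enter this calculation at all, which is why the reported totals are too low for users who archive.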
I looked into HQL and the objects it returns, and extended the script so that each Pixels object returned carries a reference to its parent Image object and that image's archived flag. If the flag is present and set to true, the size of the archived file is obtained by querying the PixelsOriginalFileMap object.
This approach works for me, but I would like to know whether it is the best one. On our server we have 3 TB of data in 14580 Pixels objects and 11786 archived files (figures taken from the Postgres DB tables). When I run a helper script from the command line to collate drive space, the original logic takes about 15 seconds. The modified logic takes 43 seconds, nearly three times as long, so the impact on the run-time is not insignificant.
Any advice on a better way to do this would be appreciated. For example, would it be better to simply total up the pixels first and then total up the archived files in a second HQL query? My script currently queries the system once for each archived file, which would be 11786 calls to the query service in the example above. I imagine speed would improve with a directed query, using a larger page size, directly on the PixelsOriginalFileMap table. The problem is that I could not find a link from PixelsOriginalFileMap to the Image object, which holds the archived flag. I only found PixelsOriginalFileMap linked to OriginalFile and to Pixels. I suppose I could then go from Pixels to Image, but my limited HQL knowledge has prevented me from trying this so far. I thought a second opinion would help.
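For discussion, here is a rough sketch of what that second, paged query might look like. It is untested against a live server. The function name `total_archived_sizes` and its `new_params` argument are hypothetical; the HQL only assumes the relationships already used in my script (`m.parent` is the OriginalFile, `m.child` is the Pixels, and Pixels has an `image` with the `archived` flag). Whether Hibernate accepts the join exactly as written is an assumption.

```python
def total_archived_sizes(query_service, new_params, ctx, page_size=1000):
    """Sum archived original-file sizes per owner id using paged queries.

    query_service: object with findAllByQuery(hql, params, ctx)
    new_params:    zero-argument factory returning an object with
                   page(offset, limit), e.g. omero.sys.ParametersI
    """
    # Walk the link table directly, going Pixels -> Image for the archived flag
    hql = ("select m from PixelsOriginalFileMap as m "
           "join fetch m.parent "
           "join m.child as p join p.image as i "
           "where i.archived = true order by m.id")
    totals = {}
    offset = 0
    while True:
        params = new_params()
        params.page(offset, page_size)
        links = query_service.findAllByQuery(hql, params, ctx)
        for link in links:
            oid = link.details.owner.id.val
            totals[oid] = totals.get(oid, 0) + link.parent.size.val
        offset += len(links)
        # A short (or empty) page means there is nothing left to fetch
        if len(links) < page_size:
            break
    return totals
```

With a real connection I would expect to call it as `total_archived_sizes(conn.getQueryService(), omero.sys.ParametersI, ctx)`. With 11786 archived files and a page size of 1000 that would be 12 calls to the query service instead of 11786.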
Thanks,
Alex
I tried to attach my modified drivespace.py script for reference but the forum refused all file extensions (and no extension) so here it is:
# lib/python/omeroweb/webadmin/controller/drivespace.py
import logging
import traceback

import omero
from webadmin.controller import BaseController

# Module-level logger used by _bytes_per_pixel
logger = logging.getLogger(__name__)


class BaseDriveSpace(BaseController):

    freeSpace = None
    usedSpace = None
    topTen = None

    def __init__(self, conn):
        BaseController.__init__(self, conn)
        self.freeSpace = self.conn.getFreeSpace()
        self.experimenters = list(self.conn.listExperimenters())


def _bytes_per_pixel(pixel_type):
    # Size in bytes of a single sample of the given pixel type
    if pixel_type == "int8" or pixel_type == "uint8":
        return 1
    elif pixel_type == "int16" or pixel_type == "uint16":
        return 2
    elif pixel_type == "int32" or pixel_type == "uint32" or pixel_type == "float":
        return 4
    elif pixel_type == "double":
        return 8
    else:
        logger.error("Error: Unknown pixel type: %s" % (pixel_type))
        logger.error(traceback.format_exc())
        raise AttributeError("Unknown pixel type: %s" % (pixel_type))


def _usage_map_helper(conn, ctx, pixels_list, exps):
    # Maps owner id -> {'label': full name, 'data': total bytes}
    tt = dict()
    for p in pixels_list:
        # Raw pixel data: product of the five dimensions times bytes per sample
        oid = p.details.owner.id.val
        p_size = p.sizeX.val * p.sizeY.val * p.sizeZ.val * p.sizeC.val * p.sizeT.val
        p_size = p_size * _bytes_per_pixel(p.pixelsType.value.val)
        if oid in tt:
            tt[oid]['data'] += p_size
        else:
            tt[oid] = dict()
            tt[oid]['label'] = exps[oid]
            tt[oid]['data'] = p_size

        if p.image.archived is not None and p.image.archived.val:
            # Query the size of the original file (one query per archived Pixels)
            param = omero.sys.ParametersI()
            param.add("id", p.id)
            results = conn.getQueryService().findAllByQuery(
                "select m from PixelsOriginalFileMap as m join fetch m.parent "
                "where m.child.id = :id", param, ctx)
            for result in results:
                oid = result.details.owner.id.val
                p_size = result.parent.size.val
                if oid in tt:
                    tt[oid]['data'] += p_size
                else:
                    tt[oid] = dict()
                    tt[oid]['label'] = exps[oid]
                    tt[oid]['data'] = p_size

    return tt  #sorted(tt.iteritems(), key=lambda (k,v):(v,k), reverse=True)


def usersData(conn, offset=0):
    # Map experimenter id -> full name for labelling the results
    exps = dict()
    for e in list(conn.listExperimenters()):
        exps[e.id] = e.getFullName()

    PAGE_SIZE = 1000
    offset = long(offset)

    # Admins see all groups; everyone else only their current group
    ctx = dict()
    if conn.isAdmin():
        ctx['omero.group'] = '-1'
    else:
        ctx['omero.group'] = str(conn.getEventContext().groupId)

    p = omero.sys.ParametersI()
    p.page(offset, PAGE_SIZE)
    pixels_list = conn.getQueryService().findAllByQuery(
        "select p from Pixels as p join fetch p.pixelsType join fetch p.image "
        "order by p.id", p, ctx)

    usage_map = _usage_map_helper(conn, ctx, pixels_list, exps)

    count = len(pixels_list)
    offset += count

    # A full page means there may be more Pixels to load
    loading = (count == PAGE_SIZE)

    return {'loading': loading, 'offset': offset, 'usage': usage_map}
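
For reference, the command-line helper I use to collate the totals works roughly like the sketch below. The name `collate_usage` and the `fetch_page` callable are illustrative only; `fetch_page(offset)` is assumed to return the same dictionary shape as `usersData` above (`loading`, `offset`, `usage`).

```python
def collate_usage(fetch_page):
    """Call fetch_page(offset) repeatedly and merge the per-owner totals."""
    totals = {}
    offset = 0
    loading = True
    while loading:
        page = fetch_page(offset)
        for oid, entry in page['usage'].items():
            if oid in totals:
                totals[oid]['data'] += entry['data']
            else:
                totals[oid] = {'label': entry['label'], 'data': entry['data']}
        # Continue from where the last page stopped until told to stop
        offset = page['offset']
        loading = page['loading']
    return totals
```

Against a live connection this would be called as `collate_usage(lambda off: usersData(conn, off))`.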