As an update to the original issue (for anyone who is interested): I've been experimenting and have come up with a workable solution for polling disk usage at the per-group level. My test machine runs Ubuntu 12.04.4, so YMMV depending upon your distro (specifically, the "sort" command needs to be able to sort by version number, i.e. 1,2,3...10,11 instead of the default lexical ordering of 1,10,11,2,3...).
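A quick way to check whether your "sort" supports this (--version-sort, or -V for short, comes with GNU coreutils):
- Code: Select all
```shell
# Default lexical sort puts 10 and 11 before 2:
printf '1\n10\n11\n2\n3\n' | sort | xargs
# prints: 1 10 11 2 3

# Version sort orders numerically, which is what the script below needs:
printf '1\n10\n11\n2\n3\n' | sort --version-sort | xargs
# prints: 1 2 3 10 11
```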
The script below relies on the group ID appearing at a fixed position in "omero.fs.repo.path", which should be set to the following (spaces added for clarity):
- Code: Select all
%userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%
Briefly:
- Find the highest group ID currently in use
- For each group ID (starting at #3 - the first user-added group), find the directories matching that ID at position 2 in the directory structure and report the combined size of all matching folders.
- Update the headers of an existing log file to include all of the current group names (if no log exists, create a new one with headers).
- Append a date stamp and the usage info for each group to the bottom of the log
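To illustrate the second step (the user and group names here are made up): with the repo path template above, a "find ." run from the repository root emits paths like ./dmason_52/3/LabA/20140401, so splitting on "/" puts the group ID in awk field 3 (field 1 is the leading "."):
- Code: Select all
```shell
# Hypothetical path as emitted by "find ." from the repository root;
# the group ID (2nd path component) lands in awk field 3:
echo "./dmason_52/3/LabA/20140401" | awk -F"/" '{print $3}'
# prints: 3
```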
Since the log is written as CSV, you can just drop the file into Excel (or similar) and quickly report on monthly usage per group, even if new groups are added mid-month.
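For reference, the resulting log looks something like this (group names and sizes are invented for illustration; sizes are in 1K blocks, du's default):
- Code: Select all
```shell
# Hypothetical contents of logOmeroUsage.csv after two nightly runs:
printf '%s\n' \
  'DateStamp,3_LabA,4_LabB' \
  '20140401-0200,10482,99120' \
  '20140402-0200,11203,99120'
```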
Happy to dump this into a git repository if anyone wants to branch the latest. Otherwise this should give you a rough idea:
- Code: Select all
#!/bin/bash
# Script to report on daily disk usage from OMERO binary repository and
# write the output to a log file (groups as columns, days as rows)
# - Dave Mason, University of Liverpool, CCI. April 2014
#
# Requires the binary data repository to be organised with the following root:
# %userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%
# Set the location of the Binary Data Store:
REPO_DIR=/data/OMERO.data/ManagedRepository
# Where to store the log file:
#LOG_PATH=~/logOmeroUsage.csv
# Consider using a monthly log file:
LOG_PATH=~/$(date +%Y-%m)-logOmeroUsage.csv
# Set the working directory to come back to at the end
WORK_DIR="$(pwd)"
# Find the highest group ID (not strictly the number of groups)
cd "$REPO_DIR" || exit 1
NUM_GROUPS="$(find . -maxdepth 2|awk -F"/" '{print $3}'|grep -ve repository|sort -u --version-sort|tail -n 1)"
# Get a comma-separated list of directory sizes (1K blocks) in order of group ID.
# Depth is limited to 2 so only the group-ID level of the tree is matched.
SIZE_VALUES="$(for ((n=3;n<=NUM_GROUPS;n++)); do find . -mindepth 2 -maxdepth 2 -name "$n" -exec du -c '{}' +|tail -n 1|awk '{print $1}' ; done|xargs|sed 's/ /,/g')"
# Update the titles to represent the most up to date group list (if no log exists - create one with titles)
LOG_HEADER=$(echo DateStamp,$(find . -mindepth 3 -maxdepth 3|grep -ve repository|awk -F"/" '{OFS="";print $3,"_",$4}'|sort -u --version-sort|xargs|sed 's/ /,/g'))
if [ -f "$LOG_PATH" ]
then
# Have to use a temporary file here as you can't cat the same file into itself.
# Prepend the new header, then drop the old header (now line 2).
echo "$LOG_HEADER"|cat - "$LOG_PATH"|awk 'NR!=2'>"$LOG_PATH.temp"
mv "$LOG_PATH.temp" "$LOG_PATH"
else
echo "$LOG_HEADER">"$LOG_PATH"
fi
# Date stamp and append to the log
echo "$(date +%Y%m%d-%H%M),$SIZE_VALUES">>"$LOG_PATH"
# Return to the starting directory
cd "$WORK_DIR"
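To collect the daily figures automatically, the script can be run from cron. A sketch, assuming it has been saved as /usr/local/bin/omero-usage.sh and made executable (the path, schedule, and log location are all placeholders - adjust to taste):
- Code: Select all
```
# Run every night at 02:00 and capture any errors; add this line to the
# crontab of a user who can read the ManagedRepository ("crontab -e"):
0 2 * * * /usr/local/bin/omero-usage.sh >> /var/log/omero-usage-cron.log 2>&1
```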