As an update to the original issue (for anyone who is interested): I've been experimenting and have come up with a workable solution for polling disk usage at the per-group level. My test machine runs Ubuntu 12.04.4, so YMMV depending upon your distro (specifically, the "sort" command needs to be able to sort by version number, i.e. 1,2,3...10,11 instead of the default lexical ordering of 1,10,11,2,3...).
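A quick way to check whether your "sort" supports this (--version-sort, or -V for short, comes with GNU coreutils):
- Code: Select all
```shell
# Default lexical sort puts 10 and 11 before 2:
printf '1\n10\n11\n2\n3\n' | sort | xargs
# prints: 1 10 11 2 3

# Version sort orders numerically, which is what the script below needs:
printf '1\n10\n11\n2\n3\n' | sort --version-sort | xargs
# prints: 1 2 3 10 11
```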
The script below relies on the group ID appearing at a fixed position in "omero.fs.repo.path", which should be set to the following (spaces added for clarity):
- Code: Select all
%userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%
Briefly:
- Find the highest group ID currently in use
- For each group ID (starting at #3 - the first user-added group), find the directories matching that ID at position 2 in the directory structure and report the combined size of all matching folders.
- Update the headers of an existing log file to include all of the current group names (if no log exists, create a new one with headers).
- Append a date stamp and the usage info for each group to the bottom of the log
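To illustrate the second step (the user and group names here are made up): with the repo path template above, a "find ." run from the repository root emits paths like ./dmason_52/3/LabA/20140401, so splitting on "/" puts the group ID in awk field 3 (field 1 is the leading "."):
- Code: Select all
```shell
# Hypothetical path as emitted by "find ." from the repository root;
# the group ID (2nd path component) lands in awk field 3:
echo "./dmason_52/3/LabA/20140401" | awk -F"/" '{print $3}'
# prints: 3
```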
Since the log is written as CSV, you can just drop the file into Excel (or similar) and quickly report on monthly usage per group, even if new groups are added mid-month.
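For reference, the resulting log looks something like this (group names and sizes are invented for illustration; sizes are in 1K blocks, du's default):
- Code: Select all
```shell
# Hypothetical contents of logOmeroUsage.csv after two nightly runs:
printf '%s\n' \
  'DateStamp,3_LabA,4_LabB' \
  '20140401-0200,10482,99120' \
  '20140402-0200,11203,99120'
```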
Happy to dump this into a git repository if anyone wants to branch the latest. Otherwise this should give you a rough idea:
- Code: Select all
#!/bin/bash
# Script to report on daily disk usage from OMERO binary repository and
# write the output to a log file (groups as columns, days as rows)
# - Dave Mason, University of Liverpool, CCI. April 2014
#
# Requires the binary data repository to be organised with the following root:
# %userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%
# Set the location of the Binary Data Store:
REPO_DIR=/data/OMERO.data/ManagedRepository
# Where to store the log file:
#LOG_PATH=~/logOmeroUsage.csv
# Consider using a monthly log file:
LOG_PATH=~/$(date +%Y-%m)-logOmeroUsage.csv
# Set the working directory to come back to at the end
WORK_DIR="$(pwd)"
# Find the highest group ID (not strictly the number of groups)
cd "$REPO_DIR" || exit 1
NUM_GROUPS="$(find . -maxdepth 2|awk -F"/" '{print $3}'|grep -ve repository|sort -u --version-sort|tail -n 1)"
# Get a comma-separated list of directory sizes (1K blocks) in order of group ID.
# Depth is limited to 2 so only the group-ID level of the tree is matched.
SIZE_VALUES="$(for ((n=3;n<=NUM_GROUPS;n++)); do find . -mindepth 2 -maxdepth 2 -name "$n" -exec du -c '{}' +|tail -n 1|awk '{print $1}' ; done|xargs|sed 's/ /,/g')"
# Update the titles to represent the most up to date group list (if no log exists - create one with titles)
LOG_HEADER=$(echo DateStamp,$(find . -mindepth 3 -maxdepth 3|grep -ve repository|awk -F"/" '{OFS="";print $3,"_",$4}'|sort -u --version-sort|xargs|sed 's/ /,/g'))
if [ -f "$LOG_PATH" ]
then
# Have to use a temporary file here as you can't cat the same file into itself.
# Prepend the new header, then drop the old header (now line 2).
echo "$LOG_HEADER"|cat - "$LOG_PATH"|awk 'NR!=2'>"$LOG_PATH.temp"
mv "$LOG_PATH.temp" "$LOG_PATH"
else
echo "$LOG_HEADER">"$LOG_PATH"
fi
# Date stamp and append to the log
echo "$(date +%Y%m%d-%H%M),$SIZE_VALUES">>"$LOG_PATH"
# Return to the starting directory
cd "$WORK_DIR"
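To collect the daily figures automatically, the script can be run from cron. A sketch, assuming it has been saved as /usr/local/bin/omero-usage.sh and made executable (the path, schedule, and log location are all placeholders - adjust to taste):
- Code: Select all
```
# Run every night at 02:00 and capture any errors; add this line to the
# crontab of a user who can read the ManagedRepository ("crontab -e"):
0 2 * * * /usr/local/bin/omero-usage.sh >> /var/log/omero-usage-cron.log 2>&1
```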