Lipplab wrote: I would like to check integrity between the db and the repository.
In particular, this is about whether (i) all image pointers in the db actually point to corresponding data and (ii) (if possible) whether each piece of imaging data in the repository has a corresponding "link" in the db.
Hi Peter
We had an issue some time ago that required a tool like that. Omero
covers #2 and Josh already replied. For #1 we never got around to
creating a script, but I have the series of commands and psql queries
we used. The commands below pipe their output to a simple python
script named check-omero-data, which we have online (note that it
requires omeropy on PYTHONPATH).
## Get list of pixel ids (filenames in Pixels) that are missing.
$ psql -d omerodb -c \
"COPY (SELECT id
FROM pixels
WHERE repo IS NULL OR repo = '')
TO STDOUT" | ./check-omero-data /srv/OMERO/ Pixels -
## Get list of originalfile ids (filenames in Files) that are missing.
$ psql -d omerodb -c \
"COPY (SELECT id
FROM originalfile
WHERE (repo IS NULL OR repo = '')
AND mimetype != 'Repository')
TO STDOUT" | ./check-omero-data /mnt/OMERO/ Files -
## This will include omero scripts of previous omero installs and you
## may want to ignore those
## http://lists.openmicroscopy.org.uk/pipermail/ome-users/2016-September/006172.html
## You can ignore all with mimetype 'text/x-python' or, if you are
## afraid of having other python files that are not the official omero
## scripts, you can filter them out later. This will show the python
## files only.
$ psql -d omerodb -c \
"COPY (SELECT f.id, f.hash, f.name
FROM originalfile f JOIN checksumalgorithm h ON f.hasher = h.id
WHERE f.mimetype = 'text/x-python' AND f.repo IS NULL)
TO STDOUT"
This only looks at the files inside the Files and Pixels
directories. It checks nothing within ManagedRepository.
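In case it helps to see what a check of this kind involves, here is a minimal sketch. It is not the actual check-omero-data script: the function names `legacy_path` and `missing_ids` are made up for illustration, and the directory layout (ids up to 999 directly under Files or Pixels, larger ids nested in `Dir-NNN` subdirectories) is my assumption about the legacy repository layout, so verify it against a few known files on your own server before trusting the output.

```python
import os


def legacy_path(root, subdir, obj_id):
    """Build the path where the file for obj_id is assumed to live.

    Assumed layout (verify on your server): ids up to 999 sit directly
    in root/subdir; larger ids get one Dir-NNN level per extra group of
    three digits, e.g. 1234567 -> Dir-001/Dir-234/1234567.
    """
    parts = []
    remaining = obj_id
    while remaining > 999:
        remaining //= 1000
        # Prepend, so the most significant group ends up outermost.
        parts.insert(0, "Dir-%03d" % (remaining % 1000))
    return os.path.join(root, subdir, *parts, str(obj_id))


def missing_ids(root, subdir, lines):
    """Yield every id in lines whose expected file does not exist."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the psql output
        obj_id = int(line)
        if not os.path.isfile(legacy_path(root, subdir, obj_id)):
            yield obj_id
```

To use it the same way as the commands above, you would pass `sys.stdin` as `lines`, for example `missing_ids("/srv/OMERO/", "Pixels", sys.stdin)`, and print each id that comes back.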
I have a bunch of other notes and SQL commands that I used to create
reports on the missing files. These include getting a list of
checksums of the missing files and looking for duplicates on omero (it
seems common for different people from the same lab to upload the same
image), the dates of the missing images, and the number of missing
images per group and person. I can share them with you if you need; I
just need to organize them first, as they are an absolute mess of
snippets with cryptic comments. Let me know and good luck.