stefanm wrote: Hello,
Hi Stefan,
we are testing OMERO 5.2.4 at the moment. What I noticed is that image files imported using OMERO.insight are saved in the repository with a new change/modification date identical to the time when the files were imported (not acquired!). The path within the managed repository also reflects that point in time. Of course, the acquisition time-stamp is retained in the metadata, at least for the data formats that actually hold an acquisition time-stamp. However, in a lab book all that matters is the acquisition time.
An example: I imported a file acquired on 2012-03-08 19:38:28. The change/modification date of the imported file in the repository was set to 2016-06-15 15:56:40.636 and consequently the path within the repository was
"ManagedRepository/stefanm/2016-06/15/15-56-40.636"
This has both a historical and a social component. Earlier versions of Java didn't provide a method for setting the modification time, and so to some extent, we didn't pursue options or feedback around timestamps. It's now possible, and so that's certainly something we will need to do. See
https://trello.com/c/WN1Ihwhf/107-preserve-file-acquisition-times -- thanks for getting this started.
On the other hand, there's the question of trust and intent. Even a tool like rsync doesn't automatically copy the modification time; an extra option is required for that. This reflects that the operation is fundamentally a copy, as is `bin/omero import`. My assumption, though, is that we could provide functionality similar to `rsync -a`. (I can imagine that some limitations may need to be put in place; for example, system administrators may not want users to have complete freedom with regard to file provenance.)
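To make the copy-versus-preserve distinction concrete, here is a minimal Python sketch. It is not OMERO code and says nothing about how import is implemented; the file names and the helper functions `copy_plain`, `copy_preserving` and `demo` are made up for illustration. It only shows that a plain copy gets a fresh modification time, while a metadata-preserving copy (similar in spirit to `rsync -a`) keeps the original one.

```python
import os
import shutil
import tempfile

ACQUIRED = 1331235508  # 2012-03-08 19:38:28 UTC, used as a stand-in acquisition time


def copy_plain(src, dst):
    """Copy file contents only; the copy gets a fresh modification time."""
    shutil.copyfile(src, dst)
    return os.stat(dst).st_mtime


def copy_preserving(src, dst):
    """Copy contents plus metadata, including the modification time."""
    shutil.copy2(src, dst)
    return os.stat(dst).st_mtime


def demo():
    """Return (plain_mtime, preserved_mtime) for copies of a back-dated file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "acquired.tif")  # hypothetical file name
        with open(src, "wb") as handle:
            handle.write(b"pixels")
        # Back-date the source so it looks like an old acquisition.
        os.utime(src, (ACQUIRED, ACQUIRED))
        plain = copy_plain(src, os.path.join(tmp, "plain.tif"))
        kept = copy_preserving(src, os.path.join(tmp, "kept.tif"))
        return plain, kept
```

Calling `demo()` should give a first value close to the current time and a second value equal to the back-dated timestamp, which is the behaviour a "preserve" flag on import would need to reproduce.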
Moreover, downloading the file later generates another change/modification date reflecting the export time.
When downloading from the CLI and/or Java, we could similarly attempt to set the value if the appropriate flag has been set, but I don't know if this will be possible from the web (See
https://bugzilla.mozilla.org/show_bug.cgi?id=178506)
When I discussed the behaviour with scientists over here, they simply said it's a show stopper. The main source for searching for data is their lab book and there the acquisition date is critical. And they expect that the file modification date reflects that time point. This is even more so when I think about them leaving the lab taking their data with them on a disk (that we would need to export first out of OMERO).
That being the case, I wonder if we'd not be better advised to record the modification time in the database itself, so it's queryable, etc. Even if OMERO properly implements a save-modification-time flag, there's always the possibility that when migrating between servers, the values will be lost.
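As a rough sketch of what "queryable" could look like, the following Python snippet records modification times in a table before the files are handed over, and then looks files up by day. Everything here is an assumption for illustration: it uses an in-memory SQLite database rather than OMERO's own database, and the table name `original_file_mtime` and the functions `record_mtimes`, `files_modified_on` and `demo` are invented.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def record_mtimes(conn, paths):
    """Store each file's modification time as an ISO 8601 UTC string."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS original_file_mtime "
        "(path TEXT PRIMARY KEY, mtime TEXT)"
    )
    for path in paths:
        stamp = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
        conn.execute(
            "INSERT OR REPLACE INTO original_file_mtime VALUES (?, ?)",
            (path, stamp.strftime("%Y-%m-%dT%H:%M:%SZ")),
        )
    conn.commit()


def files_modified_on(conn, day):
    """Return the paths whose recorded mtime falls on `day` (YYYY-MM-DD, UTC)."""
    rows = conn.execute(
        "SELECT path FROM original_file_mtime WHERE mtime LIKE ? ORDER BY path",
        (day + "%",),
    )
    return [row[0] for row in rows]


def demo():
    """Record one back-dated file and query it by two different days."""
    conn = sqlite3.connect(":memory:")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "acquired.tif")  # hypothetical file name
        open(path, "wb").close()
        acquired = 1331235508  # 2012-03-08 19:38:28 UTC
        os.utime(path, (acquired, acquired))
        record_mtimes(conn, [path])
        hits = [os.path.basename(p) for p in files_modified_on(conn, "2012-03-08")]
        misses = files_modified_on(conn, "2016-06-15")
    return hits, misses
```

The point of keeping the value in a table is that it survives a server migration along with the rest of the database, independent of what happens to the timestamps on disk.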
I suspect that OMERO.fs using the native file system directly could be one solution. But I also think that in the managed repository the file modification date should not change as long as the data/image file was unchanged.
At the moment, fs does provide a (partial) workaround: in-place import. It doesn't cover all the regular import workflows, but if you could use the `ln` (hardlink) or `ln_s` (symlink) options, then the original modification times would be preserved.
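The reason links keep the timestamp is a general filesystem property rather than anything OMERO-specific, and can be checked with a small Python sketch. It assumes a POSIX-like filesystem where the user may create hard links and symlinks; the file names and the function `link_mtimes` are made up for illustration.

```python
import os
import tempfile


def link_mtimes(acquired=1331235508):
    """Return (original, hardlink, symlink target) mtimes in whole seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.tif")  # hypothetical file name
        with open(src, "wb") as handle:
            handle.write(b"pixels")
        # Back-date the original so it looks like an old acquisition.
        os.utime(src, (acquired, acquired))

        hard = os.path.join(tmp, "hard.tif")
        soft = os.path.join(tmp, "soft.tif")
        os.link(src, hard)     # second name for the same inode
        os.symlink(src, soft)  # separate entry pointing at the original path

        # os.stat follows symlinks, so all three report the original's mtime.
        return (
            int(os.stat(src).st_mtime),
            int(os.stat(hard).st_mtime),
            int(os.stat(soft).st_mtime),
        )
```

Note that `os.lstat` on the symlink itself would report when the link was created, so a tool that inspects the link rather than its target would still see the import time.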
I know that importing files via OMERO.insight is, strictly speaking, not the same as copying an image in a file system (where normally the modification time is preserved).
What OS / filesystem do you have in mind? I typically understand a copy (i.e. `cp`) to produce a new timestamp.
On the other hand, if we wanted to preserve the modification time at the moment, we would need to keep a copy of the original data, which destroys the big advantage that OMERO5 now handles the native data format directly.
Agreed. I definitely see that "bin/omero import X && rm -rf X" currently poses a loss of information for you, and it's very much worth finding a way for that not to happen.
Moreover, we were considering importing a lot of old data (several thousand files, some more than 4 years old) into OMERO. All the images would be time-stamped with their import date, which, at least for us, clearly makes little or no sense.
For this kind of bulk operation, I'd definitely suggest looking into in-place import, which may solve multiple problems (storage, timestamps, etc.). And then for newer acquisitions, the hopefully short period of time between acquisition and import to OMERO would be less of an issue.
Am I missing something, or could that be solved in a future version of OMERO?
Best regards
Stefan
Let's hope so!
Cheers,
~Josh.