Olympus .oir subfiles
Posted: Tue Sep 05, 2017 8:13 am
I'm using OME's recently added support for .oir files -- thank you so much for adding it.
When a .oir file exceeds 1 GB, Olympus automatically splits the acquisition into sub-files: e.g. 'datafile.oir' will contain time points 0 through 399, 'datafile_00001' will contain time points 400 through 799, and so on.
It would be great if Bio-Formats automatically detected these sub-files and treated them as one large data set, but in the meantime I'm trying to work around this by simply opening each of the sub-files individually, as sketched below.
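For concreteness, here is a minimal sketch of the workaround I have in mind, assuming each sub-file can be opened on its own (the file names are the hypothetical ones from my example above):

    import numpy as np
    from pims.bioformats import BioformatsReader

    # Hypothetical names following the splitting pattern described above.
    paths = ['datafile.oir', 'datafile_00001', 'datafile_00002']

    # Read each sub-file separately and stitch the time points back
    # together into one array. This assumes every sub-file opens
    # cleanly, which is exactly what is failing for me at the moment.
    stacks = []
    for p in paths:
        frames = BioformatsReader(p)
        stacks.append(np.stack([np.asarray(f) for f in frames]))
        frames.close()
    movie = np.concatenate(stacks, axis=0)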
The problem is that I'm having trouble reading the sub-files successfully. I'm working in Python, and I'd ideally like to use the very nice BioformatsReader from the PIMS package. The reader has no problem reading primary .oir data files, but whenever it attempts to open a sub-file, I get an exception. The exception occurs when setId is called on the ChannelSeparator reader, and for some reason it is an out-of-memory error no matter how large I make the Java heap.
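In case it helps, this is essentially all I'm doing on the PIMS side (java_memory is, as far as I can tell from the PIMS docs, the keyword for the JVM heap size; the sub-file name is again hypothetical):

    from pims.bioformats import BioformatsReader

    # The primary file opens without trouble:
    frames = BioformatsReader('datafile.oir')

    # Opening a sub-file dies inside ChannelSeparator.setId() with
    # java.lang.OutOfMemoryError, even with a very large heap:
    frames = BioformatsReader('datafile_00001', java_memory='8192m')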
If I use the python-bioformats package instead of PIMS, I can successfully read both primary data files and sub-files. But in this case the returned data is floating point (instead of int16), and I'm not sure how to scale it appropriately.
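Here is roughly what I'm doing with python-bioformats. The rescale keyword is from the python-bioformats documentation; I haven't been able to confirm whether it gives back the raw int16 values or what scaling is applied by default:

    import javabridge
    import bioformats

    javabridge.start_vm(class_path=bioformats.JARS)

    with bioformats.ImageReader('datafile_00001') as reader:
        # This comes back as floating point rather than int16:
        plane = reader.read(t=0)
        # rescale defaults to True; presumably rescale=False returns
        # the raw pixel values, but I haven't verified this:
        raw = reader.read(t=0, rescale=False)

    javabridge.kill_vm()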
Any thoughts on:
1) why the reader PIMS is using is crashing, or
2) why python-bioformats returns floating-point data, and how to determine the appropriate way to rescale it?
As a separate aside, both python-bioformats and the Bio-Formats command-line tools often throw the following exception when working with .oir files:
[Fatal Error] :1:35: Character reference "�" is an invalid XML character.
Thanks,
Aaron