Accessing Multiple Images (aka Series) programmatically

Historical discussions about the Bio-Formats library. Please look for and ask new questions at https://forum.image.sc/tags/bio-formats

If you are having trouble with image files, there is information about reporting bugs in the Bio-Formats documentation. Please send us the data and let us know what version of Bio-Formats you are using. For issues with your code, please provide a link to a public repository, ideally GitHub.

Post by waxenegger » Thu May 21, 2015 11:25 am

Hi,

I'm writing a converter to a very simple custom byte stream file format.

I quite like the abstraction of the ImageReader, i.e. the IFormatReader, nevertheless I find there is somewhat funky behavior when it comes to formats/files that entail multiple files eventually.

I first noticed this when I tossed a zip at the reader's setId(). It read the file without complaint, and from my understanding/debugging it holds an embedded image reader, namely one for the first supported format. Here is where I get puzzled, and perhaps I'm just missing a detail: the encapsulation of the class does not let me get at the private reader member, so I can only go via openBytes, but that method does not give me the important image metadata of the file.

Furthermore, while the reader has getUsedFiles and similar methods, the files are stuffed into a Location class which (and I did not double-check this) it occurred to me could potentially contain the files of another zip. So, again, encapsulation prevented me from running through the Location file map to verify which originating zip the files came from, to ensure that I would not use a wrong one.

Either way, I abandoned the zip reader in favor of manually looping over the zip entries and instantiating the readers once extracted. But soon I hit a similar issue:

I have a series of OME-TIFFs incl. an OME-XML file. The question I had from just looking at the reader was: which file should I use to attack the entire series? I tried both the XML and the first TIFF in the series. Both populated the image metadata nicely, but when reading I faced a dilemma: in the case of the XML, openBytes returned a buffer whose length obviously corresponded to X x Y x bit depth, but I hoped in vain for it to contain anything other than the zeros Java initializes it with. So perhaps it is my bad and I should not have expected actual image data from tossing the XML at the reader, but it is a tiny bit misleading to get back an array that hints (in size) at being the first image but isn't. Trying the first OME-TIFF gave a slightly more promising result: it did give me the image data, but only for that first file. That is perhaps again a misconception of mine, given that the file does not per se know of the other files in the series, I presume. The XML file would, however, yet it did not deliver them either.

So, in summation, do I expect too much of the reader, given the interface? Is it really only reading the one file given to it, even if that file contains XML that is definitely parsed (I saw that much in the code) and that lists the locations of all files that are part of the series?
If so, I'd like to know whether I have to extract the file locations from the XML "manually" (as I had to in the case of the zip), which is not a problem at all. Also, I could not find the file locations with getUsedFiles() or any other similarly named method that the interface would have suggested.

Thanks in advance for any clarification provided,

Harald

Re: Accessing Multiple Images (aka Series) programmatically

Post by waxenegger » Fri May 22, 2015 12:25 am

alright, to make this a bit more tangible:

this bit is from the loci.formats.in.OMEXMLReader:

Code:
  public byte[] openBytes(int no, byte[] buf, int x, int y, int w, int h)
    throws FormatException, IOException
  {
    if (binDataOffsets.size() == 0) return buf;

...


binDataOffsets is in fact empty, and that may very well be OK given the format, BUT:
is there a way to get to the DOM content and the individual files (incl. pixels)?
They are parsed in the process and various metadata is constructed (btw, I could not find the source for ome.xml.model.OME and some others in git, well, not for 5.1.1 that is), but I struggle to get a handle on any of these objects.

It would be nice to get to ome.xml.model.Image and then ome.xml.model.Plane.
Eventually I'd like to get to the pixels, but at a minimum I need the files, and at the moment I don't know how to get them unless I parse the XML myself and then feed the files to the TiffReader individually.

Re: Accessing Multiple Images (aka Series) programmatically

Post by jmoore » Fri May 22, 2015 10:16 am

Hi Harald,

waxenegger wrote: I first noticed this when I tossed a zip at the reader's setId(). It read the file without complaint, and from my understanding/debugging it holds an embedded image reader, namely one for the first supported format. Here is where I get puzzled, and perhaps I'm just missing a detail: the encapsulation of the class does not let me get at the private reader member, so I can only go via openBytes, but that method does not give me the important image metadata of the file.


Thanks for pointing this out; I've filed a TODO. Do note that if you want to access it in code in the meantime, you can do the equivalent of this Jython:
Code:
import loci
# ZipReader wraps the reader for the file it finds inside the zip
z = loci.formats.in.ZipReader()
z.setId("c.zip")
# the wrapped reader lives in the private field "reader"; fetch it reflectively
r = z.getClass().getDeclaredField("reader")
r.setAccessible(True)
r = r.get(z)
print r.getUsedFiles()


waxenegger wrote: Furthermore, while the reader has getUsedFiles and similar methods, the files are stuffed into a Location class which (and I did not double-check this) it occurred to me could potentially contain the files of another zip. So, again, encapsulation prevented me from running through the Location file map to verify which originating zip the files came from, to ensure that I would not use a wrong one.


This doesn't sound like it could happen. Each ZipReader will only contain the files from its invocation of `setId` (see above). What the ZipReader doesn't currently support is reading multiple filesets from a single zip file. For example, my above "c.zip" comes from these commands in bash:
Code:
touch a.fake
touch b.fake
zip c.zip {a,b}.fake


The result of the print statement is:
Code:
array(java.lang.String, [u'/opt/ome0/components/bioformats/tools/a.fake'])


i.e. only one of the two files (the first) was detected. Therefore all files in a single zip should be printed in the call to getUsedFiles on the internal reader OR something is being missed.
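To see what an archive actually contains, independently of what ZipReader reports, the entries can be enumerated with the JDK alone. The following is a minimal sketch that uses only `java.util.zip`; the class and method names (`ZipListing`, `entryNames`, `zipOfEmptyFiles`) are made up for illustration and are not part of Bio-Formats. It rebuilds the two-file archive from the bash commands above in memory and lists both entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative helper, not a Bio-Formats class.
final class ZipListing {

  /** Returns the names of all non-directory entries, in archive order. */
  static List<String> entryNames(InputStream zipStream) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zin = new ZipInputStream(zipStream)) {
      ZipEntry entry;
      while ((entry = zin.getNextEntry()) != null) {
        if (!entry.isDirectory()) {
          names.add(entry.getName());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  /** Builds an in-memory zip of empty entries, mirroring "touch" followed by "zip". */
  static byte[] zipOfEmptyFiles(String... names) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
      for (String name : names) {
        zout.putNextEntry(new ZipEntry(name));
        zout.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] zip = zipOfEmptyFiles("a.fake", "b.fake");
    // Lists both entries, whereas the wrapped reader above reported only the first.
    System.out.println(entryNames(new ByteArrayInputStream(zip)));
  }
}
```

Comparing this list with the result of getUsedFiles() on the inner reader shows which entries were not picked up.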

waxenegger wrote: Either way, I abandoned the zip reader in favor of manually looping over the zip entries and instantiating the readers once extracted. But soon I hit a similar issue:

I have a series of OME-TIFFs incl. an OME-XML file. The question I had from just looking at the reader was: which file should I use to attack the entire series? I tried both the XML and the first TIFF in the series. Both populated the image metadata nicely, but when reading I faced a dilemma: in the case of the XML, openBytes returned a buffer whose length obviously corresponded to X x Y x bit depth, but I hoped in vain for it to contain anything other than the zeros Java initializes it with. So perhaps it is my bad and I should not have expected actual image data from tossing the XML at the reader, but it is a tiny bit misleading to get back an array that hints (in size) at being the first image but isn't. Trying the first OME-TIFF gave a slightly more promising result: it did give me the image data, but only for that first file. That is perhaps again a misconception of mine, given that the file does not per se know of the other files in the series, I presume. The XML file would, however, yet it did not deliver them either.


In this case, all I can say is "it depends" (on valid XML, etc.). Where is this OME-XML + TIFFs coming from? Could you share a file listing and perhaps the XML, if not the whole dataset? (See the link to "send us the data" above.)

waxenegger wrote: So, in summation, do I expect too much of the reader, given the interface? Is it really only reading the one file given to it, even if that file contains XML that is definitely parsed (I saw that much in the code) and that lists the locations of all files that are part of the series?


Each file type in Bio-Formats defines a different structure that it supports. (See https://www.openmicroscopy.org/site/support/bio-formats/formats/dataset-table.html for more information) If the OME-XML doesn't reference the OME-TIFFs correctly, then yes, the file will be picked up on its own.

waxenegger wrote: If so, I'd like to know whether I have to extract the file locations from the XML "manually" (as I had to in the case of the zip), which is not a problem at all.


Again, you shouldn't, but it's going to depend on how this is laid out.

cheers,
~Josh.

Re: Accessing Multiple Images (aka Series) programmatically

Post by waxenegger » Sun May 24, 2015 11:36 am

Thank you for clearing things up, Josh.

I will delve deeper into the file specification to then double check whether the XML is such that it lists the files the proper way.

On a side note, I might have mixed up the zip reader "getUsed" comment with the OME-XML or TIFFs; that is, it works as you illustrated. I tried zip first and then the others, so I could easily have gotten confused and wrongly assumed that the behavior was the same for zip.

So if the TODO you initiated gives access to the files/readers involved (for any sort of multi-file arrangement really), that will definitely be very useful in the future. Of course I realize that I still have to rely on the file format supporting this info and on the presence/correctness of that meta-info. But if it's there, it would make my life (and I'm pretty sure others') much easier, since I can then conveniently work with the ImageReader and "insist" on people handing me the one file that will be the head of the linked list, so to speak.

For the time being I'll try the reflection method with setAccessible in Java itself, and hopefully the SecurityManager won't rain on my parade. I cannot be bothered to switch to Python/Jython, sorry ;-)
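In plain Java, the reflection route would look roughly like the sketch below. It is written against a hypothetical stand-in class (`Wrapper`) so that it runs without Bio-Formats on the classpath; with the real library the target would be the ZipReader instance and the field name "reader", as in the Jython snippet earlier in the thread. The helper name `readPrivateField` is likewise made up.

```java
import java.lang.reflect.Field;

// Illustrative only: Wrapper stands in for a class that hides a delegate.
final class PrivateFieldAccess {

  /** Hypothetical stand-in for a wrapper that keeps its delegate in a private field. */
  static final class Wrapper {
    private final String reader = "delegate";
  }

  /** Reads a private field that is declared directly on the target's own class. */
  static Object readPrivateField(Object target, String fieldName) {
    try {
      // getDeclaredField does not search superclasses.
      Field field = target.getClass().getDeclaredField(fieldName);
      field.setAccessible(true); // may be refused in a restricted environment
      return field.get(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot read field " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readPrivateField(new Wrapper(), "reader"));
  }
}
```

Since getDeclaredField only looks at the class itself, a field inherited from a superclass would need a walk up via getSuperclass().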

Re: Accessing Multiple Images (aka Series) programmatically

Post by jmoore » Mon May 25, 2015 9:13 am

waxenegger wrote: So if the TODO you initiated gives access to the files/readers involved (for any sort of multi-file arrangement really), that will definitely be very useful in the future. Of course I realize that I still have to rely on the file format supporting this info and on the presence/correctness of that meta-info. But if it's there, it would make my life (and I'm pretty sure others') much easier, since I can then conveniently work with the ImageReader and "insist" on people handing me the one file that will be the head of the linked list, so to speak.


Understood, completely.

waxenegger wrote: For the time being I'll try the reflection method with setAccessible in Java itself, and hopefully the SecurityManager won't rain on my parade. I cannot be bothered to switch to Python/Jython, sorry ;-)


What application will you be using this code in? I wouldn't foresee any SecurityManager issues, unless you work in a fairly strict environment. Use of jython was solely to make pasting on the forums simpler! :)

Cheers,
~Josh

Re: Accessing Multiple Images (aka Series) programmatically

Post by waxenegger » Tue May 26, 2015 5:54 am

I apologize for being persistent/annoying...

I have now checked whether the XML was well-formed (the parser would have thrown an exception anyhow) and looked at the schema: it is old but OK, in the sense that the file gets parsed and the members get populated.

The thing that I lament a bit is that from my point of view (debugging), the OME XML Reader seems to be throwing away parsed meta-data:

Code:
protected void initFile(String id) throws FormatException, IOException {
    super.initFile(id);

    in = new RandomAccessInputStream(id);
    in.setEncoding("ASCII");
    binData = new Vector<BinData>();
    binDataOffsets = new Vector<Long>();
    compression = new Vector<String>();

    DefaultHandler handler = new OMEXMLHandler();
    try {
      RandomAccessInputStream s = new RandomAccessInputStream(id);
      XMLTools.parseXML(s, handler);
      s.close();
    }
    catch (IOException e) {
      throw new FormatException("Malformed OME-XML", e);
    }

    ...

    LOGGER.info("Populating metadata");

    OMEXMLMetadata omexmlMeta;
    OMEXMLService service;
    try {
      ServiceFactory factory = new ServiceFactory();
      service = factory.getInstance(OMEXMLService.class);
      omexmlMeta = service.createOMEXMLMetadata(omexml);
    }
    catch (DependencyException de) {
      throw new MissingLibraryException(OMEXMLServiceImpl.NO_OME_XML_MSG, de);
    }
    catch (ServiceException se) {
      throw new FormatException(se);
    }

    hasSPW = omexmlMeta.getPlateCount() > 0;
    addGlobalMeta("Is SPW file", hasSPW);

    // TODO
    //Hashtable originalMetadata = omexmlMeta.getOriginalMetadata();
    //if (originalMetadata != null) metadata = originalMetadata;

    int numDatasets = omexmlMeta.getImageCount();

    int oldSeries = getSeries();
    core.clear();
    for (int i=0; i<numDatasets; i++) {
      CoreMetadata ms = new CoreMetadata();
      core.add(ms);
...
      }
    }
    setSeries(oldSeries);

    // populate assigned metadata store with the
    // contents of the internal OME-XML metadata object
    MetadataStore store = getMetadataStore();
    service.convertMetadata(omexmlMeta, store);
    MetadataTools.populatePixels(store, this, false, false);
  }



Roughly, it seems to parse the XML, create an instance of OMEXMLMetadata, create instances of CoreMetadata, and in the end claim to populate a metadata store (last 3 lines).

Now, the first two metadata objects do get created, and the core metadata, which contains important information (sizes, pixel type, etc.), is also available to me later through the getSize... and other FormatReader methods. I suppose I could also get to the private core member via reflection, which is not necessary since I have the getters.

BUT task number three [MetadataStore, the line: service.convertMetadata(omexmlMeta, store);] does not quite do what the comment claims it would. It wants to copy over/set the metadata, but when the line "service.convertMetadata(omexmlMeta, store);" executes, the store object is of type loci.formats.meta.DummyMetadata. Once the initFile method returns, I therefore seem to be left with nothing but a dummy object and the core metadata, which does not contain the desired info.

Perhaps I am wrong, but it seems to me that the initFile method of the OMEXMLReader was meant to set the metadata it has already instantiated along the way and stored in the local omexmlMeta variable (via setMetadataStore(omexmlMeta) perhaps?), so that when getMetadata() is called afterwards it would return the OMEXMLServiceImpl instead of the dummy.

As soon as I force this in my own code beforehand, I do in fact have a MetadataStore of the desired type that I can then use:
Code:
ServiceFactory serviceFactory = new ServiceFactory();
OMEXMLService omexmlService =
  serviceFactory.getInstance(OMEXMLService.class);
IMetadata meta = omexmlService.createOMEXMLMetadata();

try {
  reader.setMetadataStore(meta);
  reader.setId(fileToBeProcessed);
...


The problem with this workaround is that it is ugly and I lose generality/genericness, since I need to be agnostic of the file format before setId(), which itself determines the type/instance anyhow.

I still think it would be very beneficial to override the getUsedFiles for formats that contain multiple files such as the OMEXML and Tiffs so that after parsing that info it is available to users of the library.

Re: Accessing Multiple Images (aka Series) programmatically

Post by jmoore » Tue May 26, 2015 8:45 am

waxenegger wrote: I apologize for being persistent/annoying...


Not at all. That's what we're here for. We're just in the middle of the 5.1.2 release and then preparations for our annual users' meeting. It may take a bit of time to get back to you.

Thanks for bearing with us.
~Josh.

Re: Accessing Multiple Images (aka Series) programmatically

Post by waxenegger » Tue May 26, 2015 11:07 pm

no problem, it's not super urgent, I can always code around things meanwhile.

Something occurred to me right now...

The XML I have comprises only one Image element with a lot of TiffData elements in a Pixels element, each of which has a UUID incl. a FileName attribute.

Code:
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2011-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2011-06 http://www.openmicroscopy.org/Schemas/OME/2011-06/ome.xsd" UUID="urn:uuid:105AC197-8C0B-48CE-BD81-851F54BDD51B">

...

<Image ID="Image:1430780852" Name="Capture 2 - Position 0 [6]">
  <AcquiredDate>2015-05-05T09:45:27</AcquiredDate>
  <Description></Description>
  <ObjectiveSettings ID="Objective:1" Medium="Water"/>
  <Pixels ID="Pixels:1" DimensionOrder="XYZCT" Type="uint16" SizeX="2048" SizeY="2048" SizeZ="807" SizeC="1" SizeT="6" PhysicalSizeX="0.156" PhysicalSizeY="0.156" PhysicalSizeZ="0.330">
    <Channel ID="Channel:21" AcquisitionMode="Other" IlluminationType="Other" Color="65280" EmissionWavelength="500" Name="488_SC_S wide" SamplesPerPixel="1">
      <DetectorSettings ID="Detector:1" Binning="1x1" Integration="1" AmplificationGain="-1" Gain="-1"/>
    </Channel>
    <TiffData FirstC="0" FirstZ="0" FirstT="0" IFD="0" PlaneCount="1">
      <UUID FileName="Capture 2 - Position 0 [6]_XY1430780852_Z000_T0_C0.tiff">urn:uuid:105AC197-8C0B-48CE-BD81-851F54BDD51B</UUID>
    </TiffData>
    <TiffData FirstC="0" FirstZ="1" FirstT="0" IFD="0" PlaneCount="1">
      <UUID FileName="Capture 2 - Position 0 [6]_XY1430780852_Z001_T0_C0.tiff">urn:uuid:384D72AE-09D4-4E01-8CFE-295C6D61A071</UUID>
    </TiffData>
    <TiffData FirstC="0" FirstZ="2" FirstT="0" IFD="0" PlaneCount="1">
      <UUID FileName="Capture 2 - Position 0 [6]_XY1430780852_Z002_T0_C0.tiff">urn:uuid:64DB93DE-6FE2-4EDE-92FB-6E0091513B5F</UUID>
    </TiffData>

....


As far as the schema goes, that is a perfectly good configuration, but I could imagine that it is the reason Bio-Formats treats them as a single image rather than the multiple images I (probably wrongly) assumed. The TiffData elements are multiple slices, so that for all intents and purposes the whole of them constitutes a 3D image (spatial) + T. Is it wrong then to treat them as a single image as far as metadata goes? In a way the question is rhetorical: the sld says no, and I was handed the data and will therefore need to process it in that configuration.
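If the file names ever do have to be pulled out of the XML by hand, the JDK's DOM parser is enough. The sketch below collects the FileName attribute of every UUID element nested in a TiffData element, matching the layout of the snippet above. The class name `TiffDataFileNames` and the short file names in `main` are invented for the example, and the assumption that FileName values resolve relative to the directory of the metadata file should be checked against the OME-TIFF specification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative helper, not a Bio-Formats class.
final class TiffDataFileNames {

  /** Collects the distinct TiffData/UUID FileName values in document order. */
  static Set<String> fileNames(String omeXml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(omeXml.getBytes(StandardCharsets.UTF_8)));
      Set<String> names = new LinkedHashSet<>();
      // "*" matches any namespace, so the schema version does not matter here.
      NodeList tiffData = doc.getElementsByTagNameNS("*", "TiffData");
      for (int i = 0; i < tiffData.getLength(); i++) {
        NodeList uuids = ((Element) tiffData.item(i)).getElementsByTagNameNS("*", "UUID");
        for (int j = 0; j < uuids.getLength(); j++) {
          String name = ((Element) uuids.item(j)).getAttribute("FileName");
          if (!name.isEmpty()) {
            names.add(name);
          }
        }
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse OME-XML", e);
    }
  }

  public static void main(String[] args) {
    // Cut-down document with made-up file names.
    String xml = "<OME xmlns=\"http://www.openmicroscopy.org/Schemas/OME/2011-06\">"
        + "<Image ID=\"Image:0\"><Pixels ID=\"Pixels:0\">"
        + "<TiffData FirstZ=\"0\" IFD=\"0\" PlaneCount=\"1\">"
        + "<UUID FileName=\"z000.tiff\">urn:uuid:a</UUID></TiffData>"
        + "<TiffData FirstZ=\"1\" IFD=\"0\" PlaneCount=\"1\">"
        + "<UUID FileName=\"z001.tiff\">urn:uuid:b</UUID></TiffData>"
        + "</Pixels></Image></OME>";
    fileNames(xml).forEach(System.out::println);
  }
}
```

A LinkedHashSet keeps the first-seen order while dropping repeats, which matters when one file holds several planes and is therefore referenced by several TiffData elements.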


Regardless of the above, the one remark (my previous post) on meta-data not being set within the initFile method of the OMEXMLReader and OMETiffReader is probably worth having a look at. Intuitively it seems odd to me that it would return a dummy implementation after setId() completion.

Re: Accessing Multiple Images (aka Series) programmatically

Post by jmoore » Wed May 27, 2015 11:56 am

A quick intermediate answer:

waxenegger wrote: As far as the schema goes, that is a perfectly good configuration, but I could imagine that it is the reason Bio-Formats treats them as a single image rather than the multiple images I (probably wrongly) assumed. The TiffData elements are multiple slices, so that for all intents and purposes the whole of them constitutes a 3D image (spatial) + T. Is it wrong then to treat them as a single image as far as metadata goes?


Perhaps it depends what you mean by "Image". In OME terminology, this can be a 5 dimensional structure with many timepoints, z-slices, and channels.
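To make the 5D idea concrete: with DimensionOrder="XYZCT", each 2D plane of the single Image is addressed by a (Z, C, T) coordinate, with Z varying fastest. The sketch below works through the arithmetic for the sizes in the Pixels element quoted earlier in the thread (SizeZ=807, SizeC=1, SizeT=6, 2048 x 2048, uint16). The class and method names are made up, and this is plain arithmetic rather than a call into the library.

```java
// Illustrative arithmetic only, not a Bio-Formats class.
final class PlaneIndex {

  /** Rasterized plane index for DimensionOrder "XYZCT": Z varies fastest, then C, then T. */
  static int indexXYZCT(int z, int c, int t, int sizeZ, int sizeC, int sizeT) {
    if (z < 0 || z >= sizeZ || c < 0 || c >= sizeC || t < 0 || t >= sizeT) {
      throw new IllegalArgumentException("ZCT coordinate out of range");
    }
    return z + sizeZ * (c + sizeC * t);
  }

  /** Number of bytes in one uncompressed plane. */
  static long planeBytes(int sizeX, int sizeY, int bytesPerPixel) {
    return (long) sizeX * sizeY * bytesPerPixel;
  }

  public static void main(String[] args) {
    int sizeZ = 807, sizeC = 1, sizeT = 6; // values from the Pixels element in the thread
    System.out.println("planes: " + (sizeZ * sizeC * sizeT));            // planes: 4842
    System.out.println("index of (z=1, c=0, t=2): "
        + indexXYZCT(1, 0, 2, sizeZ, sizeC, sizeT));                     // 1615
    System.out.println("bytes per uint16 plane: "
        + planeBytes(2048, 2048, 2));                                    // 8388608
  }
}
```

So the one Image in that file stands for 4842 planes, and each plane index maps back to exactly one Z/T position.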

~Josh.

Re: Accessing Multiple Images (aka Series) programmatically

Post by mlinkert » Tue Jun 16, 2015 3:01 pm

Thanks again for the feedback, and our apologies for the delay.

waxenegger wrote: BUT task number three [MetadataStore, the line: service.convertMetadata(omexmlMeta, store);] does not quite do what the comment claims it would. It wants to copy over/set the metadata, but when the line "service.convertMetadata(omexmlMeta, store);" executes, the store object is of type loci.formats.meta.DummyMetadata. Once the initFile method returns, I therefore seem to be left with nothing but a dummy object and the core metadata, which does not contain the desired info.

Perhaps I am wrong, but it seems to me that the initFile method of the OMEXMLReader was meant to set the metadata it has already instantiated along the way and stored in the local omexmlMeta variable (via setMetadataStore(omexmlMeta) perhaps?), so that when getMetadata() is called afterwards it would return the OMEXMLServiceImpl instead of the dummy.

As soon as I force this in my own code beforehand, I do in fact have a MetadataStore of the desired type that I can then use:

Code:
ServiceFactory serviceFactory = new ServiceFactory();
OMEXMLService omexmlService =
  serviceFactory.getInstance(OMEXMLService.class);
IMetadata meta = omexmlService.createOMEXMLMetadata();

try {
  reader.setMetadataStore(meta);
  reader.setId(fileToBeProcessed);


The problem with this workaround is that it is ugly and I lose generality/genericness, since I need to be agnostic of the file format before setId(), which itself determines the type/instance anyhow.

I still think it would be very beneficial to override the getUsedFiles for formats that contain multiple files such as the OMEXML and Tiffs so that after parsing that info it is available to users of the library.


The few lines of code above are not a workaround - that's the correct way of setting up a MetadataStore to be used with a reader. That is format independent, and does not require you to know the format of the file being passed to setId in advance, so you will not lose any generality there.
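Why the store has to be attached before setId can be illustrated with a toy model. Everything in the sketch below is a hypothetical stand-in written for this explanation (`Store`, `DiscardingStore`, `RetainingStore`, `ToyReader`); none of it is the Bio-Formats API. It only mirrors the behaviour described in this thread: the reader writes into whichever store it holds while initializing, and the default store keeps nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the "default dummy store" behaviour; all names are hypothetical.
final class StoreDemo {

  /** Much-reduced stand-in for a metadata store interface. */
  interface Store {
    void setImageName(String name, int imageIndex);
  }

  /** Null object: accepts every call and keeps nothing. */
  static final class DiscardingStore implements Store {
    @Override
    public void setImageName(String name, int imageIndex) {
      // intentionally empty
    }
  }

  /** Retaining store: remembers what the reader reported. */
  static final class RetainingStore implements Store {
    final Map<Integer, String> imageNames = new LinkedHashMap<>();

    @Override
    public void setImageName(String name, int imageIndex) {
      imageNames.put(imageIndex, name);
    }
  }

  /** Stand-in reader: writes into whichever store is attached when init runs. */
  static final class ToyReader {
    private Store store = new DiscardingStore();

    void setStore(Store store) {
      this.store = store;
    }

    Store getStore() {
      return store;
    }

    void init(String id) {
      store.setImageName("image from " + id, 0);
    }
  }

  public static void main(String[] args) {
    ToyReader byDefault = new ToyReader();
    byDefault.init("data.ome.xml");
    System.out.println(byDefault.getStore() instanceof DiscardingStore); // true

    RetainingStore retained = new RetainingStore();
    ToyReader configured = new ToyReader();
    configured.setStore(retained); // attach the store first, then initialize
    configured.init("data.ome.xml");
    System.out.println(retained.imageNames); // {0=image from data.ome.xml}
  }
}
```

The caller chooses the store without knowing anything about the file format, which is why this setup costs no generality.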

Do note that the terminology here is slightly tricky. OMEXMLReader refers to reading data from an OME-XML file; OMEXMLService/OMEXMLMetadata refers to using objects from the ome.xml.model package (http://downloads.openmicroscopy.org/bio ... mmary.html) to store metadata in a format-independent manner, for querying or when exporting a file to a different format.

If you haven't already, I would suggest reading through some of the documentation and examples, specifically:

http://www.openmicroscopy.org/site/supp ... processing
http://www.openmicroscopy.org/site/supp ... d-concepts
http://www.openmicroscopy.org/site/supp ... xport.html

If any of that doesn't make sense, or you have any further questions, please don't hesitate to let us know.

Regards,
-Melissa
