Read/Write/Memory Problems

Historical discussions about the Bio-Formats library. Please look for and ask new questions at https://forum.image.sc/tags/bio-formats

If you are having trouble with image files, there is information about reporting bugs in the Bio-Formats documentation. Please send us the data and let us know what version of Bio-Formats you are using. For issues with your code, please provide a link to a public repository, ideally GitHub.

Postby kdean » Wed Jul 22, 2015 7:52 pm

Hello,

We use LabView to acquire data from multiple custom light-sheet microscopes, and we are having significant issues with both read/write speed and memory. I am not an expert in BioFormats, or computer programming in general, so please excuse my naivety.

Following data acquisition, LabView must reopen each individual image stack, read the temporary metadata, and write the cumulative OME metadata for the entire image sequence. Lately, the number of image planes that must be handled ranges from 50,000 to 300,000. At that scale, writing the OME metadata becomes prohibitively slow, taking hours to days. Oftentimes, we cancel the OME rewrite process once it has finished with the first image so that we can continue imaging. However, this is not a good solution.

Importantly, because we also have problems with the command-line and MATLAB-based tools, I do not think that this is purely a LabView problem.

I have uploaded two representative files, 1_CAM01_000000.tif and 1_CAM01_000499.tif. The first file has the OME metadata, whereas the second does not. Each file is ~75 MB. Because the second file does not have the OME metadata, it opens quickly with the MATLAB function bfopen (~1.1 seconds). The first file, however, cannot be opened in MATLAB even after increasing the Java heap memory to the maximum amount. Below is the error that I receive.

...............................Reading IFDs
Populating metadata
Caught "std::exception" Exception message is:
Message Catalog MATLAB:services was not loaded from the file. Please check file location, format or contents
An error was encountered while saving the command history
java.io.FileNotFoundException: /Users/kdean/.matlab/R2014b/History.xml (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at com.mathworks.mde.cmdhist.AltHistoryCollection

Indeed, while keeping an eye on my memory in the Activity Monitor, I can see that bfopen is immensely memory-intensive. I am a bit lost as to why it takes so long to open the file, and I am beginning to believe that this may be related to why we cannot save the OME-TIFF information in a reasonable amount of time in the first place.

Thank you, and I apologize if this is unclear.

Kevin
kdean
 
Posts: 8
Joined: Mon Dec 10, 2012 1:35 am

Re: Read/Write/Memory Problems

Postby sbesson » Wed Jul 22, 2015 9:53 pm

Hi Kevin,

thanks for the post and for sharing the data. My feeling is that the two issues you are encountering are separate: one is about OME-TIFF writing performance, while the second is about OME-TIFF reading performance.

For the reading problem described in this thread, the issue likely lies in the nature of bfopen, since this function both initializes a reader for the selected file and reads all of its pixel data.

The 1_CAM01_000499.tif file does not contain any OME-XML metadata and is initialized as a regular TIFF file with the following dimensions:

Code: Select all
$ showinf -nopix /ome/apache_repo/11318/1_CAM01_000499.tif
Checking file format [Tagged Image File Format]
Initializing reader
TiffDelegateReader initializing /ome/apache_repo/11318/1_CAM01_000499.tif
Reading IFDs
Populating metadata
Checking comment style
Populating OME metadata
Initialization took 1.302s

Reading core metadata
filename = /ome/apache_repo/11318/1_CAM01_000499.tif
Series count = 1
Series #0 :
   Image count = 138
   RGB = false (1)
   Interleaved = false
   Indexed = false (false color)
   Width = 512
   Height = 512
   SizeZ = 1
   SizeT = 138
   SizeC = 1
   Thumbnail size = 128 x 128
   Endianness = intel (little)
   Dimension order = XYCZT (uncertain)
   Pixel type = uint16
   Valid bits per pixel = 16
   Metadata complete = true
   Thumbnail series = false


The 1_CAM01_000000.tif file is initialized as an OME-TIFF file, including file grouping, and has the following dimensions:

Code: Select all
$ showinf -nopix /ome/apache_repo/11318/1_CAM01_000000.tif
Checking file format [OME-TIFF]
Initializing reader
OMETiffReader initializing /ome/apache_repo/11318/1_CAM01_000000.tif
Reading IFDs
Populating metadata
Initialization took 4.67s

Reading core metadata
filename = /ome/apache_repo/11318/1_CAM01_000000.tif
Used files:
   /ome/apache_repo/11318/1_CAM01_000000.tif
   /ome/apache_repo/11318/1_CAM01_000499.tif
Series count = 1
Series #0 :
   Image count = 69000
   RGB = false (1)
   Interleaved = false
   Indexed = false (false color)
   Width = 512
   Height = 512
   SizeZ = 138
   SizeT = 500
   SizeC = 1
   Thumbnail size = 128 x 128
   Endianness = intel (little)
   Dimension order = XYZCT (certain)
   Pixel type = uint16
   Valid bits per pixel = 16
   Metadata complete = true
   Thumbnail series = false


While calling bfopen in MATLAB, using 1_CAM01_000499.tif would load 138 planes while using 1_CAM01_000000.tif would load 138x500 planes. This would likely explain the resource and file descriptor exhaustion you reported.
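For a rough sense of scale, the arithmetic behind this can be sketched in Python (an illustration added here, not part of the original exchange). The dimensions come from the two showinf outputs above; the helper name stack_bytes is hypothetical, and the figures cover raw pixel data only, ignoring any MATLAB or JVM overhead.

```python
# Raw pixel-data size implied by the showinf output above.
# stack_bytes is a hypothetical helper, not a Bio-Formats function.

def stack_bytes(width, height, planes, bytes_per_pixel=2):
    """Uncompressed size in bytes; uint16 means 2 bytes per pixel."""
    return width * height * planes * bytes_per_pixel

single_file = stack_bytes(512, 512, 138)         # 1_CAM01_000499.tif alone
full_fileset = stack_bytes(512, 512, 138 * 500)  # grouped OME-TIFF fileset

print(f"single file:  {single_file / 2**20:.1f} MiB")
print(f"full fileset: {full_fileset / 2**30:.1f} GiB")
```

Under these assumptions the single file holds about 69 MiB of pixel data, while the grouped fileset would need roughly 33.7 GiB if every plane were loaded at once.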

The MATLAB equivalent of the command above, which only initializes the reader without loading its pixel data, is bfGetReader. Can you run the following commands:

Code: Select all
bfGetReader('/ome/apache_repo/11318/1_CAM01_000000.tif');
bfGetReader('/ome/apache_repo/11318/1_CAM01_000499.tif');


I would expect the first call to have an overhead but to return in a timely fashion.

Best,
Sebastien
sbesson
Team Member
 
Posts: 421
Joined: Tue Feb 28, 2012 7:20 pm

Re: Read/Write/Memory Problems

Postby kdean » Wed Jul 22, 2015 10:43 pm

Hey Seb,

You are correct. The reading is significantly faster with bfGetReader.

1_CAM01_000000.tif completed in 5.067s, whereas 1_CAM01_000499.tif completed in 0.179s.

Both provided a strange output:

loci.formats.ChannelSeparator@170836e0

loci.formats.ChannelSeparator@4cb58c7f
kdean
 
Posts: 8
Joined: Mon Dec 10, 2012 1:35 am

Re: Read/Write/Memory Problems

Postby sbesson » Thu Jul 23, 2015 1:44 pm

Hi Kevin,

the output of the bfGetReader call should be an initialized reader of type ChannelSeparator. So the MATLAB output is expected and it looks like you have reasonable reading performance.

On the OME-TIFF writing side, while trying to assess the time it would take to embed the OME-XML metadata from the first TIFF file into the second TIFF file using Bio-Formats command line tools, I got the following error:

Code: Select all
$ tiffcomment 1_CAM01_000000.tif > 1_CAM01_000000.xml
$ tiffcomment -set 1_CAM01_000000.xml 1_CAM01_000499.tif
loci.formats.FormatException: Tag not found (IMAGE_DESCRIPTION)
sbesson@necromancer ~ $


So the IMAGE_DESCRIPTION tag, which is required for embedding the OME-XML, is not present in the original TIFF file, and tiffcomment cannot create it. Is it possible from the acquisition side to have this TIFF tag created when the original TIFF files are saved?

Sebastien
sbesson
Team Member
 
Posts: 421
Joined: Tue Feb 28, 2012 7:20 pm

Re: Read/Write/Memory Problems

Postby kdean » Thu Jul 23, 2015 4:26 pm

One quick note in response to "While calling bfopen in MATLAB, using 1_CAM01_000499.tif would load 138 planes while using 1_CAM01_000000.tif would load 138x500 planes. This would likely explain the resource and file descriptor exhaustion you reported."

For testing purposes, I downloaded only the _000000.tif and _000499.tif files from a remote server. Images _000001...488.tif were not on the machine at the time, nor was there an active connection to the remote server. In this case, I don't know where bfopen was actually 'getting' the data for the intermediate image stacks...

I will have more info regarding the ImageDescription writing soon...

Kevin
kdean
 
Posts: 8
Joined: Mon Dec 10, 2012 1:35 am

Re: Read/Write/Memory Problems

Postby sbesson » Fri Jul 24, 2015 3:03 pm

Hi Kevin,

when creating the reader from the file containing the OME-XML metadata, the reader will use this metadata notably to determine the image dimensions in XYZTC and then try to group the files required to construct the fileset.

As you can see in my previous command, if only the first and last TIFF files of the fileset are present, then they are the only files registered in the fileset:

Code: Select all
$ showinf -nopix /ome/apache_repo/11318/1_CAM01_000000.tif
...
Used files:
   /ome/apache_repo/11318/1_CAM01_000000.tif
   /ome/apache_repo/11318/1_CAM01_000499.tif
Series count = 1
...


Then, when reading the pixel data from the image using bfGetPlane(reader, planeIndex) under MATLAB, if the TIFF file for the requested plane is not present, the function will return an array of size sizeX x sizeY filled with zeros.
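To make the plane-to-file relationship concrete, here is a small Python sketch (added for illustration, not taken from the Bio-Formats code). It assumes the dimension order XYZCT reported above, 0-based plane indices as in the Java API (bfGetPlane in MATLAB counts from 1), and one TIFF file per timepoint named by a six-digit counter. That file layout and both helper names are assumptions inferred from the two example file names.

```python
# Sketch only: maps a 0-based plane index to (z, c, t) for dimension order
# XYZCT, then to a file name. The one-file-per-timepoint layout and the
# name pattern are assumptions, not something Bio-Formats guarantees.

SIZE_Z, SIZE_C, SIZE_T = 138, 1, 500

def plane_to_zct(index):
    """Invert index = z + SIZE_Z * (c + SIZE_C * t)."""
    z = index % SIZE_Z
    c = (index // SIZE_Z) % SIZE_C
    t = index // (SIZE_Z * SIZE_C)
    return z, c, t

def file_for_plane(index):
    """Assumed layout: timepoint t lives in 1_CAM01_<t as six digits>.tif."""
    _, _, t = plane_to_zct(index)
    return f"1_CAM01_{t:06d}.tif"

# Only the first and last files were downloaded in the test described above.
present = {"1_CAM01_000000.tif", "1_CAM01_000499.tif"}
for index in (0, 137, 138, 68999):
    name = file_for_plane(index)
    status = "on disk" if name in present else "missing, read back as zeros"
    print(index, plane_to_zct(index), name, status)
```

With only the first and last files on disk, planes 0 to 137 and 68862 to 68999 map to real data under this layout; everything in between maps to a missing file.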

Sebastien
sbesson
Team Member
 
Posts: 421
Joined: Tue Feb 28, 2012 7:20 pm

Re: Read/Write/Memory Problems

Postby kdean » Mon Jul 27, 2015 8:33 pm

That makes perfect sense. Thank you, Seb. I can see why the computer would be unhappy performing the MATLAB equivalent of data=zeros(512,512,180,500)...

After some analysis, it appears that the major bottleneck is XML generation, where the data is loaded into memory and the XML string is created. Moving forward, we will try to create the 'stub' XML file at the beginning of the data acquisition, and save the tiff files immediately with the stub XML file embedded within them. We will need to calculate the master XML UUID at the beginning of the data acquisition to do this.

The master XML file will need to be written after data acquisition is complete. This will be built up in parallel during the acquisition, and should be quick.
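A minimal Python sketch of that bookkeeping might look like the following (an illustration, not the LabView implementation). It assumes one stack per timepoint with 138 planes, uses the TiffData/UUID element layout from the OME-TIFF specification, and omits the surrounding Image and Pixels elements and the XML namespace. The helper names and the three-iteration loop are hypothetical.

```python
# Sketch only: accumulate one TiffData entry per saved stack during
# acquisition, so the master OME-XML can be serialized quickly at the end.
# File names, sizes, and helper names are illustrative assumptions.
import uuid
import xml.etree.ElementTree as ET

def new_urn():
    """Random UUID in the urn:uuid: form used by OME-TIFF."""
    return uuid.uuid4().urn

def tiffdata_entry(file_name, file_uuid, timepoint, planes_per_file):
    """Build one <TiffData> element pointing at a single-timepoint stack."""
    entry = ET.Element("TiffData", {
        "IFD": "0", "FirstZ": "0", "FirstC": "0",
        "FirstT": str(timepoint), "PlaneCount": str(planes_per_file),
    })
    ref = ET.SubElement(entry, "UUID", {"FileName": file_name})
    ref.text = file_uuid
    return entry

master_uuid = new_urn()  # fixed before the first stack is written
entries = []
for t in range(3):  # stands in for the acquisition loop
    name = f"1_CAM01_{t:06d}.tif"
    entries.append(tiffdata_entry(name, new_urn(), t, 138))

print(master_uuid)
print(ET.tostring(entries[-1], encoding="unicode"))
```

Because each entry is created when its stack is saved, no image file needs to be reopened after acquisition to assemble the master XML.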
kdean
 
Posts: 8
Joined: Mon Dec 10, 2012 1:35 am

Re: Read/Write/Memory Problems

Postby mlinkert » Tue Jul 28, 2015 10:55 pm

Hi Kevin,

If you haven't already, you might consider using the BinaryOnly/MetadataOnly feature of OME-TIFF in this case; see:

https://www.openmicroscopy.org/site/sup ... ata-blocks
http://www.openmicroscopy.org/Schemas/D ... BinaryOnly

This would require writing the fully assembled OME-XML once to a text file, with each of the OME-TIFF files having a small XML stub that references that file. Depending upon the total number of files and the size of the complete XML, this may be faster than writing the XML to multiple TIFFs after acquisition.
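As a rough illustration of how small such a stub can be, the Python sketch below builds one with the standard library. The BinaryOnly element and its MetadataFile and UUID attributes follow the OME schema; the schema namespace version, the companion file name, and the helper name are assumptions to be checked against the linked documentation.

```python
# Sketch only: build the small BinaryOnly stub that each TIFF would carry.
# The schema namespace version and the companion file name are assumptions.
import uuid
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2015-01"  # assumed version

def binary_only_stub(metadata_file, master_uuid, file_uuid):
    """Return OME-XML whose only child is a BinaryOnly reference to the master file."""
    root = ET.Element("OME", {"xmlns": OME_NS, "UUID": file_uuid})
    ET.SubElement(root, "BinaryOnly",
                  {"MetadataFile": metadata_file, "UUID": master_uuid})
    return ET.tostring(root, encoding="unicode")

master_uuid = uuid.uuid4().urn  # would be chosen once, at the start of acquisition
stub = binary_only_stub("1_CAM01.companion.ome", master_uuid, uuid.uuid4().urn)
print(stub)
print(len(stub), "characters")
```

The stub stays a few hundred characters long regardless of how many planes the acquisition contains, so it can be embedded in each TIFF at the moment the file is saved.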

-Melissa
mlinkert
Team Member
 
Posts: 353
Joined: Fri May 29, 2009 2:12 pm
Location: Southwest Wisconsin

