
Bio-Formats vs. Scifio???

Posted: Wed Mar 30, 2016 3:58 pm
by dsudar
Dear all,

I have been in discussion with a collaborator about them developing, or supporting the development of, a Bio-Formats reader for their complex file format. We all understand the benefits of having a general-purpose and widely used image file reading solution such as Bio-Formats. The question from the collaborator was about Scifio, and to paraphrase his question: "It appears that some developers are moving away from Bio-Formats towards Scifio for a variety of reasons (speed, multi-dimensionality, data size,...). Should we consider using Scifio instead of Bio-Formats, especially since our type of data tends to be large and multi-dimensional, and speed is important?"

I honestly did not know how to answer that question but it points at an issue that our entire community should probably care about. Is the Scifio development one of those unfortunate parallel efforts that splinter our community? Or is Scifio mostly a pilot of better functionality that will be part of Bio-Formats itself very soon?

I look forward to getting an answer to the question from the collaborator and ideally a bit more clarity for myself on this topic.

Thanks,
- Damir

Re: Bio-Formats vs. Scifio???

Posted: Mon Apr 04, 2016 2:02 pm
by jrswedlow
Hi Damir-

Thanks for your question. It’s important to clarify the relationships between the Bio-Formats and SCIFIO projects as they now stand, and where they are going.

As you know, LOCI Madison, OME Dundee, and Glencoe Software have been collaborating on the Bio-Formats project since 2002. The result is a reasonably successful software library, with >30,000 sites using it and >100,000 starts/day during 2015. Even with this success, no direct grant funding for Bio-Formats in Java has ever been awarded (yes, we have tried). We have one proposal pending in the UK to expand Bio-Formats’ capabilities (see [1] for info on that proposal).

The SCIFIO project was started in 2011 and is a generalization of Bio-Formats to encapsulate and expose the fundamental steps of scientific image reading, writing and translation. As you probably know, Bio-Formats is built against the OME Data Model [2]. While useful for the bioimaging community, the concept of I/O translation is generally useful, and SCIFIO seeks to enable the same concepts as Bio-Formats across a much wider range of application domains by making it easier to adapt I/O translation functions to a wide range of data models.

SCIFIO is neither a competing effort nor a pilot that will merge with Bio-Formats. Bio-Formats remains the gold standard for translating proprietary bioimage and biomedical formats to the OME-TIFF open exchange format. SCIFIO lays the groundwork for expanding the open science principles of Bio-Formats to new domains and alternate metadata standards.

Bio-Formats is a higher-level, domain-specific (OME-XML) library. SCIFIO actually has a Bio-Formats plugin [3], which uses the Bio-Formats library for proprietary format reading. This allows Bio-Formats to be used automatically in applications that use SCIFIO for image I/O, e.g., ImageJ and KNIME Image Processing. In practical terms, if you use SCIFIO to read bioimaging data, then you use Bio-Formats file readers to interpret proprietary file formats. The low-level I/O libraries in Bio-Formats and SCIFIO are different, and each may have advantages or disadvantages depending on the specific application.

Regarding performance: One very important aspect of Bio-Formats functionality is contained in its setID function. As you well know, the vast range of proprietary file formats use many different file layout concepts, and Bio-Formats has to track all of these and identify each one. Bio-Formats uses the concept of a fileset to identify and organize all the component files of any given file format. The cost of identifying a fileset in setID can be low or high, depending on the complexity of the file layout chosen by the creator of the file format. Bio-Formats has been built as a stateless library; this means that every read invokes setID. For large datasets containing hundreds, thousands, or more files, this is quite expensive. We have attempted to reduce some of this burden by adding a caching mechanism in Bio-Formats for file formats where we have good test data. If you have a file format where Bio-Formats performance is poorer than you expect, please do let us know, and send us the data!
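
To make the cost argument concrete, here is a minimal Java sketch of the general idea, not of the actual Bio-Formats code. Every name in it (FilesetCacheSketch, CountingScanner, CachedFilesets) is hypothetical and exists only for illustration; the "scan" is a stand-in for the fileset identification work described above. It compares a stateless read loop, which repeats the scan on every read, with one that memoizes the scan result per file id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: none of these types are part of Bio-Formats or SCIFIO.
public class FilesetCacheSketch {

    /** Hypothetical stand-in for the expensive "identify the fileset" step. */
    public static final class CountingScanner implements Function<String, List<String>> {
        public int scans = 0;

        @Override
        public List<String> apply(String id) {
            scans++; // count how often the expensive step runs
            // A real reader would parse headers and locate companion files here.
            return List.of(id, id + ".meta");
        }
    }

    /** Stateless style: every plane read repeats the fileset scan. */
    public static int readStateless(CountingScanner scanner, String id, int planes) {
        int filesTouched = 0;
        for (int p = 0; p < planes; p++) {
            filesTouched += scanner.apply(id).size();
        }
        return filesTouched;
    }

    /** Cached style: the scan result is memoized per file id. */
    public static final class CachedFilesets {
        private final Map<String, List<String>> cache = new HashMap<>();
        private final Function<String, List<String>> scanner;

        public CachedFilesets(Function<String, List<String>> scanner) {
            this.scanner = scanner;
        }

        public List<String> filesFor(String id) {
            // The scanner runs only the first time a given id is requested.
            return cache.computeIfAbsent(id, scanner);
        }
    }

    public static void main(String[] args) {
        CountingScanner stateless = new CountingScanner();
        readStateless(stateless, "dataset.img", 100);

        CountingScanner memoized = new CountingScanner();
        CachedFilesets cached = new CachedFilesets(memoized);
        for (int p = 0; p < 100; p++) {
            cached.filesFor("dataset.img");
        }

        System.out.println("stateless scans: " + stateless.scans); // 100
        System.out.println("cached scans: " + memoized.scans);     // 1
    }
}
```

The trade-off in any such scheme is that a cached result can go stale if the files on disk change, which is one plausible reason to enable caching only for formats with good test data.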

As far as choosing which library to use for developing your own applications, we don't see it as an "either/or" case. Our goal has always been to provide bridges so that advances in both libraries mutually benefit the community as a whole. Updates on the roadmap and goals of Bio-Formats and SCIFIO are available [1, 4, 5].

We hope that clarifies the situation. Do let us know if you have any further questions.

Best wishes,

Jason, Kevin, Curtis, Josh, Mark, Melissa, Sebastien

[1] http://lists.openmicroscopy.org.uk/pipermail/ome-users/2016-February/005853.html
[2] http://www.openmicroscopy.org/site/support/ome-model
[3] https://github.com/scifio/scifio-bf-compat
[4] https://github.com/scifio/scifio/milestones
[5] https://github.com/scifio/scifio/wiki/FAQ

Re: Bio-Formats vs. Scifio???

Posted: Tue Apr 05, 2016 2:19 pm
by crueden
There is also a FAQ entry on the SCIFIO GitHub wiki:
https://github.com/scifio/scifio/wiki/F ... io-formats

The wording there is slightly dated compared to Jason's post above, though still largely accurate.

We're happy to clarify any remaining questions, both here and in the SCIFIO FAQ.