Open Microscopy Environment

by **dwight** » Fri Aug 31, 2012 11:26 am

I have a few Olympus Fluoview files in which some fields contain text with ampersands (the directory name of the saved image). Using bfconvert, the file is saved as an OME-XML file, but the converted file cannot be read. For example, the showinf command aborts with a long stack trace:

Code: Select all:

    Exception in thread "main" loci.formats.FormatException: Malformed OME-XML
        at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:241)
        at loci.formats.FormatReader.setId(FormatReader.java:1178)
        at loci.formats.ImageReader.setId(ImageReader.java:727)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:529)
        at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:988)
        at loci.formats.tools.ImageInfo.main(ImageInfo.java:1031)
    Caused by: java.io.IOException
        at loci.common.xml.XMLTools.parseXML(XMLTools.java:350)
        at loci.common.xml.XMLTools.parseXML(XMLTools.java:318)
        at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:237)
        ... 5 more
    Caused by: org.xml.sax.SAXParseException; lineNumber: 4274; columnNumber: 59; The reference to entity "iso_1" must end with the ';' delimiter.
    ...etc...

As far as I can see, the problem is that the ampersand in the original OIB file ends up in an OriginalMetadata Value element, and the XML parser rejects bare & characters.
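The parser behaviour described here can be reproduced without Bio-Formats. The following is a minimal sketch using only the JDK's SAX parser; the class name `BareAmpersandDemo`, the method `isWellFormed`, and the shortened path are made up for illustration and are not part of Bio-Formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Bio-Formats.
class BareAmpersandDemo {

  /** Returns true if the JDK SAX parser accepts the string as well-formed XML. */
  static boolean isWellFormed(String xml) {
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
        new DefaultHandler());
      return true;
    }
    catch (SAXException e) {
      // A bare '&' is reported as a fatal parse error (SAXParseException).
      return false;
    }
    catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Bare ampersand in element text: rejected.
    System.out.println(isWellFormed("<Value>D:/Image/a&b_1/</Value>"));
    // Same text with the ampersand escaped: accepted.
    System.out.println(isWellFormed("<Value>D:/Image/a&amp;b_1/</Value>"));
  }
}
```

The first call should print `false` and the second `true`, since `&b_1/` is read as the start of an entity reference that never reaches its `;` delimiter.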

I made a quick hack to solve the issue (for me at least) by modifying the sanitizeXML method to look for standalone & characters, without (hopefully) altering the rest of the output of the method. This included using a StringBuffer instead of a character array as the total length of the string being sanitized does not necessarily remain constant.

Code: Select all:

    diff --git a/components/common/src/loci/common/xml/XMLTools.java b/components/common/src/loci/common/xml/XMLTools.java
    index 6665abf..c00d74d 100644
    --- a/components/common/src/loci/common/xml/XMLTools.java
    +++ b/components/common/src/loci/common/xml/XMLTools.java
    @@ -181,17 +181,29 @@ public final class XMLTools {
     
       /** Remove invalid characters from an XML string. */
       public static String sanitizeXML(String s) {
    -    final char[] c = s.toCharArray();
    -    for (int i=0; i<s.length(); i++) {
    -      if (Character.isISOControl(c[i]) ||
    -        !Character.isDefined(c[i]) || c[i] > '~')
    -      {
    -        c[i] = ' ';
    +    StringBuffer sb = new StringBuffer();
    +    for (int i=0; i < s.length(); i++) {
    +      if (Character.isISOControl(s.charAt(i)) ||
    +        !Character.isDefined(s.charAt(i)) || s.charAt(i) > '~'){
    +        sb.append(' ');
           }
           // eliminate invalid &# sequences
    -      if (i > 0 && c[i - 1] == '&' && c[i] == '#') c[i - 1] = ' ';
    +      else if (i < s.length() - 1 && s.substring(i, i + 2).equals("&#")){
    +        sb.append(" #");
    +        i += 1;
    +      }
    +      else if (s.charAt(i) == '&'){
    +        if (i < s.length() - 4 && s.substring(i, i + 5).equals("&amp;")){
    +          i += 4;
    +        }
    +        sb.append("&amp;");
    +      }
    +      else{
    +        sb.append(s.charAt(i));
    +      }
    +
         }
    -    return new String(c);
    +    return sb.toString();
       }
     
       /** Indents XML to be more readable. */
    diff --git a/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java b/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
    index ba4fdb8..cab4d08 100644
    --- a/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
    +++ b/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
    @@ -881,7 +881,7 @@ public class OMEXMLServiceImpl extends AbstractService implements OMEXMLService
           Element valueElement =
             document.createElementNS(ORIGINAL_METADATA_NS, "Value");
           keyElement.setTextContent(key);
    -      valueElement.setTextContent(value);
    +      valueElement.setTextContent(XMLTools.sanitizeXML(value));
     
           Element originalMetadata =
             document.createElementNS(ORIGINAL_METADATA_NS, "OriginalMetadata");

This could easily be extended to escape XML special characters other than & if necessary. Let me know if there's a better way of getting &-s out of my converted OME-XML files (other than not having them in the metadata in the first place).
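For comparison, the "standalone &" test can also be written as a single regular expression. This is only a sketch of that idea, not the patch above and not what Bio-Formats does: the class `AmpersandEscaper` and its method are hypothetical, it leaves numeric references such as `&#65;` intact rather than blanking them, and it recognises only simple alphanumeric entity names.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Bio-Formats.
class AmpersandEscaper {

  // Matches an '&' that does NOT start a named reference (&name;)
  // or a numeric character reference (&#123; or &#x1F;).
  private static final Pattern BARE_AMPERSAND = Pattern.compile(
    "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

  /** Replaces every bare '&' with "&amp;", leaving existing references alone. */
  static String escapeBareAmpersands(String s) {
    return BARE_AMPERSAND.matcher(s).replaceAll("&amp;");
  }

  public static void main(String[] args) {
    System.out.println(escapeBareAmpersands("D:/Image/a&b_1/"));
    System.out.println(escapeBareAmpersands("x &lt; y & z"));
  }
}
```

Like any text-level approach, this is a heuristic: a raw metadata value that happens to contain something shaped like a reference (for example `a&b;`) would be left unescaped, so escaping at the point where the XML is written is generally the more robust option.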

by **rleigh** » Mon Sep 03, 2012 1:09 pm

Hello,

I've opened a ticket for this here: https://trac.openmicroscopy.org.uk/ome/ticket/9572 and added you to the Cc, so that you will be notified of the progress of this ticket.

Regards,
Roger

by **dwight** » Wed Sep 05, 2012 2:13 pm

Thanks for letting me know about the ticket. However, even though the ticket has now been marked 'fixed' the fix does not solve the problem for my file. I'm not sure if the bug report actually corresponds to the issue I have (which is probably why my problem doesn't get fixed by the update). In the ticket the & character is added to an element attribute. In my case the offending & is not in an element attribute but, rather, the value itself:

Code: Select all:

    <XMLAnnotation ID="Annotation:847">
      <Value>
        <OriginalMetadata xmlns="openmicroscopy.org/OriginalMetadata">
          <Key>[File Info] Path</Key>
          <Value>D:/FV10-ASW/Users/confo/Image/120807/a&b_1/</Value>
        </OriginalMetadata>
      </Value>
    </XMLAnnotation>

Apologies for not being specific enough earlier.

by **mlinkert** » Wed Sep 05, 2012 2:55 pm

dwight wrote: Thanks for letting me know about the ticket. However, even though the ticket has now been marked 'fixed' the fix does not solve the problem for my file. I'm not sure if the bug report actually corresponds to the issue I have (which is probably why my problem doesn't get fixed by the update). In the ticket the & character is added to an element attribute. In my case the offending & is not in an element attribute but, rather, the value itself:

What is fixed by that ticket is two things:

* properly escaped '&' (i.e. '&amp;') values are now read correctly
* '&' in the metadata values will now be correctly written as '&amp;'

If you had previously written files that contained an unescaped '&', those will still not be readable (as it is invalid XML), but if you re-generate the OME-XML using the latest build then the new OME-XML should be correctly escaped and readable.
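To illustrate what "correctly written" means in general, here is a minimal sketch using the JDK's DOM and Transformer APIs, which escape special characters in text content at serialization time. It does not show how the Bio-Formats writer is implemented; the class `DomEscapingDemo`, the method `serializeValue`, and the shortened path are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of Bio-Formats.
class DomEscapingDemo {

  /** Builds <Value>rawValue</Value> as a DOM tree and serializes it to a string. */
  static String serializeValue(String rawValue) {
    try {
      Document document =
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element valueElement = document.createElement("Value");
      // The raw, unescaped text goes into the DOM as-is.
      valueElement.setTextContent(rawValue);
      document.appendChild(valueElement);

      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      // Escaping of '&' and '<' in text nodes happens here, on output.
      transformer.transform(new DOMSource(document), new StreamResult(out));
      return out.toString();
    }
    catch (ParserConfigurationException | TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(serializeValue("D:/Image/a&b_1/"));
  }
}
```

With this kind of writer the value should be handed over unescaped: a string that already contains `&amp;` would be escaped a second time and come out as `&amp;amp;`.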

by **dwight** » Wed Sep 05, 2012 10:20 pm

Yes, I understand that the ome-xml file would have to be regenerated. I have done that and the error message persists. Also, at least when I run the bfconvert command, the sanitizeXML method does not even get called, so I am not sure how this one line fix can do anything about invalid characters. If you want I can upload the file causing the problem so you can try it for yourself. For the sake of completeness, this is the output from converting the file:

Code: Select all:

    ./bfconvert Image0019.oib out.ome
    Image0019.oib
    Initializing helper readers
    Reading additional metadata
    Populating metadata
    Reading bitmap header
    Populating metadata
    Unknown LaserMedium value 'fluo-3' will be stored as "Other"
    [Olympus FV1000] -> out.ome [OME-XML]
        Series 0: converted 1/1 planes (100%)
        Series 1: converted 1/1 planes (100%)
    [done]
    11.872s elapsed (27.0+138.0ms per plane, 524ms overhead)

The output is identical with or without a

Code: Select all: System.out.println("In sanitizeXML");

in the sanitizeXML method.

The output when trying to use showinf on the file gives:

Code: Select all:

    ./showinf out.ome
    Checking file format [OME-XML]
    Initializing reader
    Exception in thread "main" loci.formats.FormatException: Malformed OME-XML
        at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:241)
        at loci.formats.FormatReader.setId(FormatReader.java:1178)
        at loci.formats.ImageReader.setId(ImageReader.java:727)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:529)
        at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:988)
        at loci.formats.tools.ImageInfo.main(ImageInfo.java:1031)
    Caused by: java.io.IOException
        at loci.common.xml.XMLTools.parseXML(XMLTools.java:338)
        at loci.common.xml.XMLTools.parseXML(XMLTools.java:306)
        at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:237)
        ... 5 more
    Caused by: org.xml.sax.SAXParseException; lineNumber: 4274; columnNumber: 59; The reference to entity "iso_1" must end with the ';' delimiter.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:391)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1404)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1826)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3009)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:625)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:488)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:819)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:748)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1208)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:525)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at loci.common.xml.XMLTools.parseXML(XMLTools.java:330)

The offending line is the one I pasted in my previous post. I even made a pristine clone of bioformats straight from the repository, just in case I had some old build products left over from somewhere.

by **mlinkert** » Tue Sep 11, 2012 2:37 am

Ah, I see the problem. Nearly everything was fixed by the ticket mentioned previously, but there was one lingering bug in the OME-XML writer which is fixed here:

https://github.com/melissalinkert/biofo ... 386005c1f0

This only showed up when actually fully converting files to OME-XML; converting to OME-TIFF or just generating the OME-XML metadata for a file did not exhibit the problem. So, if you checkout the 'sprint5-bug-fixes' branch of the above repository and rebuild, then OME-XML files converted with the new build should be readable.

As an aside, is there a reason why you are converting to OME-XML and not OME-TIFF? We do recommend that OME-TIFF is used instead of OME-XML, as the image data is stored in a much nicer fashion (and both formats store the same exact metadata).

Unescaped ampersands in bfconvert output