Unescaped ampersands in bfconvert output
Posted: Fri Aug 31, 2012 11:26 am
I have a few Olympus FluoView files in which some metadata fields contain ampersands (in the directory name of the saved image). bfconvert saves the file as OME-XML, but the converted file cannot be read back. For example, the showinf command aborts with a long stack trace.
Code:
Exception in thread "main" loci.formats.FormatException: Malformed OME-XML
at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:241)
at loci.formats.FormatReader.setId(FormatReader.java:1178)
at loci.formats.ImageReader.setId(ImageReader.java:727)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:529)
at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:988)
at loci.formats.tools.ImageInfo.main(ImageInfo.java:1031)
Caused by: java.io.IOException
at loci.common.xml.XMLTools.parseXML(XMLTools.java:350)
at loci.common.xml.XMLTools.parseXML(XMLTools.java:318)
at loci.formats.in.OMEXMLReader.initFile(OMEXMLReader.java:237)
... 5 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 4274; columnNumber: 59; The reference to entity "iso_1" must end with the ';' delimiter.
...etc...
As far as I can see, the problem is that the ampersand in the original OIB file ends up in an OriginalMetadata Value element, and the XML parser rejects bare & characters.
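For reference, the same kind of failure can be reproduced outside Bio-Formats with the JDK's own parser. This is only a minimal sketch: the class name BareAmpersandDemo and the sample directory name are made up for illustration, and it does not go through OMEXMLReader at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Bio-Formats.
final class BareAmpersandDemo {

  /** Returns true if the JDK's default XML parser accepts the string. */
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      // Malformed input, e.g. an entity reference without a closing ';'
      return false;
    }
    catch (Exception e) {
      // Parser configuration or I/O problems are not expected here.
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // Made-up value with a bare ampersand in a directory name: rejected.
    System.out.println(isWellFormed("<Value>images/dapi&iso_1/cell.oib</Value>"));
    // Same value with the ampersand escaped: accepted.
    System.out.println(isWellFormed("<Value>images/dapi&amp;iso_1/cell.oib</Value>"));
  }
}
```

The parser reads "&iso_1" as the start of an entity reference and gives up when no ';' follows, which matches the SAXParseException message in the trace.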
I made a quick hack that solves the issue (for me at least) by modifying the sanitizeXML method to look for standalone & characters, (hopefully) without altering the rest of the method's output. This meant using a StringBuffer instead of a character array, because the length of the string being sanitized no longer stays constant.
Code:
diff --git a/components/common/src/loci/common/xml/XMLTools.java b/components/common/src/loci/common/xml/XMLTools.java
index 6665abf..c00d74d 100644
--- a/components/common/src/loci/common/xml/XMLTools.java
+++ b/components/common/src/loci/common/xml/XMLTools.java
@@ -181,17 +181,29 @@ public final class XMLTools {
/** Remove invalid characters from an XML string. */
public static String sanitizeXML(String s) {
- final char[] c = s.toCharArray();
- for (int i=0; i<s.length(); i++) {
- if (Character.isISOControl(c[i]) ||
- !Character.isDefined(c[i]) || c[i] > '~')
- {
- c[i] = ' ';
+ StringBuffer sb = new StringBuffer();
+ for (int i=0; i < s.length(); i++) {
+ if (Character.isISOControl(s.charAt(i)) ||
+ !Character.isDefined(s.charAt(i)) || s.charAt(i) > '~'){
+ sb.append(' ');
}
// eliminate invalid &# sequences
- if (i > 0 && c[i - 1] == '&' && c[i] == '#') c[i - 1] = ' ';
+ else if (i < s.length() - 1 && s.substring(i, i + 2).equals("&#")){
+ sb.append(" #");
+ i += 1;
+ }
+ else if (s.charAt(i) == '&'){
+ if (i < s.length() - 4 && s.substring(i, i + 5).equals("&amp;")){
+ i += 4;
+ }
+ sb.append("&amp;");
+ }
+ else{
+ sb.append(s.charAt(i));
+ }
+
}
- return new String(c);
+ return sb.toString();
}
/** Indents XML to be more readable. */
diff --git a/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java b/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
index ba4fdb8..cab4d08 100644
--- a/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
+++ b/components/scifio/src/loci/formats/services/OMEXMLServiceImpl.java
@@ -881,7 +881,7 @@ public class OMEXMLServiceImpl extends AbstractService implements OMEXMLService
Element valueElement =
document.createElementNS(ORIGINAL_METADATA_NS, "Value");
keyElement.setTextContent(key);
- valueElement.setTextContent(value);
+ valueElement.setTextContent(XMLTools.sanitizeXML(value));
Element originalMetadata =
document.createElementNS(ORIGINAL_METADATA_NS, "OriginalMetadata");
This could easily be extended to leave XML entities other than &amp; intact if necessary. Let me know if there's a better way of keeping bare ampersands out of my converted OME-XML files (other than not having them in the metadata in the first place).
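A rough sketch of what such an extension might look like, as a standalone method rather than a change to sanitizeXML. The class and method names here are hypothetical. It assumes that only the five entity references predefined by XML (&amp; &lt; &gt; &quot; &apos;) should pass through untouched; every other &, including one that starts a numeric reference like &#12;, gets escaped.

```java
// Hypothetical helper, not part of Bio-Formats.
final class AmpersandEscaper {

  // The five entity references predefined by XML 1.0.
  private static final String[] PREDEFINED =
    {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};

  /** Escapes every '&' that does not start a predefined entity reference. */
  static String escapeBareAmpersands(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch != '&') {
        sb.append(ch);
        continue;
      }
      String entity = predefinedEntityAt(s, i);
      if (entity != null) {
        // Already an entity reference: copy it through and skip past it.
        sb.append(entity);
        i += entity.length() - 1;
      }
      else {
        // Standalone ampersand: escape it.
        sb.append("&amp;");
      }
    }
    return sb.toString();
  }

  /** Returns the predefined entity starting at index i, or null if none. */
  private static String predefinedEntityAt(String s, int i) {
    for (String e : PREDEFINED) {
      if (s.startsWith(e, i)) return e;
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(escapeBareAmpersands("images/dapi&iso_1/cell.oib"));
    System.out.println(escapeBareAmpersands("x &lt; y & z"));
  }
}
```

One caveat with this approach: it cannot tell whether an "&amp;" in the original metadata was meant as an entity or as literal text, so it is a heuristic rather than a real fix.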