[icedtea-web] RFC: use tagsoup to try and parse malformed JNLP files

Omair Majid omajid at redhat.com
Mon Jan 10 13:39:38 PST 2011


Hi,

I have come across a number of JNLP files that are not valid xml. Netx
can not parse these files using a xml parser, and fails to run them. I
spent some time looking for a solution and came across TagSoup[1]. The
TagSoup library parses a malformed HTML document into a well-formed
xml-like HTML document, but it works almost perfectly for our purposes too.

The attached patch makes use of TagSoup for parsing input jnlp files.

Parsing is currently implemented in two passes. In the first pass, 
TagSoup reads the "xml" (which can be malformed and hence not really 
xml), and outputs valid XML. Netx then uses this valid XML and uses it's 
own XML parser to parse the file.

The patch requires TagSoup as an optional dependency. To use TagSoup,
run configure (--with-tagsoup can be used to point to a TagSoup jar). To
not use TagSoup (even if it installed), use --with-tagsoup=no

The patch also adds an additional command line option, -xml ,to the 
javaws binary. This option can be used to force Netx to use the normal 
xml parser instead of TagSoup to parse the jnlp file.

Any thoughts or comments?

ChangeLog:
2011-01-10  Omair Majid  <omajid at redhat.com>

     * Makefile.am: Add NETX_EXCLUDE_SRCS, NETX_DUMMY_CLASSPATH
     (netx-source-files.txt): Selectively exclude some sources from
     compilation.
     (stamps/netx.stamp): Depend on netx-dummy.jar
     (netx-dummy.jar): New target. Empty jar. Used so there is always at
     least one class on the classpath.
     ($(NETX_DIR)/launcher/%.o): Add classpath.
     * NEWS: Update with fix.
     * acinclude.m4: Add IT_CHECK_FOR_TAGSOUP.
     * configure.ac: Call IT_CHECK_FOR_TAGSOUP.
     * netx/net/sourceforge/jnlp/JNLPFile.java: Add new member
     parserSettings.
     (JNLPFile(URL)): Pass a ParserSettings object.
     (JNLPFile(URL,boolean)): Refactored into...
     (JNLPFile(URL,ParserSettings)): New method.
     (JNLPFile(URL,Version,boolean)): Refactored into...
     (JNLPFile(URL,Version,ParserSettings)): New method.
     (JNLPFile(URL,Version,boolean,UpdatePolicy)): Refactored into...
     (JNLPFile(URL,Version,ParserSettings,UpdatePolicy)): New method.
     (JNLPFile(URL,String,Version,boolean,UpdatePolicy)): Refactored
     into...
     (JNLPFile(URL,String,Version,ParserSettings,UpdatePolicy)): New
     method.
     (JNLPFile(InputStream,boolean)): Refactored into...
     (JNLPFile(InputStream,ParserSettings)): New method.
     (getParserSettings): New method.
     (parse(Node,boolean,URL)): Refactored into...
     (parse(InputStream,URL)): New method. Invoke parser to get the root
     node and then parse it.
     * netx/net/sourceforge/jnlp/Launcher.java
     (toFile): Use new ParserSettings object.
     * netx/net/sourceforge/jnlp/Parser.java
     (Parser(JNLPFile,URL,Node,boolean,boolean)): Refactored into...
     (Parser(JNLPFile,URL,Node,ParserSettings)): New method.
     (getRootNode): Implementation moved into XMLParser.getRootNode.
     Selects the right subclass of XMLParser to use.
     (getEncoding): Moved to XMLParser.
     * netx/net/sourceforge/jnlp/ParserSettings.java: New file.
     (ParserSettings): New method.
     (ParserSettings(boolean,boolean,boolean)): New method.
     (isExtensionAllowed): New method.
     (isMalfromedXmlAllowed): New method.
     (isStrict): New method.
     * netx/net/sourceforge/jnlp/XMLParser.java
     (getRootNode): New method. Contains implementation from
     Parser.getRootNode.
     (getEncoding): New method. Moved from Parser.
     * netx/net/sourceforge/jnlp/MalformedXMLParser.java: New file.
     (getRootNode): New method. Transform input into valid xml and
     delegate to parent to parse it.
     (xmlizeInputStream): New method. Read contents from an input stream
     and transform it into valid xml.
     * netx/net/sourceforge/jnlp/resources/Messages.properties: Add
     BOXml.
     * netx/net/sourceforge/jnlp/runtime/Boot.java: Add -xml option.
     (getFile): Parse -xml option and create a new ParserSettings object
     based on it.
     * netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
     (getInstance(URL,String,Version,UpdatePolicy)): Refactored into...
     (getInstance(URL,String,Version,ParserSettings,UpdatePolicy): New
     method.
     (initializeExtensions): Use the same parser settings to parse the
     extension as used in the original file.

Cheers,
Omair

[1] http://home.ccil.org/~cowan/XML/tagsoup/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tagsoup-01.patch
Type: text/x-patch
Size: 35688 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20110110/274c590b/tagsoup-01.patch 


More information about the distro-pkg-dev mailing list