[icedtea-web] RFC: use tagsoup to try and parse malformed JNLP files
Adam Domurad
adomurad at redhat.com
Wed Jun 6 12:12:17 PDT 2012
I have attached the patch I ended up with after applying the original
patch to the recent code, which should hopefully make further review easier.
I have also attached, separately, only the updated test changes from the
same patch.
(Note that my changes to Makefile.am are dubious; I was just doing what
was necessary to test the patch.)
I'd appreciate it if another reviewer ran the tests, as I'm having trouble
running a few of them (in general). From what I see, though, this patch
passes more tests than normal, including two of the malformed XML unit
tests.
From what I've seen, the changes to the code look solid. I'd comment more,
but I'm not too sure of the impact of some of the changes.
> Hi,
>
> I have come across a number of JNLP files that are not valid XML. Netx
> cannot parse these files using an XML parser, and fails to run them. I
> spent some time looking for a solution and came across TagSoup[1]. The
> TagSoup library parses a malformed HTML document into a well-formed
> XML-like HTML document, but it works almost perfectly for our purposes too.
>
> The attached patch makes use of TagSoup for parsing input jnlp files.
>
> Parsing is currently implemented in two passes. In the first pass,
> TagSoup reads the "xml" (which can be malformed and hence not really
> XML) and outputs valid XML. In the second pass, Netx parses this valid
> XML with its own XML parser.
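
The two-pass flow described above can be sketched roughly as follows. This is
only an illustration, not the code from the patch: the JDK's DOM parser stands
in for Netx's own XML parser, and a simple regex that escapes bare '&'
characters stands in for the TagSoup pass (TagSoup itself does far more). The
names XMLParser, MalformedXMLParser, getRootNode and xmlizeInputStream are
taken from the ChangeLog below; their signatures here, the fixed UTF-8
encoding, and the helpers select and rootName are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class TwoPassSketch {

    /** Strict pass. Assumption: the JDK DOM parser stands in for Netx's parser. */
    static class XMLParser {
        Element getRootNode(InputStream in) throws Exception {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in)
                    .getDocumentElement();
        }
    }

    /** Lenient pass: repair the input first, then delegate to the strict parser. */
    static class MalformedXMLParser extends XMLParser {
        @Override
        Element getRootNode(InputStream in) throws Exception {
            return super.getRootNode(xmlizeInputStream(in));
        }

        // Stand-in for the TagSoup pass (NOT what TagSoup does): escape any '&'
        // that does not start an entity or character reference.
        // Assumption: input is UTF-8; the real code would detect the encoding.
        InputStream xmlizeInputStream(InputStream in) throws Exception {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            String fixed = text.replaceAll(
                    "&(?!(?:[A-Za-z]+|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
            return new ByteArrayInputStream(fixed.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Hypothetical helper: choose the parser subclass from one setting. */
    static XMLParser select(boolean malformedXmlAllowed) {
        return malformedXmlAllowed ? new MalformedXMLParser() : new XMLParser();
    }

    /** Hypothetical helper: parse a JNLP string and return the root element name. */
    static String rootName(String jnlp, boolean malformedXmlAllowed) throws Exception {
        InputStream in = new ByteArrayInputStream(jnlp.getBytes(StandardCharsets.UTF_8));
        return select(malformedXmlAllowed).getRootNode(in).getTagName();
    }

    public static void main(String[] args) throws Exception {
        // The bare '&' in the href makes this input malformed XML.
        String bad = "<jnlp href=\"app.jnlp?a=1&b=2\"></jnlp>";
        System.out.println(rootName(bad, true));
    }
}
```

With malformedXmlAllowed set to false, the same input goes straight to the
strict parser and is rejected, which is the behaviour the patch is meant to
work around.
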
>
> The patch adds TagSoup as an optional dependency. To use TagSoup,
> run configure (--with-tagsoup can be used to point to a TagSoup jar). To
> not use TagSoup (even if it is installed), use --with-tagsoup=no.
>
> The patch also adds a command line option, -xml, to the
> javaws binary. This option can be used to force Netx to use the normal
> XML parser instead of TagSoup to parse the jnlp file.
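
As a rough sketch of how the -xml option could map onto parser settings: the
class name ParserSettings, its three-boolean constructor and its getters come
from the ChangeLog below (including the "isMalfromedXmlAllowed" spelling), but
the argument order, the default values and the fromArgs helper are assumptions
made for illustration, not the patch's actual Boot.getFile logic.

```java
import java.util.Arrays;

class ParserSettingsSketch {

    /** Immutable bundle of the three parser flags named in the ChangeLog. */
    static final class ParserSettings {
        private final boolean strict;
        private final boolean extensionAllowed;
        private final boolean malformedXmlAllowed;

        // Assumption: the ChangeLog only says (boolean,boolean,boolean);
        // the order of the arguments here is a guess.
        ParserSettings(boolean strict, boolean extensionAllowed,
                       boolean malformedXmlAllowed) {
            this.strict = strict;
            this.extensionAllowed = extensionAllowed;
            this.malformedXmlAllowed = malformedXmlAllowed;
        }

        boolean isStrict() { return strict; }

        boolean isExtensionAllowed() { return extensionAllowed; }

        // Spelling follows the ChangeLog entry.
        boolean isMalfromedXmlAllowed() { return malformedXmlAllowed; }
    }

    /**
     * Hypothetical helper: -xml forces the normal XML parser, so malformed
     * input is only allowed when the option is absent. The other two
     * defaults (not strict, extensions allowed) are assumptions.
     */
    static ParserSettings fromArgs(String[] args) {
        boolean forceXml = Arrays.asList(args).contains("-xml");
        return new ParserSettings(false, true, !forceXml);
    }

    public static void main(String[] args) {
        ParserSettings settings = fromArgs(args);
        System.out.println("malformed XML allowed: " + settings.isMalfromedXmlAllowed());
    }
}
```

Passing one settings object around, rather than separate booleans, is what
lets extensions be parsed with the same settings as the original file, as the
initializeExtensions entry in the ChangeLog describes.
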
>
> Any thoughts or comments?
>
> ChangeLog:
> 2011-01-10 Omair Majid <omajid at redhat.com>
>
> * Makefile.am: Add NETX_EXCLUDE_SRCS, NETX_DUMMY_CLASSPATH
> (netx-source-files.txt): Selectively exclude some sources from
> compilation.
> (stamps/netx.stamp): Depend on netx-dummy.jar
> (netx-dummy.jar): New target. Empty jar. Used so there is always at
> least one class on the classpath.
> ($(NETX_DIR)/launcher/%.o): Add classpath.
> * NEWS: Update with fix.
> * acinclude.m4: Add IT_CHECK_FOR_TAGSOUP.
> * configure.ac: Call IT_CHECK_FOR_TAGSOUP.
> * netx/net/sourceforge/jnlp/JNLPFile.java: Add new member
> parserSettings.
> (JNLPFile(URL)): Pass a ParserSettings object.
> (JNLPFile(URL,boolean)): Refactored into...
> (JNLPFile(URL,ParserSettings)): New method.
> (JNLPFile(URL,Version,boolean)): Refactored into...
> (JNLPFile(URL,Version,ParserSettings)): New method.
> (JNLPFile(URL,Version,boolean,UpdatePolicy)): Refactored into...
> (JNLPFile(URL,Version,ParserSettings,UpdatePolicy)): New method.
> (JNLPFile(URL,String,Version,boolean,UpdatePolicy)): Refactored
> into...
> (JNLPFile(URL,String,Version,ParserSettings,UpdatePolicy)): New
> method.
> (JNLPFile(InputStream,boolean)): Refactored into...
> (JNLPFile(InputStream,ParserSettings)): New method.
> (getParserSettings): New method.
> (parse(Node,boolean,URL)): Refactored into...
> (parse(InputStream,URL)): New method. Invoke parser to get the root
> node and then parse it.
> * netx/net/sourceforge/jnlp/Launcher.java
> (toFile): Use new ParserSettings object.
> * netx/net/sourceforge/jnlp/Parser.java
> (Parser(JNLPFile,URL,Node,boolean,boolean)): Refactored into...
> (Parser(JNLPFile,URL,Node,ParserSettings)): New method.
> (getRootNode): Implementation moved into XMLParser.getRootNode.
> Selects the right subclass of XMLParser to use.
> (getEncoding): Moved to XMLParser.
> * netx/net/sourceforge/jnlp/ParserSettings.java: New file.
> (ParserSettings): New method.
> (ParserSettings(boolean,boolean,boolean)): New method.
> (isExtensionAllowed): New method.
> (isMalfromedXmlAllowed): New method.
> (isStrict): New method.
> * netx/net/sourceforge/jnlp/XMLParser.java
> (getRootNode): New method. Contains implementation from
> Parser.getRootNode.
> (getEncoding): New method. Moved from Parser.
> * netx/net/sourceforge/jnlp/MalformedXMLParser.java: New file.
> (getRootNode): New method. Transform input into valid xml and
> delegate to parent to parse it.
> (xmlizeInputStream): New method. Read contents from an input stream
> and transform it into valid xml.
> * netx/net/sourceforge/jnlp/resources/Messages.properties: Add
> BOXml.
> * netx/net/sourceforge/jnlp/runtime/Boot.java: Add -xml option.
> (getFile): Parse -xml option and create a new ParserSettings object
> based on it.
> * netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
> (getInstance(URL,String,Version,UpdatePolicy)): Refactored into...
> (getInstance(URL,String,Version,ParserSettings,UpdatePolicy)): New
> method.
> (initializeExtensions): Use the same parser settings to parse the
> extension as used in the original file.
>
> Cheers,
> Omair
>
> [1] http://home.ccil.org/~cowan/XML/tagsoup/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: update_to_patch.patch
Type: text/x-patch
Size: 41912 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20120606/d8aefd52/update_to_patch.patch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_changes.patch
Type: text/x-patch
Size: 9776 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20120606/d8aefd52/test_changes.patch