[icedtea-web] RFC: use tagsoup to try and parse malformed JNLP files

Adam Domurad adomurad at redhat.com
Wed Jun 6 12:12:17 PDT 2012


I have attached the patch I ended up with after applying the original
patch to the recent code, which should make further review easier.

I have also attached only the updated test changes from the same patch.

(Note that my changes to Makefile.am are dubious; I was just doing what
was necessary to test the patch.)

I'd appreciate it if another reviewer ran the tests, as I'm having
trouble running a few of them (in general). From what I see, though,
this patch passes more tests than normal, including two of the malformed
XML unit tests.

From what I've seen, the changes to the code look solid. I'd comment
more, but I'm not too sure of the impact of some of the changes.

> Hi,
> 
> I have come across a number of JNLP files that are not valid XML. Netx
> cannot parse these files using an XML parser, and fails to run them. I
> spent some time looking for a solution and came across TagSoup[1]. The
> TagSoup library parses a malformed HTML document into a well-formed
> XML-like HTML document, but it works almost perfectly for our purposes too.
> 
> The attached patch makes use of TagSoup for parsing input jnlp files.
> 
> Parsing is currently implemented in two passes. In the first pass,
> TagSoup reads the "xml" (which can be malformed and hence not really
> XML), and outputs valid XML. Netx then passes this valid XML to its
> own XML parser to parse the file.
> 
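
To make the two-pass idea above concrete, here is a minimal sketch using
only the JDK. It is not the code from the patch: the regex-based xmlize()
is merely a stand-in for TagSoup, the JDK DOM parser is a stand-in for
Netx's own XML parser, and the class and method names are hypothetical
(chosen to echo getRootNode/xmlizeInputStream from the ChangeLog below).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of the two-pass flow; not part of the patch.
final class TwoPassJnlpSketch {

    // Matches an '&' that does not start a predefined entity or a
    // numeric character reference (a common JNLP malformation).
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Pass 1 (assumption: stand-in for TagSoup). Repairs only bare
    // ampersands; the real library handles far more kinds of damage.
    static String xmlize(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    // Pass 2: give the (now well-formed) text to an ordinary XML parser.
    static Element parseStrict(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them.
        builder.setErrorHandler(new DefaultHandler());
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement();
    }

    // Chooses between the lenient two-pass path and the plain parser,
    // roughly what a "malformed XML allowed" setting would control.
    static Element getRootNode(String input, boolean malformedXmlAllowed)
            throws Exception {
        String xml = malformedXmlAllowed ? xmlize(input) : input;
        return parseStrict(xml);
    }

    public static void main(String[] args) throws Exception {
        String malformed = "<jnlp href=\"app.jnlp?a=1&b=2\"><information/></jnlp>";

        Element root = getRootNode(malformed, true);
        System.out.println(root.getTagName() + " " + root.getAttribute("href"));

        try {
            getRootNode(malformed, false);
        } catch (SAXException e) {
            System.out.println("plain XML parser rejected the input");
        }
    }
}
```

The point of the split is that the second pass never has to know whether
the first one ran, so the existing parser code can stay unchanged.
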
> The patch requires TagSoup as an optional dependency. To use TagSoup,
> run configure (--with-tagsoup can be used to point to a TagSoup jar). To
> not use TagSoup (even if it is installed), use --with-tagsoup=no.
> 
> The patch also adds an additional command line option, -xml, to the
> javaws binary. This option can be used to force Netx to use the normal
> XML parser instead of TagSoup to parse the JNLP file.
> 
> Any thoughts or comments?
> 
> ChangeLog:
> 2011-01-10  Omair Majid  <omajid at redhat.com>
> 
>      * Makefile.am: Add NETX_EXCLUDE_SRCS, NETX_DUMMY_CLASSPATH
>      (netx-source-files.txt): Selectively exclude some sources from
>      compilation.
>      (stamps/netx.stamp): Depend on netx-dummy.jar
>      (netx-dummy.jar): New target. Empty jar. Used so there is always at
>      least one class on the classpath.
>      ($(NETX_DIR)/launcher/%.o): Add classpath.
>      * NEWS: Update with fix.
>      * acinclude.m4: Add IT_CHECK_FOR_TAGSOUP.
>      * configure.ac: Call IT_CHECK_FOR_TAGSOUP.
>      * netx/net/sourceforge/jnlp/JNLPFile.java: Add new member
>      parserSettings.
>      (JNLPFile(URL)): Pass a ParserSettings object.
>      (JNLPFile(URL,boolean)): Refactored into...
>      (JNLPFile(URL,ParserSettings)): New method.
>      (JNLPFile(URL,Version,boolean)): Refactored into...
>      (JNLPFile(URL,Version,ParserSettings)): New method.
>      (JNLPFile(URL,Version,boolean,UpdatePolicy)): Refactored into...
>      (JNLPFile(URL,Version,ParserSettings,UpdatePolicy)): New method.
>      (JNLPFile(URL,String,Version,boolean,UpdatePolicy)): Refactored
>      into...
>      (JNLPFile(URL,String,Version,ParserSettings,UpdatePolicy)): New
>      method.
>      (JNLPFile(InputStream,boolean)): Refactored into...
>      (JNLPFile(InputStream,ParserSettings)): New method.
>      (getParserSettings): New method.
>      (parse(Node,boolean,URL)): Refactored into...
>      (parse(InputStream,URL)): New method. Invoke parser to get the root
>      node and then parse it.
>      * netx/net/sourceforge/jnlp/Launcher.java
>      (toFile): Use new ParserSettings object.
>      * netx/net/sourceforge/jnlp/Parser.java
>      (Parser(JNLPFile,URL,Node,boolean,boolean)): Refactored into...
>      (Parser(JNLPFile,URL,Node,ParserSettings)): New method.
>      (getRootNode): Implementation moved into XMLParser.getRootNode.
>      Selects the right subclass of XMLParser to use.
>      (getEncoding): Moved to XMLParser.
>      * netx/net/sourceforge/jnlp/ParserSettings.java: New file.
>      (ParserSettings): New method.
>      (ParserSettings(boolean,boolean,boolean)): New method.
>      (isExtensionAllowed): New method.
>      (isMalfromedXmlAllowed): New method.
>      (isStrict): New method.
>      * netx/net/sourceforge/jnlp/XMLParser.java
>      (getRootNode): New method. Contains implementation from
>      Parser.getRootNode.
>      (getEncoding): New method. Moved from Parser.
>      * netx/net/sourceforge/jnlp/MalformedXMLParser.java: New file.
>      (getRootNode): New method. Transform input into valid xml and
>      delegate to parent to parse it.
>      (xmlizeInputStream): New method. Read contents from an input stream
>      and transform it into valid xml.
>      * netx/net/sourceforge/jnlp/resources/Messages.properties: Add
>      BOXml.
>      * netx/net/sourceforge/jnlp/runtime/Boot.java: Add -xml option.
>      (getFile): Parse -xml option and create a new ParserSettings object
>      based on it.
>      * netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
>      (getInstance(URL,String,Version,UpdatePolicy)): Refactored into...
>      (getInstance(URL,String,Version,ParserSettings,UpdatePolicy)): New
>      method.
>      (initializeExtensions): Use the same parser settings to parse the
>      extension as used in the original file.
> 
> Cheers,
> Omair
> 
> [1] http://home.ccil.org/~cowan/XML/tagsoup/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: update_to_patch.patch
Type: text/x-patch
Size: 41912 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20120606/d8aefd52/update_to_patch.patch 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_changes.patch
Type: text/x-patch
Size: 9776 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20120606/d8aefd52/test_changes.patch 

