[icedtea-web] RFC: use tagsoup to try and parse malformed JNLP files
Omair Majid
omajid at redhat.com
Mon Jan 10 13:39:38 PST 2011
Hi,
I have come across a number of JNLP files that are not valid XML. Netx
cannot parse these files using an XML parser, and fails to run them. I
spent some time looking for a solution and came across TagSoup[1]. The
TagSoup library is meant to parse a malformed HTML document into a
well-formed, XML-like HTML document, but it works almost perfectly for
our purposes too.
The attached patch makes use of TagSoup for parsing input JNLP files.
Parsing is currently implemented in two passes. In the first pass,
TagSoup reads the "xml" (which can be malformed and hence not really
XML) and outputs valid XML. In the second pass, Netx parses this valid
XML with its own XML parser.
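The two-pass idea can be sketched with only JDK classes. This is a minimal illustration, not the code in the patch: the class name TwoPassSketch and its methods are hypothetical, and the first pass uses a small regex that escapes bare ampersands as a stand-in for TagSoup, which handles far more kinds of breakage. The second pass is an ordinary strict DOM parse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch; names are not taken from the patch.
final class TwoPassSketch {
    // Matches an '&' that does not start an entity or character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Pass 1 (stand-in for TagSoup): turn possibly malformed input into
    // well-formed XML. Only bare ampersands are repaired here.
    static String xmlize(String input) {
        return BARE_AMP.matcher(input).replaceAll("&amp;");
    }

    // Pass 2: strict XML parse; throws a SAXException on malformed input.
    static Document strictParse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Both passes chained: repair first, then parse the repaired text.
    static Document parse(String input) throws Exception {
        return strictParse(xmlize(input));
    }

    public static void main(String[] args) throws Exception {
        // The unescaped '&' in the href makes this input invalid XML.
        String malformed = "<jnlp href=\"app.jnlp?a=1&b=2\">"
                + "<information><title>Demo</title></information></jnlp>";
        Document doc = parse(malformed);
        System.out.println(doc.getDocumentElement().getAttribute("href"));
    }
}
```

Calling strictParse directly on the malformed string fails, while parse succeeds because the strict parser only ever sees the repaired text. Keeping the repair step separate from the strict parser means the existing parsing code does not have to change.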
The patch adds TagSoup as an optional dependency. To use TagSoup,
run configure (--with-tagsoup can be used to point to a TagSoup jar). To
not use TagSoup (even if it is installed), use --with-tagsoup=no.
The patch also adds a command line option, -xml, to the javaws binary.
This option forces Netx to use the normal XML parser instead of TagSoup
to parse the JNLP file.
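How the -xml option could feed into parser selection is sketched below. The ChangeLog only lists a ParserSettings class with three boolean constructor arguments and three getters, so the class here is hypothetical: the name ParserSettingsSketch, the argument order, the default values, and the fromArgs and parserName helpers are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, loosely modeled on the ParserSettings entry in the
// ChangeLog. Argument order and defaults are assumptions.
final class ParserSettingsSketch {
    private final boolean strict;
    private final boolean extensionAllowed;
    private final boolean malformedXmlAllowed;

    ParserSettingsSketch(boolean strict, boolean extensionAllowed,
                         boolean malformedXmlAllowed) {
        this.strict = strict;
        this.extensionAllowed = extensionAllowed;
        this.malformedXmlAllowed = malformedXmlAllowed;
    }

    boolean isStrict() { return strict; }
    boolean isExtensionAllowed() { return extensionAllowed; }
    boolean isMalformedXmlAllowed() { return malformedXmlAllowed; }

    // Build settings from command line arguments: -xml turns off the
    // lenient (TagSoup-backed) parsing path.
    static ParserSettingsSketch fromArgs(String[] args) {
        List<String> options = Arrays.asList(args);
        boolean forceXml = options.contains("-xml");
        // Assumed defaults: not strict, extensions allowed.
        return new ParserSettingsSketch(false, true, !forceXml);
    }

    // Name of the parser class that would be chosen for these settings.
    String parserName() {
        return malformedXmlAllowed ? "MalformedXMLParser" : "XMLParser";
    }

    public static void main(String[] args) {
        System.out.println(fromArgs(args).parserName());
    }
}
```

Carrying the choice in one settings object, rather than passing separate boolean flags, lets the same settings be reused wherever a JNLP file is parsed, for example when parsing extensions referenced by the original file.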
Any thoughts or comments?
ChangeLog:
2011-01-10 Omair Majid <omajid at redhat.com>
* Makefile.am: Add NETX_EXCLUDE_SRCS, NETX_DUMMY_CLASSPATH
(netx-source-files.txt): Selectively exclude some sources from
compilation.
(stamps/netx.stamp): Depend on netx-dummy.jar
(netx-dummy.jar): New target. Empty jar. Used so there is always at
least one class on the classpath.
($(NETX_DIR)/launcher/%.o): Add classpath.
* NEWS: Update with fix.
* acinclude.m4: Add IT_CHECK_FOR_TAGSOUP.
* configure.ac: Call IT_CHECK_FOR_TAGSOUP.
* netx/net/sourceforge/jnlp/JNLPFile.java: Add new member
parserSettings.
(JNLPFile(URL)): Pass a ParserSettings object.
(JNLPFile(URL,boolean)): Refactored into...
(JNLPFile(URL,ParserSettings)): New method.
(JNLPFile(URL,Version,boolean)): Refactored into...
(JNLPFile(URL,Version,ParserSettings)): New method.
(JNLPFile(URL,Version,boolean,UpdatePolicy)): Refactored into...
(JNLPFile(URL,Version,ParserSettings,UpdatePolicy)): New method.
(JNLPFile(URL,String,Version,boolean,UpdatePolicy)): Refactored
into...
(JNLPFile(URL,String,Version,ParserSettings,UpdatePolicy)): New
method.
(JNLPFile(InputStream,boolean)): Refactored into...
(JNLPFile(InputStream,ParserSettings)): New method.
(getParserSettings): New method.
(parse(Node,boolean,URL)): Refactored into...
(parse(InputStream,URL)): New method. Invoke parser to get the root
node and then parse it.
* netx/net/sourceforge/jnlp/Launcher.java
(toFile): Use new ParserSettings object.
* netx/net/sourceforge/jnlp/Parser.java
(Parser(JNLPFile,URL,Node,boolean,boolean)): Refactored into...
(Parser(JNLPFile,URL,Node,ParserSettings)): New method.
(getRootNode): Implementation moved into XMLParser.getRootNode.
Selects the right subclass of XMLParser to use.
(getEncoding): Moved to XMLParser.
* netx/net/sourceforge/jnlp/ParserSettings.java: New file.
(ParserSettings): New method.
(ParserSettings(boolean,boolean,boolean)): New method.
(isExtensionAllowed): New method.
(isMalfromedXmlAllowed): New method.
(isStrict): New method.
* netx/net/sourceforge/jnlp/XMLParser.java
(getRootNode): New method. Contains implementation from
Parser.getRootNode.
(getEncoding): New method. Moved from Parser.
* netx/net/sourceforge/jnlp/MalformedXMLParser.java: New file.
(getRootNode): New method. Transform input into valid xml and
delegate to parent to parse it.
(xmlizeInputStream): New method. Read contents from an input stream
and transform it into valid xml.
* netx/net/sourceforge/jnlp/resources/Messages.properties: Add
BOXml.
* netx/net/sourceforge/jnlp/runtime/Boot.java: Add -xml option.
(getFile): Parse -xml option and create a new ParserSettings object
based on it.
* netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
(getInstance(URL,String,Version,UpdatePolicy)): Refactored into...
(getInstance(URL,String,Version,ParserSettings,UpdatePolicy)): New
method.
(initializeExtensions): Use the same parser settings to parse the
extension as used in the original file.
Cheers,
Omair
[1] http://home.ccil.org/~cowan/XML/tagsoup/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tagsoup-01.patch
Type: text/x-patch
Size: 35688 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20110110/274c590b/tagsoup-01.patch