[icedtea-web] RFC: use tagsoup to try and parse malformed JNLP files

Dr Andrew John Hughes ahughes at redhat.com
Tue Jan 11 06:03:58 PST 2011


On 19:10 Mon 10 Jan     , Omair Majid wrote:
> On 01/10/2011 05:33 PM, Dr Andrew John Hughes wrote:
> > On 16:39 Mon 10 Jan     , Omair Majid wrote:
> >> Hi,
> >>
> >> I have come across a number of JNLP files that are not valid XML. Netx
> >> cannot parse these files using an XML parser, and fails to run them. I
> >> spent some time looking for a solution and came across TagSoup[1]. The
> >> TagSoup library parses a malformed HTML document into a well-formed
> >> xml-like HTML document, but it works almost perfectly for our purposes too.
> >>
> >> The attached patch makes use of TagSoup for parsing input jnlp files.
> >>
> >> Parsing is currently implemented in two passes. In the first pass,
> >> TagSoup reads the "xml" (which can be malformed and hence not really
> >> xml), and outputs valid XML. Netx then parses this valid XML with its
> >> own XML parser.
> >>
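
The two-pass flow described above can be sketched with JDK classes alone.
This is only an illustration, not the code in the patch: the repair step
below is a hypothetical stand-in for TagSoup that merely escapes bare '&'
characters, and the names TwoPassSketch, repair and parse are made up for
the example. The second pass uses the JDK's DOM parser rather than Netx's
own parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class TwoPassSketch {

    /** Pass 1: read possibly malformed input, return a stream of repaired XML. */
    public static InputStream repair(InputStream original) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = original.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        // Hypothetical stand-in for TagSoup: escape any '&' that does not
        // start an entity or character reference.
        String fixed = text.replaceAll(
                "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
        return new ByteArrayInputStream(fixed.getBytes(StandardCharsets.UTF_8));
    }

    /** Pass 2: hand the repaired stream to an ordinary XML parser. */
    public static Element parse(InputStream malformed) throws Exception {
        InputStream wellFormed = repair(malformed);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(wellFormed)
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // The bare '&' in the href makes this input invalid XML.
        String bad = "<jnlp href=\"demo.jnlp?a=1&b=2\"><information/></jnlp>";
        Element root = parse(new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root.getNodeName() + " " + root.getAttribute("href"));
    }
}
```

Buffering the repaired document in memory keeps the second pass unaware of
where its input came from, which is the same shape as xmlizeInputStream
feeding super.getRootNode in the patch.
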
> >> The patch requires TagSoup as an optional dependency. To use TagSoup,
> >> run configure (--with-tagsoup can be used to point to a TagSoup jar). To
> >> not use TagSoup (even if it is installed), use --with-tagsoup=no.
> >>
> >> The patch also adds an additional command line option, -xml, to the
> >> javaws binary. This option can be used to force Netx to use the normal
> >> xml parser instead of TagSoup to parse the jnlp file.
> >>
> >> Any thoughts or comments?
> >>
> >> ChangeLog:
> >> 2011-01-10  Omair Majid<omajid at redhat.com>
> >>
> >>       * Makefile.am: Add NETX_EXCLUDE_SRCS, NETX_DUMMY_CLASSPATH
> >>       (netx-source-files.txt): Selectively exclude some sources from
> >>       compilation.
> >>       (stamps/netx.stamp): Depend on netx-dummy.jar
> >>       (netx-dummy.jar): New target. Empty jar. Used so there is always at
> >>       least one class on the classpath.
> >>       ($(NETX_DIR)/launcher/%.o): Add classpath.
> >>       * NEWS: Update with fix.
> >>       * acinclude.m4: Add IT_CHECK_FOR_TAGSOUP.
> >>       * configure.ac: Call IT_CHECK_FOR_TAGSOUP.
> >>       * netx/net/sourceforge/jnlp/JNLPFile.java: Add new member
> >>       parserSettings.
> >>       (JNLPFile(URL)): Pass a ParserSettings object.
> >>       (JNLPFile(URL,boolean)): Refactored into...
> >>       (JNLPFile(URL,ParserSettings)): New method.
> >>       (JNLPFile(URL,Version,boolean)): Refactored into...
> >>       (JNLPFile(URL,Version,ParserSettings)): New method.
> >>       (JNLPFile(URL,Version,boolean,UpdatePolicy)): Refactored into...
> >>       (JNLPFile(URL,Version,ParserSettings,UpdatePolicy)): New method.
> >>       (JNLPFile(URL,String,Version,boolean,UpdatePolicy)): Refactored
> >>       into...
> >>       (JNLPFile(URL,String,Version,ParserSettings,UpdatePolicy)): New
> >>       method.
> >>       (JNLPFile(InputStream,boolean)): Refactored into...
> >>       (JNLPFile(InputStream,ParserSettings)): New method.
> >>       (getParserSettings): New method.
> >>       (parse(Node,boolean,URL)): Refactored into...
> >>       (parse(InputStream,URL)): New method. Invoke parser to get the root
> >>       node and then parse it.
> >>       * netx/net/sourceforge/jnlp/Launcher.java
> >>       (toFile): Use new ParserSettings object.
> >>       * netx/net/sourceforge/jnlp/Parser.java
> >>       (Parser(JNLPFile,URL,Node,boolean,boolean)): Refactored into...
> >>       (Parser(JNLPFile,URL,Node,ParserSettings)): New method.
> >>       (getRootNode): Implementation moved into XMLParser.getRootNode.
> >>       Selects the right subclass of XMLParser to use.
> >>       (getEncoding): Moved to XMLParser.
> >>       * netx/net/sourceforge/jnlp/ParserSettings.java: New file.
> >>       (ParserSettings): New method.
> >>       (ParserSettings(boolean,boolean,boolean)): New method.
> >>       (isExtensionAllowed): New method.
> >>       (isMalfromedXmlAllowed): New method.
> >>       (isStrict): New method.
> >>       * netx/net/sourceforge/jnlp/XMLParser.java
> >>       (getRootNode): New method. Contains implementation from
> >>       Parser.getRootNode.
> >>       (getEncoding): New method. Moved from Parser.
> >>       * netx/net/sourceforge/jnlp/MalformedXMLParser.java: New file.
> >>       (getRootNode): New method. Transform input into valid xml and
> >>       delegate to parent to parse it.
> >>       (xmlizeInputStream): New method. Read contents from an input stream
> >>       and transform it into valid xml.
> >>       * netx/net/sourceforge/jnlp/resources/Messages.properties: Add
> >>       BOXml.
> >>       * netx/net/sourceforge/jnlp/runtime/Boot.java: Add -xml option.
> >>       (getFile): Parse -xml option and create a new ParserSettings object
> >>       based on it.
> >>       * netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
> >>       (getInstance(URL,String,Version,UpdatePolicy)): Refactored into...
> >>       (getInstance(URL,String,Version,ParserSettings,UpdatePolicy): New
> >>       method.
> >>       (initializeExtensions): Use the same parser settings to parse the
> >>       extension as used in the original file.
> >>
> >> Cheers,
> >> Omair
> >>
> >> [1] http://home.ccil.org/~cowan/XML/tagsoup/
> >
> > I've just looked at the build changes.  I'll leave someone with better knowledge
> > of the source code to look at the code changes.
> >
> 
> Thanks for looking over the changes so quickly!
> 
> > With Makefile.am, I don't see why NETX_DUMMY_CLASSPATH is needed or the additional
> > rule that creates a JAR file.  Neither do you need to set NETX_EXCLUDE_SRCS to empty;
> > this is the default.
> >
> 
> Automake complains if a variable is not set before using += :
> 
> NETX_EXCLUDE_SRCS must be set with `=' before using `+='
> 

Yeah, so we just use '=' now :-)

> > if HAVE_TAGSOUP
> > NETX_CLASSPATH_ARG=-classpath $(TAGSOUP_JAR)
> > NETX_LAUNCHER_ARG="-Xbootclasspath/a:$(TAGSOUP_JAR)"
> > else
> > NETX_EXCLUDE_SRCS+=net.sourceforge.jnlp.MalformedXMLParser.java
> > endif
> >
> > would work fine and you can drop the netx-dummy.jar rule.
> >
> 
> Thanks for the idea. What I wanted to do (a little prematurely, I 
> suppose) was to make sure that more dependencies could be added in the 
> future (with their own configure flags, if necessary) without changing 
> the code too much. I also wanted all build code-paths to be as close as 
> possible, which is why I wanted to always have a classpath for 
> netx-building (even if it was effectively blank using netx-dummy.jar). 
> But my approach just makes the Makefile look like a mess.
>

Yeah, I guessed your motivation.  I'm just not sure it's worth bending over
backwards to accommodate it.  I guess we can scratch our heads over a good
solution should we need a second dependency.
 
> > Should we really be putting tagsoup on the bootclasspath? What's wrong with the classpath?
> >
> 
> I have tested it out now with classpath and it looks like the javaws 
> launcher does not like it:
> $ javaws XEtchedButtonDemo.jnlp
> Unrecognized option: -classpath /usr/share/java/tagsoup.jar
> Could not create the Java virtual machine.
> 
> There is probably a way around this, I will see if I can find it.
> 

After writing the last reply, it also came to mind that setting this might
cause issues with a classpath passed to javaws (if that's possible).
So this needs some testing.  I'm just wary that tagsoup includes unknown code and
it's a bit dangerous to put it on the privileged bootclasspath.  Then again,
I'm not sure any of javaws should be on the bootclasspath.
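
One way to test where a class actually ends up is to ask for its class
loader. The sketch below assumes a HotSpot/OpenJDK-style runtime, where
Class.getClassLoader() returns null for classes loaded by the bootstrap
loader; the class name LoaderCheckSketch and the helper describeLoader are
made up for this example.

```java
public class LoaderCheckSketch {

    /**
     * Returns "bootstrap" when the class was loaded by the bootstrap
     * loader (reported as a null loader on HotSpot/OpenJDK), otherwise
     * the name of the loader's class.
     */
    public static String describeLoader(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes come from the bootstrap loader.
        System.out.println(describeLoader(String.class));
        // Application classes normally come from a non-null loader.
        System.out.println(describeLoader(LoaderCheckSketch.class));
    }
}
```

Running the same check against a TagSoup class from inside javaws would show
whether -Xbootclasspath/a really placed it with the bootstrap classes.
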

> > As to excluding the file, have you tested this?  Are you sure no other Java files pull
> > that class in?
> >
> 
> Yup. This is one code path I made sure to test. MalformedXMLParser is a 
> new file I added in this patch. The class is never used directly. Only 
> net.sourceforge.jnlp.Parser uses it, and that too through reflection. 
> Building (and running) without tagsoup works just fine.
> 
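
The reflection-with-fallback pattern described above can be sketched as
follows. This is a simplified illustration rather than the patch's code: the
interface RootParser, the class PlainParser and the method load are
hypothetical names, and the fallback plays the role that XMLParser plays in
the patch.

```java
public class OptionalParserSketch {

    /** Hypothetical parser interface, used only for this illustration. */
    public interface RootParser {
        String name();
    }

    /** Always compiled in; stands in for the plain XML parser. */
    public static class PlainParser implements RootParser {
        public String name() {
            return "plain";
        }
    }

    /**
     * Tries to instantiate the preferred class by name and falls back to
     * PlainParser when that class was left out of the build.
     */
    public static RootParser load(String preferredClassName) throws Exception {
        Class<?> klass;
        try {
            klass = Class.forName(preferredClassName);
        } catch (ClassNotFoundException e) {
            // The optional class is absent: use the built-in parser.
            klass = PlainParser.class;
        }
        return (RootParser) klass.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // "example.MissingParser" is a made-up name that does not exist.
        System.out.println(load("example.MissingParser").name());
    }
}
```

Because nothing refers to the optional class by type, the rest of the code
compiles whether or not that class is built. Note that
ClassNotFoundException only covers the case where the named class itself is
missing; a class that is present but whose own dependencies are missing
usually fails with NoClassDefFoundError instead.
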
> > For configure, the argument should be the path to the jar file.  Otherwise, the JAR file
> > always has to be 'tagsoup.jar' which may not be the case.
> >
> 
> Isn't this already the case? Perhaps I missed something, but the code 
> does this: if --with-tagsoup=no then HAVE_TAGSOUP is set to false. If 
> --with-tagsoup=somevar then somevar is used as the location of the 
> tagsoup.jar. If --with-tagsoup is not used, then /usr/share/java (and 
> other locations) are searched for a tagsoup.jar.
> 

Yes sorry, you're right.  The block I was looking at is only used if
no option is provided by the user, in which case we know the predefined paths.

> > You should also check /usr/share/tagsoup/lib/tagsoup.jar which is the Gentoo installation path.
> > Debian uses /usr/share/java/tagsoup.jar as already checked.
> >
> 
> Ah, thanks. Updated patch attached.
> 

Thanks.  Don't know why they don't use /usr/share/java.

> Cheers,
> Omair

> diff -r dc02a605f905 Makefile.am
> --- a/Makefile.am	Fri Jan 07 08:00:08 2011 -0500
> +++ b/Makefile.am	Mon Jan 10 19:09:30 2011 -0500
> @@ -31,6 +31,8 @@
>  	net.sourceforge.jnlp.services net.sourceforge.jnlp.tools \
>  	net.sourceforge.jnlp.util net.sourceforge.jnlp.controlpanel
>  
> +NETX_EXCLUDE_SRCS=
> +
>  # Conditional defintions
>  if ENABLE_PLUGIN
>  ICEDTEAPLUGIN_CLEAN = clean-IcedTeaPlugin
> @@ -68,6 +70,13 @@
>  endif
>  endif
>  
> +if HAVE_TAGSOUP
> +NETX_CLASSPATH_ARG=-classpath $(TAGSOUP_JAR)
> +NETX_LAUNCHER_ARG="-Xbootclasspath/a:$(TAGSOUP_JAR)", 
> +else
> +NETX_EXCLUDE_SRCS+=net.sourceforge.jnlp.MalformedXMLParser.java
> +endif
> +
>  # Launcher
>  
>  LAUNCHER_SRCDIR = $(abs_top_srcdir)/launcher
> @@ -279,14 +288,19 @@
>  # a patch applied to sun.plugin.AppletViewerPanel and generated sources
>  
>  netx-source-files.txt:
> -	find $(NETX_SRCDIR) -name '*.java' | sort > $@
> +	find $(NETX_SRCDIR) -name '*.java' | sort > $@ ; \
> +	for src in $(NETX_EXCLUDE_SRCS) ; \
> +	do \
> +	  sed -i "/$${src}/ d" $@ ; \
> +	done
>  
> -stamps/netx.stamp: netx-source-files.txt stamps/bootstrap-directory.stamp
> +stamps/netx.stamp: netx-source-files.txt stamps/bootstrap-directory.stamp netx-dummy.jar
>  	mkdir -p $(NETX_DIR)
>  	$(BOOT_DIR)/bin/javac $(IT_JAVACFLAGS) \
>  	    -d $(NETX_DIR) \
>  	    -sourcepath $(NETX_SRCDIR) \
>  	    -bootclasspath $(RUNTIME) \
> +	    $(NETX_CLASSPATH_ARG) \
>  	    @netx-source-files.txt
>  	(cd $(NETX_RESOURCE_DIR); \
>  	 for files in $$(find . -type f); \
> @@ -349,7 +363,7 @@
>  $(NETX_DIR)/launcher/%.o: $(LAUNCHER_SRCDIR)/%.c
>  	mkdir -p $(NETX_DIR)/launcher && \
>  	$(CC) $(LAUNCHER_FLAGS) \
> -	  -DJAVA_ARGS='{ "-J-ms8m", "-J-Djava.icedtea-web.bin=$(DESTDIR)$(bindir)/javaws", "net.sourceforge.jnlp.runtime.Boot",  }' \
> +	  -DJAVA_ARGS='{ "-J-ms8m", "-J-Djava.icedtea-web.bin=$(DESTDIR)$(bindir)/javaws", $(NETX_LAUNCHER_ARG) "net.sourceforge.jnlp.runtime.Boot",  }' \
>  	  -DPROGNAME='"javaws"' -c -o $@ $<
>  
>  $(NETX_DIR)/launcher/controlpanel/%.o: $(LAUNCHER_SRCDIR)/%.c
> diff -r dc02a605f905 NEWS
> --- a/NEWS	Fri Jan 07 08:00:08 2011 -0500
> +++ b/NEWS	Mon Jan 10 19:09:30 2011 -0500
> @@ -8,7 +8,12 @@
>  
>  CVE-XXXX-YYYY: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=XXXX-YYYY
>  
> -New in release 1.0 (2010-XX-XX):
> +New in release 1.1 (2011-XX-XX):
> +
> +* NetX
> + - Netx can now parse malformed jnlp files using tagsoup
> +
> +New in release 1.0 (2011-XX-XX):
>  
>  * Initial release of IcedTea-Web
>  * Security updates
> diff -r dc02a605f905 acinclude.m4
> --- a/acinclude.m4	Fri Jan 07 08:00:08 2011 -0500
> +++ b/acinclude.m4	Mon Jan 10 19:09:30 2011 -0500
> @@ -297,6 +297,36 @@
>  fi
>  ])
>  
> +
> +AC_DEFUN_ONCE([IT_CHECK_FOR_TAGSOUP],
> +[
> +  AC_MSG_CHECKING([for tagsoup])
> +  AC_ARG_WITH([tagsoup],
> +             [AS_HELP_STRING([--with-tagsoup],
> +                             [support malformed jnlp files])],
> +             [ TAGSOUP_JAR=${withval} ],
> +             [ TAGSOUP_JAR= ])
> +  if test x"${TAGSOUP_JAR}" = xyes ; then
> +    TAGSOUP_JAR=
> +  fi
> +  if test -z "${TAGSOUP_JAR}" ; then
> +    for dir in /usr/share/java /usr/local/share/java \
> +        /usr/share/tagsoup/lib/ ; do
> +      if test -f $dir/tagsoup.jar; then
> +        TAGSOUP_JAR=$dir/tagsoup.jar
> +	    break
> +      fi
> +    done
> +  fi
> +  if test x"${TAGSOUP_JAR}" = x ; then
> +    TAGSOUP_JAR=no
> +  fi
> +  AC_MSG_RESULT(${TAGSOUP_JAR})
> +  AC_SUBST(TAGSOUP_JAR)
> +  AM_CONDITIONAL([HAVE_TAGSOUP], [test x$TAGSOUP_JAR != xno])
> +])
> +
> +
>  dnl Generic macro to check for a Java class
>  dnl Takes the name of the class as an argument.  The macro name
>  dnl is usually the name of the class with '.'
> diff -r dc02a605f905 configure.ac
> --- a/configure.ac	Fri Jan 07 08:00:08 2011 -0500
> +++ b/configure.ac	Mon Jan 10 19:09:30 2011 -0500
> @@ -80,4 +80,6 @@
>  IT_CHECK_FOR_CLASS(SUN_APPLET_APPLETIMAGEREF, [sun.applet.AppletImageRef])
>  IT_CHECK_FOR_APPLETVIEWERPANEL_HOLE
>  
> +IT_CHECK_FOR_TAGSOUP
> +
>  AC_OUTPUT
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/JNLPFile.java
> --- a/netx/net/sourceforge/jnlp/JNLPFile.java	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/JNLPFile.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -67,6 +67,9 @@
>      /** the network location of this JNLP file */
>      protected URL fileLocation;
>  
> +    /** the ParserSettings which were used to parse this file */
> +    protected ParserSettings parserSettings = null;
> +
>      /** A key that uniquely identifies connected instances (main jnlp+ext) */
>      protected String uniqueKey = null;
>  
> @@ -132,7 +135,7 @@
>       * @throws ParseException if the JNLP file was invalid
>       */
>      public JNLPFile(URL location) throws IOException, ParseException {
> -        this(location, false); // not strict
> +        this(location, new ParserSettings());
>      }
>  
>      /**
> @@ -140,12 +143,12 @@
>       * default policy.
>       *
>       * @param location the location of the JNLP file
> -     * @param strict whether to enforce the spec when
> +     * @param settings the parser settings to use while parsing the file
>       * @throws IOException if an IO exception occurred
>       * @throws ParseException if the JNLP file was invalid
>       */
> -    public JNLPFile(URL location, boolean strict) throws IOException, ParseException {
> -        this(location, (Version) null, strict);
> +    public JNLPFile(URL location, ParserSettings settings) throws IOException, ParseException {
> +        this(location, (Version) null, settings);
>      }
>  
>      /**
> @@ -154,12 +157,12 @@
>       *
>       * @param location the location of the JNLP file
>       * @param version the version of the JNLP file
> -     * @param strict whether to enforce the spec when
> +     * @param settings the parser settings to use while parsing the file
>       * @throws IOException if an IO exception occurred
>       * @throws ParseException if the JNLP file was invalid
>       */
> -    public JNLPFile(URL location, Version version, boolean strict) throws IOException, ParseException {
> -        this(location, version, strict, JNLPRuntime.getDefaultUpdatePolicy());
> +    public JNLPFile(URL location, Version version, ParserSettings settings) throws IOException, ParseException {
> +        this(location, version, settings, JNLPRuntime.getDefaultUpdatePolicy());
>      }
>  
>      /**
> @@ -168,14 +171,15 @@
>       *
>       * @param location the location of the JNLP file
>       * @param version the version of the JNLP file
> -     * @param strict whether to enforce the spec when
> +     * @param settings the parser settings to use while parsing the file
>       * @param policy the update policy
>       * @throws IOException if an IO exception occurred
>       * @throws ParseException if the JNLP file was invalid
>       */
> -    public JNLPFile(URL location, Version version, boolean strict, UpdatePolicy policy) throws IOException, ParseException {
> -        Node root = Parser.getRootNode(openURL(location, version, policy));
> -        parse(root, strict, location);
> +    public JNLPFile(URL location, Version version, ParserSettings settings, UpdatePolicy policy) throws IOException, ParseException {
> +        this.parserSettings = settings;
> +
> +        parse(openURL(location, version, policy), location);
>  
>          //Downloads the original jnlp file into the cache if possible
>          //(i.e. If the jnlp file being launched exist locally, but it
> @@ -202,13 +206,13 @@
>       * @param location the location of the JNLP file
>       * @param uniqueKey A string that uniquely identifies connected instances
>       * @param version the version of the JNLP file
> -     * @param strict whether to enforce the spec when
> +     * @param settings the parser settings to use while parsing the file
>       * @param policy the update policy
>       * @throws IOException if an IO exception occurred
>       * @throws ParseException if the JNLP file was invalid
>       */
> -    public JNLPFile(URL location, String uniqueKey, Version version, boolean strict, UpdatePolicy policy) throws IOException, ParseException {
> -        this(location, version, strict, policy);
> +    public JNLPFile(URL location, String uniqueKey, Version version, ParserSettings settings, UpdatePolicy policy) throws IOException, ParseException {
> +        this(location, version, settings, policy);
>          this.uniqueKey = uniqueKey;
>  
>          if (JNLPRuntime.isDebug())
> @@ -218,11 +222,14 @@
>      /**
>       * Create a JNLPFile from an input stream.
>       *
> +     * @param input input stream to read the JNLP file from
> +     * @param settings the parser settings to use while parsing the file
>       * @throws IOException if an IO exception occurred
>       * @throws ParseException if the JNLP file was invalid
>       */
> -    public JNLPFile(InputStream input, boolean strict) throws ParseException {
> -        parse(Parser.getRootNode(input), strict, null);
> +    public JNLPFile(InputStream input, ParserSettings settings) throws ParseException {
> +        this.parserSettings = settings;
> +        parse(input, null);
>      }
>  
>      /**
> @@ -288,6 +295,13 @@
>      }
>  
>      /**
> +     * Returns the ParserSettings that was used to parse this file
> +     */
> +    public ParserSettings getParserSettings() {
> +        return parserSettings;
> +    }
> +
> +    /**
>       * Returns the JNLP file's version.
>       */
>      public Version getFileVersion() {
> @@ -548,12 +562,13 @@
>       * @param strict whether to enforce the spec when
>       * @param location the file location or null
>       */
> -    private void parse(Node root, boolean strict, URL location) throws ParseException {
> +    private void parse(InputStream input, URL location) throws ParseException {
>          try {
>              //if (location != null)
>              //  location = new URL(location, "."); // remove filename
>  
> -            Parser parser = new Parser(this, location, root, strict, true); // true == allow extensions
> +            Node root = Parser.getRootNode(input, parserSettings);
> +            Parser parser = new Parser(this, location, root, parserSettings);
>  
>              // JNLP tag information
>              specVersion = parser.getSpecVersion();
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/Launcher.java
> --- a/netx/net/sourceforge/jnlp/Launcher.java	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/Launcher.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -360,9 +360,11 @@
>              JNLPFile file = null;
>  
>              try {
> -                file = new JNLPFile(location, (Version) null, true, updatePolicy); // strict
> +                ParserSettings settings = new ParserSettings(true, true, false);
> +                file = new JNLPFile(location, (Version) null, settings, updatePolicy); // strict
>              } catch (ParseException ex) {
> -                file = new JNLPFile(location, (Version) null, false, updatePolicy);
> +                ParserSettings settings = new ParserSettings(false, true, true);
> +                file = new JNLPFile(location, (Version) null, settings, updatePolicy);
>  
>                  // only here if strict failed but lax did not fail
>                  LaunchException lex =
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/MalformedXMLParser.java
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/netx/net/sourceforge/jnlp/MalformedXMLParser.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -0,0 +1,101 @@
> +// Copyright (C) 2011 Red Hat, Inc.
> +//
> +// This library is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU Lesser General Public
> +// License as published by the Free Software Foundation; either
> +// version 2.1 of the License, or (at your option) any later version.
> +//
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// Lesser General Public License for more details.
> +//
> +// You should have received a copy of the GNU Lesser General Public
> +// License along with this library; if not, write to the Free Software
> +// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> +
> +package net.sourceforge.jnlp;
> +
> +import static net.sourceforge.jnlp.runtime.Translator.R;
> +
> +import java.io.ByteArrayInputStream;
> +import java.io.ByteArrayOutputStream;
> +import java.io.IOException;
> +import java.io.InputStream;
> +import java.io.OutputStreamWriter;
> +import java.io.Writer;
> +
> +import net.sourceforge.jnlp.runtime.JNLPRuntime;
> +
> +import org.ccil.cowan.tagsoup.HTMLSchema;
> +import org.ccil.cowan.tagsoup.Parser;
> +import org.ccil.cowan.tagsoup.XMLWriter;
> +import org.xml.sax.InputSource;
> +import org.xml.sax.SAXException;
> +import org.xml.sax.XMLReader;
> +
> +/**
> + * A specialized {@link XMLParser} that uses TagSoup[1] to parse
> + * malformed XML
> + *
> + * Used by net.sourceforge.jnlp.Parser
> + *
> + * [1] http://home.ccil.org/~cowan/XML/tagsoup/
> + */
> +public class MalformedXMLParser extends XMLParser {
> +
> +    /**
> +     * Parses the data from an {@link InputStream} to create a XML tree.
> +     * Returns a {@link Node} representing the root of the tree.
> +     *
> +     * @param input the {@link InputStream} to read data from
> +     * @throws ParseException if an exception occurs while parsing the input
> +     */
> +    @Override
> +    public Node getRootNode(InputStream input) throws ParseException {
> +        if (JNLPRuntime.isDebug()) {
> +            System.out.println("Using MalformedXMLParser");
> +        }
> +        InputStream xmlInput = xmlizeInputStream(input);
> +        return super.getRootNode(xmlInput);
> +    }
> +
> +    /**
> +     * Reads malformed XML from the InputStream original and returns a new
> +     * InputStream which can be used to read a well-formed version of the input
> +     *
> +     * @param original
> +     * @return an {@link InputStream} which can be used to read a well-formed
> +     * version of the input XML
> +     * @throws ParseException
> +     */
> +    private InputStream xmlizeInputStream(InputStream original) throws ParseException {
> +        try {
> +            ByteArrayOutputStream out = new ByteArrayOutputStream();
> +
> +            HTMLSchema schema = new HTMLSchema();
> +            XMLReader reader = new Parser();
> +
> +            reader.setProperty(Parser.schemaProperty, schema);
> +            reader.setFeature(Parser.bogonsEmptyFeature, false);
> +            reader.setFeature(Parser.ignorableWhitespaceFeature, true);
> +            reader.setFeature(Parser.ignoreBogonsFeature, false);
> +
> +            Writer writer = new OutputStreamWriter(out);
> +            XMLWriter x = new XMLWriter(writer);
> +
> +            reader.setContentHandler(x);
> +
> +            InputSource s = new InputSource(original);
> +
> +            reader.parse(s);
> +            return new ByteArrayInputStream(out.toByteArray());
> +        } catch (SAXException e) {
> +            throw new ParseException(R("PBadXML"), e);
> +        } catch (IOException e) {
> +            throw new ParseException(R("PBadXML"), e);
> +        }
> +
> +    }
> +
> +}
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/Parser.java
> --- a/netx/net/sourceforge/jnlp/Parser.java	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/Parser.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -1,5 +1,5 @@
>  // Copyright (C) 2001-2003 Jon A. Maxwell (JAM)
> -// Copyright (C) 2009 Red Hat, Inc.
> +// Copyright (C) 2011 Red Hat, Inc.
>  //
>  // This library is free software; you can redistribute it and/or
>  // modify it under the terms of the GNU Lesser General Public
> @@ -20,15 +20,13 @@
>  import static net.sourceforge.jnlp.runtime.Translator.R;
>  
>  import java.io.*;
> +import java.lang.reflect.InvocationTargetException;
> +import java.lang.reflect.Method;
>  import java.net.*;
>  import java.util.*;
> -//import javax.xml.parsers.*; // commented to use right Node
> -//import org.w3c.dom.*;       // class for using Tiny XML | NanoXML
> -//import org.xml.sax.*;
> -//import gd.xml.tiny.*;
> +
>  import net.sourceforge.jnlp.UpdateDesc.Check;
>  import net.sourceforge.jnlp.UpdateDesc.Policy;
> -import net.sourceforge.nanoxml.*;
>  
>  /**
>   * Contains methods to parse an XML document into a JNLPFile.
> @@ -105,15 +103,14 @@
>       * @param file the (uninitialized) file reference
>       * @param base if codebase is not specified, a default base for relative URLs
>       * @param root the root node
> -     * @param strict whether to enforce strict compliance with the JNLP spec
> -     * @param allowExtensions whether to allow extensions to the JNLP spec
> +     * @param settings the parser settings to use when parsing the JNLP file
>       * @throws ParseException if the JNLP file is invalid
>       */
> -    public Parser(JNLPFile file, URL base, Node root, boolean strict, boolean allowExtensions) throws ParseException {
> +    public Parser(JNLPFile file, URL base, Node root, ParserSettings settings) throws ParseException {
>          this.file = file;
>          this.root = root;
> -        this.strict = strict;
> -        this.allowExtensions = allowExtensions;
> +        this.strict = settings.isStrict();
> +        this.allowExtensions = settings.isExtensionAllowed();
>  
>          // ensure it's a JNLP node
>          if (root == null || !root.getNodeName().equals("jnlp"))
> @@ -1205,116 +1202,33 @@
>       *
>       * @throws ParseException if the JNLP file is invalid
>       */
> -    public static Node getRootNode(InputStream input) throws ParseException {
> +    public static Node getRootNode(InputStream input, ParserSettings settings) throws ParseException {
> +        String className = null;
> +        if (settings.isMalfromedXmlAllowed()) {
> +            className = "net.sourceforge.jnlp.MalformedXMLParser";
> +        } else {
> +            className = "net.sourceforge.jnlp.XMLParser";
> +        }
> +
>          try {
> -            /* SAX
> -            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
> -            factory.setValidating(false);
> -            factory.setNamespaceAware(true);
> -            DocumentBuilder builder = factory.newDocumentBuilder();
> -            builder.setErrorHandler(errorHandler);
> +            Class<?> klass = null;
> +            try {
> +                klass = Class.forName(className);
> +            } catch (ClassNotFoundException e) {
> +                klass = Class.forName("net.sourceforge.jnlp.XMLParser");
> +            }
> +            Object instance = klass.newInstance();
> +            Method m = klass.getMethod("getRootNode", InputStream.class);
>  
> -            Document doc = builder.parse(input);
> -            return doc.getDocumentElement();
> -            */
> -
> -            /* TINY
> -            Node document = new Node(TinyParser.parseXML(input));
> -            Node jnlpNode = getChildNode(document, "jnlp"); // skip comments
> -            */
> -
> -            //A BufferedInputStream is used to allow marking and reseting
> -            //of a stream.
> -            BufferedInputStream bs = new BufferedInputStream(input);
> -
> -            /* NANO */
> -            final XMLElement xml = new XMLElement();
> -            final PipedInputStream pin = new PipedInputStream();
> -            final PipedOutputStream pout = new PipedOutputStream(pin);
> -            final InputStreamReader isr = new InputStreamReader(bs, getEncoding(bs));
> -            // Clean the jnlp xml file of all comments before passing
> -            // it to the parser.
> -            new Thread(
> -                    new Runnable() {
> -                        public void run() {
> -                            (new XMLElement()).sanitizeInput(isr, pout);
> -                            try {
> -                                pout.close();
> -                            } catch (IOException ioe) {
> -                                ioe.printStackTrace();
> -                            }
> -                        }
> -                    }).start();
> -            xml.parseFromReader(new InputStreamReader(pin));
> -            Node jnlpNode = new Node(xml);
> -            return jnlpNode;
> -        } catch (Exception ex) {
> -            throw new ParseException(R("PBadXML"), ex);
> +            return (Node) m.invoke(instance, input);
> +        } catch (InvocationTargetException e) {
> +            if (e.getCause() instanceof ParseException) {
> +                throw (ParseException)(e.getCause());
> +            }
> +            throw new ParseException(R("PBadXML"), e);
> +        } catch (Exception e) {
> +            throw new ParseException(R("PBadXML"), e);
>          }
>      }
>  
> -    /**
> -     * Returns the name of the encoding used in this InputStream.
> -     *
> -     * @param input the InputStream
> -     * @return a String representation of encoding
> -     */
> -    private static String getEncoding(InputStream input) throws IOException {
> -        //Fixme: This only recognizes UTF-8, UTF-16, and
> -        //UTF-32, which is enough to parse the prolog portion of xml to
> -        //find out the exact encoding (if it exists). The reason being
> -        //there could be other encodings, such as ISO 8859 which is 8-bits
> -        //but it supports latin characters.
> -        //So what needs to be done is to parse the prolog and retrieve
> -        //the exact encoding from it.
> -
> -        int[] s = new int[4];
> -        String encoding = "UTF-8";
> -
> -        //Determine what the first four bytes are and store
> -        //them into an int array.
> -        input.mark(4);
> -        for (int i = 0; i < 4; i++) {
> -            s[i] = input.read();
> -        }
> -        input.reset();
> -
> -        //Set the encoding base on what the first four bytes of the
> -        //inputstream turn out to be (following the information from
> -        //www.w3.org/TR/REC-xml/#sec-guessing).
> -        if (s[0] == 255) {
> -            if (s[1] == 254) {
> -                if (s[2] != 0 || s[3] != 0) {
> -                    encoding = "UnicodeLittle";
> -                } else {
> -                    encoding = "X-UTF-32LE-BOM";
> -                }
> -            }
> -        } else if (s[0] == 254 && s[1] == 255 && (s[2] != 0 ||
> -                s[3] != 0)) {
> -            encoding = "UTF-16";
> -
> -        } else if (s[0] == 0 && s[1] == 0 && s[2] == 254 &&
> -                s[3] == 255) {
> -            encoding = "X-UTF-32BE-BOM";
> -
> -        } else if (s[0] == 0 && s[1] == 0 && s[2] == 0 &&
> -                s[3] == 60) {
> -            encoding = "UTF-32BE";
> -
> -        } else if (s[0] == 60 && s[1] == 0 && s[2] == 0 &&
> -                s[3] == 0) {
> -            encoding = "UTF-32LE";
> -
> -        } else if (s[0] == 0 && s[1] == 60 && s[2] == 0 &&
> -                s[3] == 63) {
> -            encoding = "UTF-16BE";
> -        } else if (s[0] == 60 && s[1] == 0 && s[2] == 63 &&
> -                s[3] == 0) {
> -            encoding = "UTF-16LE";
> -        }
> -
> -        return encoding;
> -    }
> -
>  }
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/ParserSettings.java
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/netx/net/sourceforge/jnlp/ParserSettings.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -0,0 +1,55 @@
> +// Copyright (C) 2011 Red Hat, Inc.
> +//
> +// This library is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU Lesser General Public
> +// License as published by the Free Software Foundation; either
> +// version 2.1 of the License, or (at your option) any later version.
> +//
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// Lesser General Public License for more details.
> +//
> +// You should have received a copy of the GNU Lesser General Public
> +// License along with this library; if not, write to the Free Software
> +// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> +
> +package net.sourceforge.jnlp;
> +
> +/**
> + * Encapsulates settings to use with the JNLP Parser
> + */
> +public class ParserSettings {
> +
> +    private final boolean strict;
> +    private final boolean extensionAllowed;
> +    private final boolean malformedXmlAllowed;
> +
> +    /** Create a new ParserSettings with the default parser settings */
> +    public ParserSettings() {
> +        this(false, true, true);
> +    }
> +
> +    /** Create a new ParserSettings object */
> +    public ParserSettings(boolean strict, boolean extensionAllowed, boolean malformedXmlAllowed) {
> +        this.strict = strict;
> +        this.extensionAllowed = extensionAllowed;
> +        this.malformedXmlAllowed = malformedXmlAllowed;
> +    }
> +
> +    /** @return true if extensions to the spec are allowed */
> +    public boolean isExtensionAllowed() {
> +        return extensionAllowed;
> +    }
> +
> +    /** @return true if parsing malformed xml is allowed */
> +    public boolean isMalfromedXmlAllowed() {
> +        return malformedXmlAllowed;
> +    }
> +
> +    /** @return true if strict parsing mode is to be used */
> +    public boolean isStrict() {
> +        return strict;
> +    }
> +
> +}
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/XMLParser.java
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/netx/net/sourceforge/jnlp/XMLParser.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -0,0 +1,163 @@
> +// Copyright (C) 2001-2003 Jon A. Maxwell (JAM)
> +// Copyright (C) 2011 Red Hat, Inc.
> +//
> +// This library is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU Lesser General Public
> +// License as published by the Free Software Foundation; either
> +// version 2.1 of the License, or (at your option) any later version.
> +//
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// Lesser General Public License for more details.
> +//
> +// You should have received a copy of the GNU Lesser General Public
> +// License along with this library; if not, write to the Free Software
> +// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> +
> +package net.sourceforge.jnlp;
> +
> +import static net.sourceforge.jnlp.runtime.Translator.R;
> +
> +import java.io.BufferedInputStream;
> +import java.io.IOException;
> +import java.io.InputStream;
> +import java.io.InputStreamReader;
> +import java.io.PipedInputStream;
> +import java.io.PipedOutputStream;
> +
> +import net.sourceforge.nanoxml.XMLElement;
> +
> +//import javax.xml.parsers.*; // commented to use right Node
> +//import org.w3c.dom.*;       // class for using Tiny XML | NanoXML
> +//import org.xml.sax.*;
> +//import gd.xml.tiny.*;
> +
> +/**
> + * A gateway to the actual implementation of the parsers.
> + *
> + * Used by net.sourceforge.jnlp.Parser
> + */
> +class XMLParser {
> +
> +    /**
> +     * Parses input from an InputStream and returns a Node representing the
> +     * root of the parse tree.
> +     *
> +     * @param input the {@link InputStream} containing the XML
> +     * @return a {@link Node} representing the root of the parsed XML
> +     * @throws ParseException
> +     */
> +    public Node getRootNode(InputStream input) throws ParseException {
> +
> +        try {
> +            /* SAX
> +            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
> +            factory.setValidating(false);
> +            factory.setNamespaceAware(true);
> +            DocumentBuilder builder = factory.newDocumentBuilder();
> +            builder.setErrorHandler(errorHandler);
> +
> +            Document doc = builder.parse(input);
> +            return doc.getDocumentElement();
> +            */
> +
> +            /* TINY
> +            Node document = new Node(TinyParser.parseXML(input));
> +            Node jnlpNode = getChildNode(document, "jnlp"); // skip comments
> +            */
> +
> +            //A BufferedInputStream is used to allow marking and resetting
> +            //of a stream.
> +            BufferedInputStream bs = new BufferedInputStream(input);
> +
> +            /* NANO */
> +            final XMLElement xml = new XMLElement();
> +            final PipedInputStream pin = new PipedInputStream();
> +            final PipedOutputStream pout = new PipedOutputStream(pin);
> +            final InputStreamReader isr = new InputStreamReader(bs, getEncoding(bs));
> +            // Clean the jnlp xml file of all comments before passing
> +            // it to the parser.
> +            new Thread(
> +                    new Runnable() {
> +                        public void run() {
> +                            (new XMLElement()).sanitizeInput(isr, pout);
> +                            try {
> +                                pout.close();
> +                            } catch (IOException ioe) {
> +                                ioe.printStackTrace();
> +                            }
> +                        }
> +                    }).start();
> +            xml.parseFromReader(new InputStreamReader(pin));
> +            Node jnlpNode = new Node(xml);
> +            return jnlpNode;
> +        } catch (Exception ex) {
> +            throw new ParseException(R("PBadXML"), ex);
> +        }
> +    }
> +
> +    /**
> +     * Returns the name of the encoding used in this InputStream.
> +     *
> +     * @param input the InputStream
> +     * @return a String representation of encoding
> +     */
> +    private static String getEncoding(InputStream input) throws IOException {
> +        //Fixme: This only recognizes UTF-8, UTF-16, and
> +        //UTF-32, which is enough to parse the prolog portion of xml to
> +        //find out the exact encoding (if it exists). Other encodings
> +        //are possible, such as ISO 8859, which is 8-bit and so cannot
> +        //be told apart from UTF-8 by looking at the first bytes alone.
> +        //So what needs to be done is to parse the prolog and retrieve
> +        //the exact encoding from it.
> +
> +        int[] s = new int[4];
> +        String encoding = "UTF-8";
> +
> +        //Determine what the first four bytes are and store
> +        //them into an int array.
> +        input.mark(4);
> +        for (int i = 0; i < 4; i++) {
> +            s[i] = input.read();
> +        }
> +        input.reset();
> +
> +        //Set the encoding based on what the first four bytes of the
> +        //inputstream turn out to be (following the information from
> +        //www.w3.org/TR/REC-xml/#sec-guessing).
> +        if (s[0] == 255) {
> +            if (s[1] == 254) {
> +                if (s[2] != 0 || s[3] != 0) {
> +                    encoding = "UnicodeLittle";
> +                } else {
> +                    encoding = "X-UTF-32LE-BOM";
> +                }
> +            }
> +        } else if (s[0] == 254 && s[1] == 255 && (s[2] != 0 ||
> +                s[3] != 0)) {
> +            encoding = "UTF-16";
> +
> +        } else if (s[0] == 0 && s[1] == 0 && s[2] == 254 &&
> +                s[3] == 255) {
> +            encoding = "X-UTF-32BE-BOM";
> +
> +        } else if (s[0] == 0 && s[1] == 0 && s[2] == 0 &&
> +                s[3] == 60) {
> +            encoding = "UTF-32BE";
> +
> +        } else if (s[0] == 60 && s[1] == 0 && s[2] == 0 &&
> +                s[3] == 0) {
> +            encoding = "UTF-32LE";
> +
> +        } else if (s[0] == 0 && s[1] == 60 && s[2] == 0 &&
> +                s[3] == 63) {
> +            encoding = "UTF-16BE";
> +        } else if (s[0] == 60 && s[1] == 0 && s[2] == 63 &&
> +                s[3] == 0) {
> +            encoding = "UTF-16LE";
> +        }
> +
> +        return encoding;
> +    }
> +}
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/resources/Messages.properties
> --- a/netx/net/sourceforge/jnlp/resources/Messages.properties	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/resources/Messages.properties	Mon Jan 10 19:09:30 2011 -0500
> @@ -158,6 +158,7 @@
>  BOHeadless  = Disables download window, other UIs.
>  BOStrict    = Enables strict checking of JNLP file format.
>  BOViewer    = Shows the trusted certificate viewer.
> +BOXml       = Uses an XML parser to parse the JNLP file.
>  BXnofork    = Do not create another JVM.
>  BXclearcache= Clean the JNLP application cache.
>  BOHelp      = Print this message and exit.
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/runtime/Boot.java
> --- a/netx/net/sourceforge/jnlp/runtime/Boot.java	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/runtime/Boot.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -34,6 +34,7 @@
>  import net.sourceforge.jnlp.LaunchException;
>  import net.sourceforge.jnlp.Launcher;
>  import net.sourceforge.jnlp.ParseException;
> +import net.sourceforge.jnlp.ParserSettings;
>  import net.sourceforge.jnlp.PropertyDesc;
>  import net.sourceforge.jnlp.ResourcesDesc;
>  import net.sourceforge.jnlp.cache.CacheUtil;
> @@ -104,6 +105,7 @@
>              + "  -noupdate             " + R("BONoupdate") + "\n"
>              + "  -headless             " + R("BOHeadless") + "\n"
>              + "  -strict               " + R("BOStrict") + "\n"
> +            + "  -xml                  " + R("BOXml") + "\n"
>              + "  -Xnofork              " + R("BXnofork") + "\n"
>              + "  -Xclearcache          " + R("BXclearcache") + "\n"
>              + "  -help                 " + R("BOHelp") + "\n";
> @@ -262,13 +264,22 @@
>                  e.printStackTrace();
>          }
>  
> -        boolean strict = (null != getOption("-strict"));
> +        boolean strict = false;
> +        boolean malformedXmlAllowed = true;
>  
> -        JNLPFile file = new JNLPFile(url, strict);
> +        if (null != getOption("-strict")) {
> +            strict = true;
> +        }
> +        if (null != getOption("-xml")) {
> +            malformedXmlAllowed = false;
> +        }
> +        ParserSettings settings = new ParserSettings(strict, true, malformedXmlAllowed);
> +
> +        JNLPFile file = new JNLPFile(url, settings);
>  
>          // Launches the jnlp file where this file originated.
>          if (file.getSourceLocation() != null) {
> -            file = new JNLPFile(file.getSourceLocation(), strict);
> +            file = new JNLPFile(file.getSourceLocation(), settings);
>          }
>  
>          // add in extra params from command line
> diff -r dc02a605f905 netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java
> --- a/netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java	Fri Jan 07 08:00:08 2011 -0500
> +++ b/netx/net/sourceforge/jnlp/runtime/JNLPClassLoader.java	Mon Jan 10 19:09:30 2011 -0500
> @@ -50,6 +50,7 @@
>  import net.sourceforge.jnlp.JNLPFile;
>  import net.sourceforge.jnlp.LaunchException;
>  import net.sourceforge.jnlp.ParseException;
> +import net.sourceforge.jnlp.ParserSettings;
>  import net.sourceforge.jnlp.PluginBridge;
>  import net.sourceforge.jnlp.ResourcesDesc;
>  import net.sourceforge.jnlp.SecurityDesc;
> @@ -324,12 +325,12 @@
>       * @param version the file's version
>       * @param policy the update policy to use when downloading resources
>       */
> -    public static JNLPClassLoader getInstance(URL location, String uniqueKey, Version version, UpdatePolicy policy)
> +    public static JNLPClassLoader getInstance(URL location, String uniqueKey, Version version, ParserSettings settings, UpdatePolicy policy)
>              throws IOException, ParseException, LaunchException {
>          JNLPClassLoader loader = urlToLoader.get(uniqueKey);
>  
>          if (loader == null || !location.equals(loader.getJNLPFile().getFileLocation()))
> -            loader = getInstance(new JNLPFile(location, uniqueKey, version, false, policy), policy);
> +            loader = getInstance(new JNLPFile(location, uniqueKey, version, settings, policy), policy);
>  
>          return loader;
>      }
> @@ -348,7 +349,7 @@
>          for (int i = 0; i < ext.length; i++) {
>              try {
>                  String uniqueKey = this.getJNLPFile().getUniqueKey();
> -                JNLPClassLoader loader = getInstance(ext[i].getLocation(), uniqueKey, ext[i].getVersion(), updatePolicy);
> +                JNLPClassLoader loader = getInstance(ext[i].getLocation(), uniqueKey, ext[i].getVersion(), file.getParserSettings(), updatePolicy);
>                  loaderList.add(loader);
>              } catch (Exception ex) {
>                  ex.printStackTrace();
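
For anyone reviewing the getEncoding() move without the old Parser.java
open, here is a standalone sketch of the same first-four-bytes check. It
is not part of the patch: the class name EncodingSniffer and the helper
matches() are made up for illustration, and the returned strings simply
mirror the names used in the quoted code. Like the original, it assumes
the stream supports mark/reset and it never reads the encoding
pseudo-attribute from the XML prolog.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of the patch.
final class EncodingSniffer {

    /**
     * Guesses an encoding name from the first four bytes of the stream,
     * then rewinds so the caller can read from the start again.
     * Falls back to "UTF-8" when nothing matches.
     */
    static String guessEncoding(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }

        int[] b = new int[4];
        in.mark(4);
        for (int i = 0; i < 4; i++) {
            b[i] = in.read(); // -1 if the stream is shorter than four bytes
        }
        in.reset();

        // Byte order marks first.
        if (b[0] == 0xFF && b[1] == 0xFE) {
            return (b[2] != 0 || b[3] != 0) ? "UnicodeLittle" : "X-UTF-32LE-BOM";
        }
        if (b[0] == 0xFE && b[1] == 0xFF && (b[2] != 0 || b[3] != 0)) {
            return "UTF-16";
        }
        if (matches(b, 0x00, 0x00, 0xFE, 0xFF)) {
            return "X-UTF-32BE-BOM";
        }

        // No BOM: look at how '<' (0x3C) and '?' (0x3F) are laid out.
        if (matches(b, 0x00, 0x00, 0x00, 0x3C)) {
            return "UTF-32BE";
        }
        if (matches(b, 0x3C, 0x00, 0x00, 0x00)) {
            return "UTF-32LE";
        }
        if (matches(b, 0x00, 0x3C, 0x00, 0x3F)) {
            return "UTF-16BE";
        }
        if (matches(b, 0x3C, 0x00, 0x3F, 0x00)) {
            return "UTF-16LE";
        }
        return "UTF-8";
    }

    // True when the four sniffed bytes equal the four expected values.
    private static boolean matches(int[] b, int b0, int b1, int b2, int b3) {
        return b[0] == b0 && b[1] == b1 && b[2] == b2 && b[3] == b3;
    }
}
```

Any 8-bit ASCII-compatible input (ISO 8859 included) ends up in the
"UTF-8" fallback here, which is the limitation the Fixme comment in the
quoted code describes.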


-- 
Andrew :)

Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

Support Free Java!
Contribute to GNU Classpath and IcedTea
http://www.gnu.org/software/classpath
http://icedtea.classpath.org
PGP Key: 94EFD9D8 (http://subkeys.pgp.net)
Fingerprint = F8EF F1EA 401E 2E60 15FA  7927 142C 2591 94EF D9D8


