Fwd: [rfc][icedtea-web] renewed tagsoup

Jiri Vanek jvanek at redhat.com
Thu Jun 20 08:17:44 PDT 2013


On 06/10/2013 05:39 PM, Adam Domurad wrote:
> On 06/04/2013 09:21 AM, Jiri Vanek wrote:
>> attached missing attachment
>>
>> -------- Original Message --------
>> Subject: [rfc][icedtea-web] renewed tagsoup
>> Date: Tue, 04 Jun 2013 15:18:40 +0200
>> From: Jiri Vanek <jvanek at redhat.com>
>> To: IcedTea Distro List <distro-pkg-dev at openjdk.java.net>
>>
>> Again, I'm trying to reincarnate old Omair's  tagsoup patch.
>> As this was developed by Omair, then rewritten by Adam, and now by Me, I
>
> I wouldn't say I rewrote it.

Thank for review anyway :)
>
>> think it is quite reviwed :)
>> tests are adapted, and few added
>>
>> Ok for head? Some tests included.
>>
>> Also do not forget  taht the results of this patch are visible only with
>> tagsoup installed.  Without it, or with -xml it behaves as normlay.
>> (note, this will need to go to wiki)
>> By the way the PR1026 is fixed by this, and there can come more
>>
>> I have already made quite deep testing, and some more testing will come,
>> so please do not halt this for tests. Soem reproducers are in row.
>>
>> J.
>
> Thank you very much for doing this!
>
> We should collect what we could be fixed by this:
>
> 1. Mario noticed invalid XML warnings here:
>
> javaws http://docs.oracle.com/javase/tutorialJWS/uiswing/misc/ex7/GradientTranslucentWindowDemo.jnlp
> Status: Still broken. Unfortunately tagsoup cannot fix this unclosed string. I wonder what if
> anything we should do about this ?
>
> 2. From PR1026, applet at http://www.knuddels.de:8080/index.html?v=90aez&c=7
> Status: Now works (as you have noted)
>
> Actually I'm having trouble finding more. We should aim to support as much broken XML as there is
> out there, so hopefully we have more than one confirmed fixed case before deciding to go this route.

I believe there is not osu many of them. But they exists. I recently found my old one (but already 
fixed)
>
>
> Comments inline:
>
>> diff -r 6209aba5fa14 NEWS
>> --- a/NEWS    Tue Jun 04 13:22:38 2013 +0200
>> +++ b/NEWS    Tue Jun 04 14:46:38 2013 +0200
>> @@ -9,8 +9,10 @@
>>  CVE-XXXX-YYYY: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=XXXX-YYYY
>>
>>  New in release 1.5 (2013-XX-XX):
>> +* IcedTea-Web now using tagsoup as default (tagsoup depndence) sanitizer for input
>
> [typo] depndence -> dependence

fixed
>
>>  * NetX
>>   - Netx can now parse malformed jnlp files using tagsoup
>
> Huh, why is this already in the diff ? This is a better message, IMO. 'for input' is pretty vague.

Because it have fixed the bug. And xml parsing is no longer stricxt with tagsoup

>
>> + - PR1026 - Apps fail to run because of the nanoxml parser's strict XML validation
>>  * Plugin
>>    - PR854: Resizing an applet several times causes 100% CPU load

>> @@ -472,6 +486,10 @@
>>      mkdir -p stamps
>>      touch $@
>>
>> +netx-dummy.jar:
>> +    echo "Manifest-Version: 1.0" > netx-dummy-manifest.mf
>> +    jar cfm $@ netx-dummy-manifest.mf
>> +
>
> What is this used for ?

Good catch. It was in  one older versions. And I removed all traces of this  - and missed this two. 
Removed. thanx!
>
>>  stamps/netx-dist.stamp: stamps/netx.stamp $(abs_top_builddir)/netx.manifest
>>      (cd $(NETX_DIR) ; \
>>       mkdir -p lib ; \
>> @@ -486,6 +504,7 @@
>>
>>  clean-netx:
>>      rm -rf $(NETX_DIR)
>> +    rm -f netx-dummy.jar
>>      rm -f stamps/netx-dist.stamp
>>      rm -f netx-source-files.txt
>>      rm -f stamps/netx.stamp
>> @@ -1006,7 +1025,7 @@
>>      mkdir -p $(NETX_UNIT_TEST_DIR) && \

>
>
>
>> diff -r 01b48fc84f85 -r 6209aba5fa14 NEWS
>> --- a/NEWS    Mon Jun 03 14:16:42 2013 +0200
>> +++ b/NEWS    Tue Jun 04 13:22:38 2013 +0200
>
> Two diffs to the same file ? How'd that happen :-)

?? I'm wondering to!
>
>> @@ -9,6 +9,8 @@
>>  CVE-XXXX-YYYY: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=XXXX-YYYY
>>
>>  New in release 1.5 (2013-XX-XX):
>> +* NetX
>> + - Netx can now parse malformed jnlp files using tagsoup
>>  * Plugin
>>    - PR854: Resizing an applet several times causes 100% CPU load
>>
>> diff -r 01b48fc84f85 -r 6209aba5fa14 acinclude.m4
>> --- a/acinclude.m4    Mon Jun 03 14:16:42 2013 +0200
>> +++ b/acinclude.m4    Tue Jun 04 13:22:38 2013 +0200
>> @@ -403,6 +403,31 @@
>>  fi
>>  ])
>>

snip

>> -    private JNLPFile(Reader input, boolean strict) throws ParseException {
>> -        // todo: now that we are using NanoXML we can use a Reader
>> -        //parse(Parser.getRootNode(input), strict, null);
>> -    }
>> +//    private JNLPFile(Reader input, ParserSettings settings) throws ParseException {
>> +//        todo: now that we are using NanoXML we can use a Reader
>> +//        parse(Parser.getRootNode(input), settings, null);
>> +//    }
>
> I'm in favour of just axing this.

ok
>
> [..snip..]

>> +            HTMLSchema schema = new HTMLSchema();
>> +            XMLReader reader = new Parser();
>> +
>> +            reader.setProperty(Parser.schemaProperty, schema);
>> +            reader.setFeature(Parser.bogonsEmptyFeature, false);
>> +            reader.setFeature(Parser.ignorableWhitespaceFeature, true);
>> +            reader.setFeature(Parser.ignoreBogonsFeature, false);
>
> It'd be nice to get some comments here about why these are chosen.

just hinted.. Needs more investigations
>
> [..snip..]
>> diff -r 01b48fc84f85 -r 6209aba5fa14 netx/net/sourceforge/jnlp/XmlParser.java
>> --- /dev/null    Thu Jan 01 00:00:00 1970 +0000
>> +++ b/netx/net/sourceforge/jnlp/XmlParser.java    Tue Jun 04 13:22:38 2013 +0200
>> @@ -0,0 +1,183 @@
>> +/*
>> +   Copyright (C) 2013 Red Hat, Inc.
>> +
>> +This file is part of IcedTea.
>> +
>> +IcedTea is free software; you can redistribute it and/or
>> +modify it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation, version 2.
>> +
>> +IcedTea is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +General Public License for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with IcedTea; see the file COPYING.  If not, write to
>> +the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> +02110-1301 USA.
>> +
>> +Linking this library statically or dynamically with other modules is
>> +making a combined work based on this library.  Thus, the terms and
>> +conditions of the GNU General Public License cover the whole
>> +combination.
>> +
>> +As a special exception, the copyright holders of this library give you
>> +permission to link this library with independent modules to produce an
>> +executable, regardless of the license terms of these independent
>> +modules, and to copy and distribute the resulting executable under
>> +terms of your choice, provided that you also meet, for each linked
>> +independent module, the terms and conditions of the license of that
>> +module.  An independent module is a module which is not derived from
>> +or based on this library.  If you modify this library, you may extend
>> +this exception to your version of the library, but you are not
>> +obligated to do so.  If you do not wish to do so, delete this
>> +exception statement from your version.
>> + */
>> +
>> +package net.sourceforge.jnlp;
>> +
>> +import static net.sourceforge.jnlp.runtime.Translator.R;
>> +
>> +import java.io.BufferedInputStream;
>> +import java.io.IOException;
>> +import java.io.InputStream;
>> +import java.io.InputStreamReader;
>> +import java.io.PipedInputStream;
>> +import java.io.PipedOutputStream;
>> +
>> +import net.sourceforge.nanoxml.XMLElement;
>> +
>> +//import javax.xml.parsers.*; // commented to use right Node
>> +//import org.w3c.dom.*;       // class for using Tiny XML | NanoXML
>> +//import org.xml.sax.*;
>> +//import gd.xml.tiny.*;
>
> [nit] Hm can we remove these ? I don't insist but they aren't clarifying anything for me, they're
> just confusing.

I would rather keep them in. as remainder of our shame - and as remainder that we can use DOM once :)
>
> [..snip..]
>> diff -r 01b48fc84f85 -r 6209aba5fa14 netx/net/sourceforge/jnlp/resources/Messages.properties
>> --- a/netx/net/sourceforge/jnlp/resources/Messages.properties    Mon Jun 03 14:16:42 2013 +0200
>> +++ b/netx/net/sourceforge/jnlp/resources/Messages.properties    Tue Jun 04 13:22:38 2013 +0200
>> @@ -189,6 +189,7 @@
>>  BOHeadless  = Disables download window, other UIs.
>>  BOStrict    = Enables strict checking of JNLP file format.
>>  BOViewer    = Shows the trusted certificate viewer.
>> +BOXml       = Uses an strict XML parser to parse the JNLP file.
>
> [typo] 'Uses an strict' -> 'Uses a strict'

Fixed (how come! not my sentence! :) )
>
> [..snip..]
>
> Overall looks good to me.
>
> Cheers,
> -Adam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tagsoup2.diff
Type: text/x-patch
Size: 99429 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/distro-pkg-dev/attachments/20130620/8c96dd19/tagsoup2.diff 


More information about the distro-pkg-dev mailing list