Fwd: [rfc][icedtea-web] renewed tagsoup

Fri Jun 21 00:58:57 PDT 2013

On 06/20/2013 07:09 PM, Adam Domurad wrote:
> OK. Looks ready to push after one nit and one typo inside a method name.

Thanks, pushing.
>
> Do you have any thoughts about the Oracle examples still being broken? It is rather unfortunate. I
> don't believe tagsoup can support these.
>
> Oracle's parser is quite quirky. It seems to be written like the tag parser we got rid of.
>
> To fix it, we would need to hack in some support for assuming that:
>
> sometag = "somestring<EOL>
>
> should become
> sometag = "somestring"<EOL>

Hmm... Contribute that to tagsoup?
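For reference, the rule Adam describes could be approximated by a pre-processing pass before tagsoup ever sees the input. This is only a sketch under loud assumptions: the class and method names are hypothetical, it ignores escaped quotes, and it would wrongly "fix" a quoted value that legitimately spans lines. It simply appends a closing `"` to any line with an odd number of double quotes.

```java
import java.util.regex.Pattern;

public class QuoteFixer {

    // Line terminators: \n or \r\n. Note the output always uses \n.
    private static final Pattern EOL = Pattern.compile("\r?\n");

    /**
     * Hypothetical heuristic: a line with an odd number of double quotes is
     * assumed to have an unterminated quoted value, which is closed just
     * before the end of the line ("somestring<EOL> becomes "somestring"<EOL>).
     */
    public static String closeUnterminatedQuotes(String input) {
        // limit -1 keeps trailing empty pieces, so a final newline survives
        String[] lines = EOL.split(input, -1);
        StringBuilder result = new StringBuilder(input.length() + 16);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            long quotes = line.chars().filter(c -> c == '"').count();
            result.append(line);
            if (quotes % 2 != 0) {
                result.append('"');
            }
            if (i < lines.length - 1) {
                result.append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints: sometag = "somestring"  then  other = "ok"  on separate lines
        System.out.println(closeUnterminatedQuotes("sometag = \"somestring\nother = \"ok\""));
    }
}
```

Whether something this blunt belongs in tagsoup itself is doubtful; it is more of a JNLP-specific quirk workaround.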
>
>
> [..snip..]
>
>> + * [1] http://home.ccil.org/~cowan/XML/tagsoup/
>> + */
>> +public class MalformedXMLParser extends XMLParser {
>> +
>> +    /**
>> +     * Parses the data from an {@link InputStream} to create a XML tree.
>> +     * Returns a {@link Node} representing the root of the tree.
>> +     *
>> +     * @param input the {@link InputStream} to read data from
>> +     * @throws ParseException if an exception occurs while parsing the input
>> +     */
>> +    @Override
>> +    public Node getRootNode(InputStream input) throws ParseException {
>> +        if (JNLPRuntime.isDebug()) {
>> +            System.out.println("Using MalformedXMLParser");
>> +        }
>> +        InputStream xmlInput = xmlizeInputStream(input);
>> +        return super.getRootNode(xmlInput);
>> +    }
>> +
>> +    /**
>> +     * Reads malformed XML from the InputStream original and returns a new
>> +     * InputStream which can be used to read a well-formed version of the input
>> +     *
>> +     * @param original
>> +     * @return an {@link InputStream} which can be used to read a well-formed
>> +     * version of the input XML
>> +     * @throws ParseException
>> +     */
>> +    private InputStream xmlizeInputStream(InputStream original) throws ParseException {
>> +        try {
>> +            ByteArrayOutputStream out = new ByteArrayOutputStream();
>> +
>> +            HTMLSchema schema = new HTMLSchema();
>> +            XMLReader reader = new Parser();
>> +
>> +            //TODO walk through the javadoc and tune more such a settings
>> +            //see tagsoup javadoc for details
>
> [nit] s/such a//

removed
>
> Just a note, I played around with them but couldn't find anything particularly useful.

Thanks! I only had a brief look, so I assume you saved me a bunch of time.
>
>> +            reader.setProperty(Parser.schemaProperty, schema);
>> +            reader.setFeature(Parser.bogonsEmptyFeature, false);
>> +            reader.setFeature(Parser.ignorableWhitespaceFeature, true);
>> +            reader.setFeature(Parser.ignoreBogonsFeature, false);
>> +
>> +            Writer writeger = new OutputStreamWriter(out);
>> +            XMLWriter x = new XMLWriter(writeger);
>> +
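The xmlizeInputStream quoted above buffers the tagsoup output in a ByteArrayOutputStream and then hands a fresh stream to XMLParser. A minimal, stdlib-only sketch of that buffer-and-reparse shape follows; the `StreamRewriter.rewrite` name is hypothetical, and a plain string transformation stands in for the tagsoup Parser/XMLWriter pass. It pins the charset to UTF-8 as an assumption, whereas `new OutputStreamWriter(out)` in the patch uses the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class StreamRewriter {

    /**
     * Reads the whole stream into memory, applies a text transformation
     * (a stand-in for the tagsoup pass) and exposes the result as a new
     * InputStream. UTF-8 is assumed for both decoding and encoding.
     */
    public static InputStream rewrite(InputStream original, UnaryOperator<String> transform)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = original.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        byte[] fixed = transform.apply(text).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(fixed);
    }
}
```

Usage would look like `InputStream xmlInput = StreamRewriter.rewrite(input, someFixer);` followed by `super.getRootNode(xmlInput)`, mirroring getRootNode in the patch. Buffering everything is fine for JNLP-sized files but would not suit large inputs.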

>> +    /** @return true if extensions to the spec are allowed */
>> +    public boolean isExtensionAllowed() {
>> +        return extensionAllowed;
>> +    }
>> +
>> +    /** @return true if parsing malformed xml is allowed */
>> +    public boolean isMalfromedXmlAllowed() {
>
> s/Malfromed/Malformed/

fixed
>
>> +        return malformedXmlAllowed;
>> +    }
>> +
>> +    /** @return true if strict parsing mode is to be used */
>>      public boolean isStrict() {
>>          return isStrict;
>>      }
>> -}
>> +
>> +}
>> \ No newline at end of file
>> diff -r e09b9813d6de netx/net/sourceforge/jnlp/PluginBridge.java
>> --- a/netx/net/sourceforge/jnlp/PluginBridge.java    Thu Jun 20 17:00:52 2013 +0200
>> +++ b/netx/net/sourceforge/jnlp/PluginBridge.java    Thu Jun 20 17:16:59 2013 +0200
>> @@ -96,14 +96,15 @@
>>              try {
>>                  // Use codeBase as the context for the URL. If jnlp_href's
>>                  // value is a complete URL, it will replace codeBase's context.
>
> [..snip..]
>
> Thank you for handling this!
>
> Happy hacking,
> -Adam
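On the PluginBridge hunk: the comment there describes the standard java.net.URL(URL context, String spec) behaviour. The actual call is snipped, so the sketch below assumes it is equivalent to `new URL(codeBase, jnlpHref)`; the class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class JnlpHrefResolution {

    /**
     * Resolves jnlp_href relative to codeBase. A relative spec is resolved
     * against the context URL; an absolute spec replaces it entirely.
     */
    public static URL resolve(URL codeBase, String jnlpHref) throws MalformedURLException {
        return new URL(codeBase, jnlpHref);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL codeBase = new URL("http://example.com/applets/");
        // prints http://example.com/applets/app.jnlp
        System.out.println(resolve(codeBase, "app.jnlp"));
        // prints http://other.example.org/app.jnlp
        System.out.println(resolve(codeBase, "http://other.example.org/app.jnlp"));
    }
}
```

One pitfall worth keeping in mind: without a trailing slash, "http://example.com/applets" resolves "app.jnlp" to "http://example.com/app.jnlp", because the last path segment of the context is dropped.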