[icedtea-web] RFC: PR766 javaws fails to parse an <argument> node that contains CDATA

Omair Majid omajid at redhat.com
Wed Sep 21 13:49:19 PDT 2011


On 09/19/2011 01:44 PM, Omair Majid wrote:
> On 09/19/2011 01:01 PM, Deepak Bhole wrote:
>> * Omair Majid<omajid at redhat.com> [2011-09-16 18:03]:
>>> Hi,
>>>
>>> As explained in the bug report [1], icedtea-web ignores CDATA
>>> sections in JNLP files. I dug into the parser code a bit and found a
>>> few things.
>>>
>>> The code that sanitizes comments from jnlp files (sanitizeInput)
>>> actually removes CDATA sections along with comments. This is
>>> probably a bug.
>>>
>>
>> Agreed. I don't know why CDATA is being thrown away either.
>>
>>> (As an aside, I suspect that the reason this is done separately from
>>> the main parser is because the main parser is a rather strict XML
>>> parser and some of the locations where comments can appear in jnlp
>>> files are not accepted by the XML standard. See
>>> https://bugzilla.redhat.com/show_bug.cgi?id=449160 for examples of
>>> jnlp files that contain invalid comments)
>>>
>>> The NanoXML parser itself can parse and see CDATA sections. However,
>>> it is not quite perfect. It has trouble parsing when CDATA sections
>>> appear in certain places, or are surrounded by certain elements.
>>>

[snip]

>> And the
>> changes to handle CDATA will be in another patch?
>
> That's the plan. I will have to look into the code a little more before
> I can figure out exactly what needs to be corrected.
>

[snip]

> If you mean the parser in icedtea-web chokes on CDATA sections (or
> handles these CDATA sections differently) - as shown in the test results
> - then yes, that might be a problem. As stated above I intend to write a
> patch with additional tests for this. As an additional precaution, I
> would like to avoid backporting this patch to any of the release branches.
>

So I spent some time looking into it and it turns out that the version 
of NanoXML that we use does not support mixed content [1] - so it's not 
a problem with using CDATA sections, it's a problem with intermixing 
text and elements.
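
To make the distinction concrete, here is a rough sketch of a check for
mixed content. It uses the JDK's built-in DOM parser rather than
NanoXML, and the class and method names (MixedContentCheck, parseRoot,
hasMixedContent) are made up for illustration; none of this is code
from netx.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of icedtea-web or NanoXML.
class MixedContentCheck {

    /** Parses an XML string with the JDK DOM parser and returns the root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    /** True if the element directly contains both child elements and non-blank text or CDATA. */
    static boolean hasMixedContent(Element element) {
        boolean sawElement = false;
        boolean sawText = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                sawElement = true;
            } else if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !child.getNodeValue().trim().isEmpty()) {
                sawText = true;
            }
        }
        return sawElement && sawText;
    }

    public static void main(String[] args) throws Exception {
        // CDATA on its own is plain character data, not mixed content.
        String cdataOnly = "<argument><![CDATA[<not-a-tag>]]></argument>";
        // Text intermixed with a child element is mixed content.
        String mixed = "<argument>text <b>bold</b> more text</argument>";
        System.out.println(hasMixedContent(parseRoot(cdataOnly)));
        System.out.println(hasMixedContent(parseRoot(mixed)));
    }
}
```

The first document has only a CDATA section inside <argument>, so it is
ordinary character data. The second has text on either side of a child
element, which is the case our version of NanoXML cannot represent.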

The good news is that JNLP files should not contain mixed content. 
Nothing in Oracle's docs [2] suggests that mixed content can occur in 
JNLP files. Perhaps another thing to consider is that we have not seen 
any bugs filed about problems in jnlp files caused by mixed content. So 
this should not occur in practice.

Of course, I could implement mixed-content support, but it turns out 
that NanoXML/Lite (which is what was embedded in netx) does not support 
it, pretty much by design [3]. NanoXML/Java does support it [3], but I 
am hesitant to swap out our parser like this. And if I were swapping out 
the parser, I would rather use a parser meant to handle malformed XML 
documents - like tagsoup [4] - along with a well-tested XML parser.

In fact, after considering the matter, I would like to push just the two 
patches I posted earlier in this thread and close the bug.

Any thoughts or comments?

Cheers,
Omair

[1] http://www.w3.org/TR/xml/#sec-mixed-content
[2] http://download.oracle.com/javase/6/docs/technotes/guides/javaws/developersguide/syntax.html
[3] http://devkix.com/nanoxml.php
[4] http://thread.gmane.org/gmane.comp.java.openjdk.distro-packaging.devel/11663
