[icedtea-web] RFC: PR766 javaws fails to parse an <argument> node that contains CDATA
Omair Majid
omajid at redhat.com
Wed Sep 21 13:49:19 PDT 2011
On 09/19/2011 01:44 PM, Omair Majid wrote:
> On 09/19/2011 01:01 PM, Deepak Bhole wrote:
>> * Omair Majid<omajid at redhat.com> [2011-09-16 18:03]:
>>> Hi,
>>>
>>> As explained in the bug report [1], icedtea-web ignores CDATA
>>> sections in JNLP files. I dug into the parser code a bit and found a
>>> few things.
>>>
>>> The code that sanitizes comments from jnlp files (sanitizeInput)
>>> actually removes CDATA sections along with comments. This is
>>> probably a bug.
>>>
>>
>> Agreed. I don't know why CDATA is being thrown away either.
>>
>>> (As an aside, I suspect that the reason this is done separately from
>>> the main parser is because the main parser is a rather strict XML
>>> parser and some of the locations where comments can appear in jnlp
>>> files are not accepted by the XML standard. See
>>> https://bugzilla.redhat.com/show_bug.cgi?id=449160 for examples of
>>> jnlp files that contain invalid comments)
>>>
>>> The NanoXML parser itself can parse and see CDATA sections. However,
>>> it is not quite perfect. It has trouble parsing when CDATA sections
>>> appear in certain places, or are surrounded by certain elements.
>>>
[snip]
>> And the
>> changes to handle CDATA will be in another patch?
>
> That's the plan. I will have to look into the code a little more before
> I can figure out exactly what needs to be corrected.
>
[snip]
> If you mean the parser in icedtea-web chokes on CDATA sections (or
> handles these CDATA sections differently) - as shown in the test results
> - then yes, that might be a problem. As stated above I intend to write a
> patch with additional tests for this. As an additional precaution, I
> would like to avoid backporting this patch to any of the release branches.
>
So I spent some time looking into it and it turns out that the version
of NanoXML that we use does not support mixed content [1] - so it's not
a problem with using CDATA sections, it's a problem with intermixing
text and elements.
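
To illustrate the distinction, here is a minimal sketch. It uses the JDK's built-in DOM parser rather than NanoXML, and the class name, helper names and sample XML strings are hypothetical, not taken from icedtea-web. An element whose only content is a CDATA section is plain character data; an element that interleaves character data with child elements has mixed content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of icedtea-web or NanoXML.
public class MixedContentDemo {

    /** Parses an XML string with the JDK DOM parser and returns the root element. */
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    /**
     * Returns true if the element directly contains both non-whitespace
     * character data (text or CDATA) and at least one child element.
     */
    public static boolean hasMixedContent(Element element) {
        boolean sawText = false;
        boolean sawElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE:
                    sawElement = true;
                    break;
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    if (!child.getNodeValue().trim().isEmpty()) {
                        sawText = true;
                    }
                    break;
                default:
                    break;
            }
        }
        return sawText && sawElement;
    }

    public static void main(String[] args) {
        // CDATA only: character data, no child elements.
        String cdataOnly = "<argument><![CDATA[-Dfoo=<bar>]]></argument>";
        // Text interleaved with a child element: mixed content.
        String mixed = "<argument>some text <b>an element</b> more text</argument>";

        System.out.println(hasMixedContent(parseRoot(cdataOnly))); // false
        System.out.println(hasMixedContent(parseRoot(mixed)));     // true
    }
}
```

Only the second form is the kind of input that the version of NanoXML we use cannot handle.
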
The good news is that JNLP files should not contain mixed content.
Nothing in Oracle's docs [2] suggests that mixed content can occur in
JNLP files. Perhaps another thing to consider is that we have not seen
any bugs filed about problems in jnlp files caused by mixed content. So
this should not occur in practice.
Of course, I could implement mixed-content support, but it turns out that
NanoXML/Lite (which is what was embedded in netx) does not support it, pretty
much by design [3]. NanoXML/Java does support it [3], but I am hesitant to
swap out our parser like this. And if I were swapping out the parser, I
would rather use a parser meant to handle malformed XML documents - like
tagsoup [4] - along with a well-tested XML parser.
In fact, after considering the matter, I would like to push just the two
patches I posted earlier in this thread and close the bug.
Any thoughts or comments?
Cheers,
Omair
[1] http://www.w3.org/TR/xml/#sec-mixed-content
[2] http://download.oracle.com/javase/6/docs/technotes/guides/javaws/developersguide/syntax.html
[3] http://devkix.com/nanoxml.php
[4] http://thread.gmane.org/gmane.comp.java.openjdk.distro-packaging.devel/11663