RFR: JDK-8153781 Issue in XMLScanner: EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping large DOCTYPE section with CRLF at wrong place

Mon Apr 18 21:03:46 UTC 2016

Hi Joe,

here is the updated webrev where I incorporated your suggestions:

http://cr.openjdk.java.net/~clanger/webrevs/8153781.1/

I also added a testcase. As for the message " DoctypedeclNotClosed ": I did it in several languages but there are some languages where I don't have the knowledge to create a localized string. Is there some process for this or would we just be ok with reverting to the English text for that particular message?

Thanks in advance for your review :)

All the best
Christoph

> -----Original Message-----
> From: huizhe wang [mailto:huizhe.wang at oracle.com]
> Sent: Dienstag, 12. April 2016 23:28
> To: Langer, Christoph <christoph.langer at sap.com>
> Cc: core-libs-dev at openjdk.java.net
> Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
> EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping
> large DOCTYPE section with CRLF at wrong place
>
>
> On 4/12/2016 11:50 AM, Langer, Christoph wrote:
> > Hi Joe,
> >
> > thanks as always for your suggestions and I'll try to work it in. It will probably
> take me a little while as I'm currently busy with another thing. I'll update my
> webrev eventually and add a testcase, too.
>
> Ok.
> >
> > But one question: I understand that the fix in skipDTD will be sufficient to fix
> the issue. Nevetheless, can you explain me why the change in scanData is not
> beneficial or could even cause issues? There are several places in scanData
> when further loads are done. But only at this point where there's exactly one
> character after CRLF at the end of the buffer the method just returns without
> further load. If it wouldn't leave here it seems to me as if it would continue
> correctly and load more data. That would probably also be better in the sense
> of performance I guess??
>
> It's there to handle the situation where a surrogate pair got split in
> between buffers.
>
> -Joe
>
> >
> > Thanks
> > Christoph
> >
> >> -----Original Message-----
> >> From: huizhe wang [mailto:huizhe.wang at oracle.com]
> >> Sent: Dienstag, 12. April 2016 19:54
> >> To: Langer, Christoph <christoph.langer at sap.com>
> >> Cc: core-libs-dev at openjdk.java.net
> >> Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
> >> EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when
> skipping
> >> large DOCTYPE section with CRLF at wrong place
> >>
> >> Also, EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET was a
> >> wrong msg
> >> id. It would be good to change that to DoctypedeclNotClosed and add a
> >> message to XMLMessages.properties right before
> DoctypedeclUnterminated,
> >> sth. like the following:
> >>
> >> DoctypedeclNotClosed = The document type declaration for root element
> >> type \"{0}\" must be closed with '']''.
> >>
> >> Thanks,
> >> Joe
> >>
> >> On 4/11/2016 5:49 PM, huizhe wang wrote:
> >>> On 4/7/2016 1:45 PM, Langer, Christoph wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>>
> >>>> I've run into a peculiar issue with Xerces.
> >>>>
> >>>>
> >>>>
> >>>> The problem is happening when a DTD shall be skipped, the DTD is
> >>>> larger than the buffer of the current entity and a CRLF sequence
> >>>> occurs just one char before the buffer end.
> >>>>
> >>>>
> >>>>
> >>>> The reason is that method skipDTD of class XMLDTDScannerImpl (about
> >>>> line 389) calls XMLEntityScanner.scanData() to scan for the next
> >>>> occurrence of ']'. The scanData method might return true which
> >>>> indicates that the delimiter ']' was not found yet and more data is
> >>>> to scan. Other users of scanData would handle this and call this
> >>>> method in a loop until it returns false or some other condition
> >>>> happens. So I've fixed that place like at the other callers of scanData.
> >>> This part of the change looks good.
> >>>>
> >>>>
> >>>> Nevertheless, the scanData method would usually load more data when
> >>>> it is at the end of the buffer. But in the special case when CRLF is
> >>>> found at the end of buffer - 1, scanData would just return true. So I
> >>>> also removed that check at line 1374 in XMLEntityScanner. Do you see
> >>>> any problem with that? Is there any reason behind it which I'm
> >>>> overseeing?
> >>> No need to remove this after the above change. The parser needs to
> >>> retain what's in the xml, e.g., not removing new lines.
> >>>> Furthermore I took the chance for further little cleanups. I've added
> >>>> the new copyright header to the files... is that the correct one?
> >>> Yes, that's the right license header. However,
> >>>>
> >>>> I also aligned the calls to invokeListeners(position) in
> >>>> XMLEntityScanner to always pass the actual position from which the
> >>>> load is started. Do you think this is correct?
> >>> Yes.
> >>>>
> >>>>
> >>>> Here is the bug:
> >>>>
> >>>> https://bugs.openjdk.java.net/browse/JDK-8153781
> >>>>
> >>>>
> >>>>
> >>>> Here is the webrev:
> >>>>
> >>>> http://cr.openjdk.java.net/~clanger/webrevs/8153781.0/
> >>>>
> >>>>
> >>>>
> >>>> Please give me some comments before I finalize my change including a
> >>>> jtreg testcase.
> >>> It would be better if you had included the testcase so that the review
> >>> can be done together with the code change.
> >>>
> >>> Thanks,
> >>> Joe
> >>>
> >>>>
> >>>>
> >>>> Thanks & Best regards
> >>>>
> >>>> Christoph
> >>>>
> >>>>
> >>>>