RFR: JDK-8153781 Issue in XMLScanner: EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping large DOCTYPE section with CRLF at wrong place

Langer, Christoph christoph.langer at sap.com
Tue Apr 19 13:54:55 UTC 2016


Hi again,

I've updated the testcase to increase the probability of it really hitting the issue. The old version was prone to miss the issue in case something changed in the parser.

http://cr.openjdk.java.net/~clanger/webrevs/8153781.2/

Best regards
Christoph



> -----Original Message-----
> From: Langer, Christoph
> Sent: Montag, 18. April 2016 23:04
> To: 'huizhe wang' <huizhe.wang at oracle.com>
> Cc: core-libs-dev at openjdk.java.net
> Subject: RE: RFR: JDK-8153781 Issue in XMLScanner:
> EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping
> large DOCTYPE section with CRLF at wrong place
> 
> Hi Joe,
> 
> here is the updated webrev where I incorporated your suggestions:
> 
> http://cr.openjdk.java.net/~clanger/webrevs/8153781.1/
> 
> I also added a testcase. As for the message " DoctypedeclNotClosed ": I did it in
> several languages but there are some languages where I don't have the
> knowledge to create a localized string. Is there some process for this or would
> we just be ok with reverting to the English text for that particular message?
> 
> Thanks in advance for your review :)
> 
> All the best
> Christoph
> 
> > -----Original Message-----
> > From: huizhe wang [mailto:huizhe.wang at oracle.com]
> > Sent: Dienstag, 12. April 2016 23:28
> > To: Langer, Christoph <christoph.langer at sap.com>
> > Cc: core-libs-dev at openjdk.java.net
> > Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
> > EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping
> > large DOCTYPE section with CRLF at wrong place
> >
> >
> > On 4/12/2016 11:50 AM, Langer, Christoph wrote:
> > > Hi Joe,
> > >
> > > thanks as always for your suggestions and I'll try to work it in. It will
> probably
> > take me a little while as I'm currently busy with another thing. I'll update my
> > webrev eventually and add a testcase, too.
> >
> > Ok.
> > >
> > > But one question: I understand that the fix in skipDTD will be sufficient to fix
> > the issue. Nevetheless, can you explain me why the change in scanData is not
> > beneficial or could even cause issues? There are several places in scanData
> > when further loads are done. But only at this point where there's exactly one
> > character after CRLF at the end of the buffer the method just returns without
> > further load. If it wouldn't leave here it seems to me as if it would continue
> > correctly and load more data. That would probably also be better in the sense
> > of performance I guess??
> >
> > It's there to handle the situation where a surrogate pair got split in
> > between buffers.
> >
> > -Joe
> >
> > >
> > > Thanks
> > > Christoph
> > >
> > >> -----Original Message-----
> > >> From: huizhe wang [mailto:huizhe.wang at oracle.com]
> > >> Sent: Dienstag, 12. April 2016 19:54
> > >> To: Langer, Christoph <christoph.langer at sap.com>
> > >> Cc: core-libs-dev at openjdk.java.net
> > >> Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
> > >> EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when
> > skipping
> > >> large DOCTYPE section with CRLF at wrong place
> > >>
> > >> Also, EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET was a
> > >> wrong msg
> > >> id. It would be good to change that to DoctypedeclNotClosed and add a
> > >> message to XMLMessages.properties right before
> > DoctypedeclUnterminated,
> > >> sth. like the following:
> > >>
> > >> DoctypedeclNotClosed = The document type declaration for root element
> > >> type \"{0}\" must be closed with '']''.
> > >>
> > >> Thanks,
> > >> Joe
> > >>
> > >> On 4/11/2016 5:49 PM, huizhe wang wrote:
> > >>> On 4/7/2016 1:45 PM, Langer, Christoph wrote:
> > >>>> Hi,
> > >>>>
> > >>>>
> > >>>>
> > >>>> I've run into a peculiar issue with Xerces.
> > >>>>
> > >>>>
> > >>>>
> > >>>> The problem is happening when a DTD shall be skipped, the DTD is
> > >>>> larger than the buffer of the current entity and a CRLF sequence
> > >>>> occurs just one char before the buffer end.
> > >>>>
> > >>>>
> > >>>>
> > >>>> The reason is that method skipDTD of class XMLDTDScannerImpl (about
> > >>>> line 389) calls XMLEntityScanner.scanData() to scan for the next
> > >>>> occurrence of ']'. The scanData method might return true which
> > >>>> indicates that the delimiter ']' was not found yet and more data is
> > >>>> to scan. Other users of scanData would handle this and call this
> > >>>> method in a loop until it returns false or some other condition
> > >>>> happens. So I've fixed that place like at the other callers of scanData.
> > >>> This part of the change looks good.
> > >>>>
> > >>>>
> > >>>> Nevertheless, the scanData method would usually load more data when
> > >>>> it is at the end of the buffer. But in the special case when CRLF is
> > >>>> found at the end of buffer - 1, scanData would just return true. So I
> > >>>> also removed that check at line 1374 in XMLEntityScanner. Do you see
> > >>>> any problem with that? Is there any reason behind it which I'm
> > >>>> overseeing?
> > >>> No need to remove this after the above change. The parser needs to
> > >>> retain what's in the xml, e.g., not removing new lines.
> > >>>> Furthermore I took the chance for further little cleanups. I've added
> > >>>> the new copyright header to the files... is that the correct one?
> > >>> Yes, that's the right license header. However,
> > >>>>
> > >>>> I also aligned the calls to invokeListeners(position) in
> > >>>> XMLEntityScanner to always pass the actual position from which the
> > >>>> load is started. Do you think this is correct?
> > >>> Yes.
> > >>>>
> > >>>>
> > >>>> Here is the bug:
> > >>>>
> > >>>> https://bugs.openjdk.java.net/browse/JDK-8153781
> > >>>>
> > >>>>
> > >>>>
> > >>>> Here is the webrev:
> > >>>>
> > >>>> http://cr.openjdk.java.net/~clanger/webrevs/8153781.0/
> > >>>>
> > >>>>
> > >>>>
> > >>>> Please give me some comments before I finalize my change including a
> > >>>> jtreg testcase.
> > >>> It would be better if you had included the testcase so that the review
> > >>> can be done together with the code change.
> > >>>
> > >>> Thanks,
> > >>> Joe
> > >>>
> > >>>>
> > >>>>
> > >>>> Thanks & Best regards
> > >>>>
> > >>>> Christoph
> > >>>>
> > >>>>
> > >>>>




More information about the core-libs-dev mailing list