RFR(JDK12/JAXP/java.xml) 8207760: SAXException: Invalid UTF-16 surrogate detected: d83c ?

Joe Wang huizhe.wang at oracle.com
Fri Sep 14 03:13:48 UTC 2018


Thanks Daniel.

I changed the return of writeUTF16Surrogate so that it is clearer within 
writeUTF16Surrogate when an additional index increment is needed. Other 
corresponding changes are in ToHTMLStream and ToTextStream where similar 
calls to the method are made. It's been an issue in some part of JAXP 
impl where error or warning messages are printed out to the console 
(e.g. JDK-8000621). But I kept it as is in ToTextStream for this patch.

webrev01:
http://cr.openjdk.java.net/~joehw/jdk12/8207760/webrev01/

Best,
Joe

On 9/13/18, 2:23 AM, Daniel Fuchs wrote:
> Hi Joe,
>
> On 13/09/2018 00:25, Lance Andersen wrote:
>> Hi Joe,
>>
>> The change  seems reasonable
>
> Agreed. However the following condition in ToStream::handleEscaping
> is a bit cryptic:
>
> 1155             if ((ihs && (i + 1 < end)) || (ils && i != 0)) {
> 1156                 i++ ; // process two input characters
> 1157             }
>
> could the comment be fleshed out to explain it?
>
> I suspect that: `(ihs && (i + 1 < end))` means that
> `writeUTF16Surrogate(c, ch, i, end);` has written the two surrogate, in
> which case i should be incremented in order to skip the low surrogate
> which has just been written.
>
> I am not sure what `(ils && i != 0)` means, though...
>
> best regards
>
> -- daniel
>
>>
>>> On Sep 12, 2018, at 2:11 PM, Joe Wang <huizhe.wang at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> Please review a patch for a situation where a surrogate pair is at 
>>> the edge of a buffer. What the existing impl did was to report it as 
>>> an error. This patch fixes it by caching the high surrogate and 
>>> prints it out along with the low surrogate. Similar issue exists 
>>> also in the CDATA section and is fixed in this patch. The CDATA impl 
>>> had a couple of bugs where an indent could be written inside the 
>>> CDATA and an unicode character written in between two CDATA 
>>> sections. Both are fixed in this patch.
>>>
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8207760
>>> webrevs: http://cr.openjdk.java.net/~joehw/jdk12/8207760/webrev/
>>>
>>> Thanks,
>>> Joe
>>>
>>>
>>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> 
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance 
>> Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>
>>
>>
>


More information about the core-libs-dev mailing list