<Swing Dev> <Swind Dev> [7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

Vladislav Karnaukhov Vladislav.Karnaukhov at oracle.com
Wed Aug 29 15:13:11 UTC 2012


Hello,

has anyone got a chance to review this?

Regards,
- Vlad

On 8/27/2012 6:51 PM, Vladislav Karnaukhov wrote:
> Hello,
>
> could you please review a new version?
> Please find webrev here: 
> http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.04/
>
> Regards,
> - Vlad
>
> On 6/29/2012 2:09 AM, Phil Race wrote:
>> That would work but I think its cleaner to just move it all into 
>> mapNumericReference
>> as below.
>>
>> -phil.
>>
>> diff --git 
>> a/src/share/classes/javax/swing/text/html/parser/Parser.java 
>> b/src/share/classes/javax/swing/text/html/parser/Parser.java
>> --- a/src/share/classes/javax/swing/text/html/parser/Parser.java
>> +++ b/src/share/classes/javax/swing/text/html/parser/Parser.java
>> @@ -952,7 +952,7 @@
>>                          ch = readCh();
>>                          break;
>>                  }
>> -                char data[] = {mapNumericReference((char) n)};
>> +                char data[] = mapNumericReference(n);
>>                  return data;
>>              }
>>              addString('#');
>> @@ -1021,7 +1021,7 @@
>>      }
>>
>>      /**
>> -     * Converts numeric character reference to Unicode character.
>> +     * Converts numeric character reference to char array.
>>       *
>>       * Normally the code in a reference should be always converted
>>       * to the Unicode character with the same code, but due to
>> @@ -1030,13 +1030,21 @@
>>       * to displayable characters with other codes.
>>       *
>>       * @param c the code of numeric character reference.
>> -     * @return the character corresponding to the reference code.
>> +     * @return a char array corresponding to the reference code.
>>       */
>> -    private char mapNumericReference(char c) {
>> -        if (c < 130 || c > 159) {
>> -            return c;
>> +    private char[] mapNumericReference(int c) {
>> +        char[] data;
>> +        if (c >= 0xffff) { // outside unicode BMP.
>> +            try {
>> +                data = Character.toChars(c);
>> +            } catch (IllegalArgumentException e) {
>> +                data = new char[0];
>>          }
>> -        return cp1252Map[c - 130];
>> +        } else {
>> +            data = new char[1];
>> +            data[0] = (c < 130 || c > 159) ? (char)c : cp1252Map[c - 
>> 130];
>> +        }
>> +        return data;
>>      }
>>
>>
>> On 06/28/12 07:58 AM, Vladislav Karnaukhov wrote:
>>> Hello Phil, Pavel,
>>>
>>> thank you for your comments. I've reworked fix and testcase, please 
>>> find new webrev here: 
>>> http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.03/
>>>
>>> Regards,
>>> - Vlad
>>>
>>> On 6/27/2012 10:13 PM, Phil Race wrote:
>>>> Well its not only unnecessary but is likely wrong .. I don't think
>>>> you looked at what mapNumericReference() does.
>>>>
>>>> -phil.
>>>>
>>>> On 6/27/2012 11:03 AM, Vladislav Karnaukhov wrote:
>>>>> Hello Phil,
>>>>>
>>>>> I used Character.toChars() in both branches because I wanted to 
>>>>> delegate code point conversion to char or surrogate pair entirely 
>>>>> to Character class...
>>>>>
>>>>> Regards,
>>>>> - Vlad
>>>>>
>>>>> On 6/26/2012 9:49 PM, Phil Race wrote:
>>>>>> I don't understand why you call Character.toChars() if you've 
>>>>>> just determined
>>>>>> you don't need to ?
>>>>>>
>>>>>> ie what was wrong with
>>>>>>
>>>>>> data = ( n >>> 16 == 0) ? {mapNumericReference((char) n)} : 
>>>>>> Character.toChars(n);
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> In the case of an invalid supplementary pair, maybe it would be 
>>>>>> safer to  return { ' ' } ?
>>>>>>
>>>>>>
>>>>>> One thing I see in the parsing code that is not new or changed 
>>>>>> here, that
>>>>>> may bear examination, is that there's a loop that keeps on 
>>>>>> reading so long
>>>>>> so long there are new digits. I am not sure its wise to keep 
>>>>>> going once
>>>>>> you overflow.
>>>>>>
>>>>>> -phil.
>>>>>>
>>>>>> On 6/26/2012 12:37 AM, Vladislav Karnaukhov wrote:
>>>>>>> Hello Pavel,
>>>>>>>
>>>>>>> I can provide you with the link to 6u19, but this is direct 
>>>>>>> forward-port and no code changes were made.
>>>>>>>
>>>>>>> I'll make changes as you've pointed out in 1) and 2)
>>>>>>>
>>>>>>> About 3) - is it a requirement to use "? :" operator? I 
>>>>>>> personally prefer single-line if-else, but I don't want to argue 
>>>>>>> over code style, and surely I'll follow code design practices.
>>>>>>>
>>>>>>> Regards,
>>>>>>> - Vlad
>>>>>>>
>>>>>>> On 6/25/2012 6:43 PM, Pavel Porvatov wrote:
>>>>>>>> Hi Vladislav,
>>>>>>>>
>>>>>>>> Do you have a link to the fix for 6u19?
>>>>>>>>
>>>>>>>> I didn't investigate the fix deeply, but
>>>>>>>>
>>>>>>>> 1.
>>>>>>>>  private final int MAX_BMP_BOUND = 65535;
>>>>>>>> should be static (otherwise variable name should be in lower case)
>>>>>>>>
>>>>>>>> 2. Add a space in single line comments
>>>>>>>>
>>>>>>>> 3.
>>>>>>>> +                    char data[];
>>>>>>>> +                    if (n <= MAX_BMP_BOUND) {
>>>>>>>> +                        data = 
>>>>>>>> Character.toChars(mapNumericReference((char) n));
>>>>>>>> +                    } else {
>>>>>>>> +                        data = Character.toChars(n);
>>>>>>>> +                    }
>>>>>>>> +
>>>>>>>>                  return data;
>>>>>>>>
>>>>>>>> can be written in one line via "? :" operator and looks more 
>>>>>>>> readable for me
>>>>>>>>
>>>>>>>> Thanks, Pavel
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> please review the fix for 6836089: Swing HTML parser can't 
>>>>>>>>> properly decode codepoints outside the Unicode Plane 0 into a 
>>>>>>>>> surrogate pair. This is a forward port from JDK6 (fixed 
>>>>>>>>> escalated issue, fix integrated) to JDK7.
>>>>>>>>>
>>>>>>>>> The issue is a defect in Swing HTML Parser: if the codepoint 
>>>>>>>>> is outside BMP (Unicode Plain 0), Parser incorrectly decodes 
>>>>>>>>> codepoint into surrogate pair. The fix is to use 
>>>>>>>>> Character.toChars() method if codepoint value is greater than 
>>>>>>>>> upper bound of BMP.
>>>>>>>>>
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.00/
>>>>>>>>> Bug description: 
>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6836089
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> - Vlad
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>





More information about the swing-dev mailing list