<Swing Dev> <Swind Dev> [7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

Vladislav Karnaukhov Vladislav.Karnaukhov at oracle.com
Mon Aug 27 14:51:51 UTC 2012


Hello,

could you please review a new version?
Please find webrev here: 
http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.04/

Regards,
- Vlad

On 6/29/2012 2:09 AM, Phil Race wrote:
> That would work but I think its cleaner to just move it all into 
> mapNumericReference
> as below.
>
> -phil.
>
> diff --git 
> a/src/share/classes/javax/swing/text/html/parser/Parser.java 
> b/src/share/classes/javax/swing/text/html/parser/Parser.java
> --- a/src/share/classes/javax/swing/text/html/parser/Parser.java
> +++ b/src/share/classes/javax/swing/text/html/parser/Parser.java
> @@ -952,7 +952,7 @@
>                          ch = readCh();
>                          break;
>                  }
> -                char data[] = {mapNumericReference((char) n)};
> +                char data[] = mapNumericReference(n);
>                  return data;
>              }
>              addString('#');
> @@ -1021,7 +1021,7 @@
>      }
>
>      /**
> -     * Converts numeric character reference to Unicode character.
> +     * Converts numeric character reference to char array.
>       *
>       * Normally the code in a reference should be always converted
>       * to the Unicode character with the same code, but due to
> @@ -1030,13 +1030,21 @@
>       * to displayable characters with other codes.
>       *
>       * @param c the code of numeric character reference.
> -     * @return the character corresponding to the reference code.
> +     * @return a char array corresponding to the reference code.
>       */
> -    private char mapNumericReference(char c) {
> -        if (c < 130 || c > 159) {
> -            return c;
> +    private char[] mapNumericReference(int c) {
> +        char[] data;
> +        if (c >= 0xffff) { // outside unicode BMP.
> +            try {
> +                data = Character.toChars(c);
> +            } catch (IllegalArgumentException e) {
> +                data = new char[0];
>          }
> -        return cp1252Map[c - 130];
> +        } else {
> +            data = new char[1];
> +            data[0] = (c < 130 || c > 159) ? (char)c : cp1252Map[c - 
> 130];
> +        }
> +        return data;
>      }
>
>
> On 06/28/12 07:58 AM, Vladislav Karnaukhov wrote:
>> Hello Phil, Pavel,
>>
>> thank you for your comments. I've reworked fix and testcase, please 
>> find new webrev here: 
>> http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.03/
>>
>> Regards,
>> - Vlad
>>
>> On 6/27/2012 10:13 PM, Phil Race wrote:
>>> Well its not only unnecessary but is likely wrong .. I don't think
>>> you looked at what mapNumericReference() does.
>>>
>>> -phil.
>>>
>>> On 6/27/2012 11:03 AM, Vladislav Karnaukhov wrote:
>>>> Hello Phil,
>>>>
>>>> I used Character.toChars() in both branches because I wanted to 
>>>> delegate code point conversion to char or surrogate pair entirely 
>>>> to Character class...
>>>>
>>>> Regards,
>>>> - Vlad
>>>>
>>>> On 6/26/2012 9:49 PM, Phil Race wrote:
>>>>> I don't understand why you call Character.toChars() if you've just 
>>>>> determined
>>>>> you don't need to ?
>>>>>
>>>>> ie what was wrong with
>>>>>
>>>>> data = ( n >>> 16 == 0) ? {mapNumericReference((char) n)} : 
>>>>> Character.toChars(n);
>>>>>
>>>>> ?
>>>>>
>>>>> In the case of an invalid supplementary pair, maybe it would be 
>>>>> safer to  return { ' ' } ?
>>>>>
>>>>>
>>>>> One thing I see in the parsing code that is not new or changed 
>>>>> here, that
>>>>> may bear examination, is that there's a loop that keeps on reading 
>>>>> so long
>>>>> so long there are new digits. I am not sure its wise to keep going 
>>>>> once
>>>>> you overflow.
>>>>>
>>>>> -phil.
>>>>>
>>>>> On 6/26/2012 12:37 AM, Vladislav Karnaukhov wrote:
>>>>>> Hello Pavel,
>>>>>>
>>>>>> I can provide you with the link to 6u19, but this is direct 
>>>>>> forward-port and no code changes were made.
>>>>>>
>>>>>> I'll make changes as you've pointed out in 1) and 2)
>>>>>>
>>>>>> About 3) - is it a requirement to use "? :" operator? I 
>>>>>> personally prefer single-line if-else, but I don't want to argue 
>>>>>> over code style, and surely I'll follow code design practices.
>>>>>>
>>>>>> Regards,
>>>>>> - Vlad
>>>>>>
>>>>>> On 6/25/2012 6:43 PM, Pavel Porvatov wrote:
>>>>>>> Hi Vladislav,
>>>>>>>
>>>>>>> Do you have a link to the fix for 6u19?
>>>>>>>
>>>>>>> I didn't investigate the fix deeply, but
>>>>>>>
>>>>>>> 1.
>>>>>>>  private final int MAX_BMP_BOUND = 65535;
>>>>>>> should be static (otherwise variable name should be in lower case)
>>>>>>>
>>>>>>> 2. Add a space in single line comments
>>>>>>>
>>>>>>> 3.
>>>>>>> +                    char data[];
>>>>>>> +                    if (n <= MAX_BMP_BOUND) {
>>>>>>> +                        data = 
>>>>>>> Character.toChars(mapNumericReference((char) n));
>>>>>>> +                    } else {
>>>>>>> +                        data = Character.toChars(n);
>>>>>>> +                    }
>>>>>>> +
>>>>>>>                  return data;
>>>>>>>
>>>>>>> can be written in one line via "? :" operator and looks more 
>>>>>>> readable for me
>>>>>>>
>>>>>>> Thanks, Pavel
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> please review the fix for 6836089: Swing HTML parser can't 
>>>>>>>> properly decode codepoints outside the Unicode Plane 0 into a 
>>>>>>>> surrogate pair. This is a forward port from JDK6 (fixed 
>>>>>>>> escalated issue, fix integrated) to JDK7.
>>>>>>>>
>>>>>>>> The issue is a defect in Swing HTML Parser: if the codepoint is 
>>>>>>>> outside BMP (Unicode Plain 0), Parser incorrectly decodes 
>>>>>>>> codepoint into surrogate pair. The fix is to use 
>>>>>>>> Character.toChars() method if codepoint value is greater than 
>>>>>>>> upper bound of BMP.
>>>>>>>>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.00/
>>>>>>>> Bug description: 
>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6836089
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> - Vlad
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>





More information about the swing-dev mailing list