<Swing Dev> <Swind Dev> [7u6] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

Phil Race philip.race at oracle.com
Thu Jun 28 22:09:36 UTC 2012


That would work but I think its cleaner to just move it all into 
mapNumericReference
as below.

-phil.

diff --git a/src/share/classes/javax/swing/text/html/parser/Parser.java 
b/src/share/classes/javax/swing/text/html/parser/Parser.java
--- a/src/share/classes/javax/swing/text/html/parser/Parser.java
+++ b/src/share/classes/javax/swing/text/html/parser/Parser.java
@@ -952,7 +952,7 @@
                          ch = readCh();
                          break;
                  }
-                char data[] = {mapNumericReference((char) n)};
+                char data[] = mapNumericReference(n);
                  return data;
              }
              addString('#');
@@ -1021,7 +1021,7 @@
      }

      /**
-     * Converts numeric character reference to Unicode character.
+     * Converts numeric character reference to char array.
       *
       * Normally the code in a reference should be always converted
       * to the Unicode character with the same code, but due to
@@ -1030,13 +1030,21 @@
       * to displayable characters with other codes.
       *
       * @param c the code of numeric character reference.
-     * @return the character corresponding to the reference code.
+     * @return a char array corresponding to the reference code.
       */
-    private char mapNumericReference(char c) {
-        if (c < 130 || c > 159) {
-            return c;
+    private char[] mapNumericReference(int c) {
+        char[] data;
+        if (c >= 0xffff) { // outside unicode BMP.
+            try {
+                data = Character.toChars(c);
+            } catch (IllegalArgumentException e) {
+                data = new char[0];
          }
-        return cp1252Map[c - 130];
+        } else {
+            data = new char[1];
+            data[0] = (c < 130 || c > 159) ? (char)c : cp1252Map[c - 130];
+        }
+        return data;
      }


On 06/28/12 07:58 AM, Vladislav Karnaukhov wrote:
> Hello Phil, Pavel,
>
> thank you for your comments. I've reworked fix and testcase, please 
> find new webrev here: 
> http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.03/
>
> Regards,
> - Vlad
>
> On 6/27/2012 10:13 PM, Phil Race wrote:
>> Well its not only unnecessary but is likely wrong .. I don't think
>> you looked at what mapNumericReference() does.
>>
>> -phil.
>>
>> On 6/27/2012 11:03 AM, Vladislav Karnaukhov wrote:
>>> Hello Phil,
>>>
>>> I used Character.toChars() in both branches because I wanted to 
>>> delegate code point conversion to char or surrogate pair entirely to 
>>> Character class...
>>>
>>> Regards,
>>> - Vlad
>>>
>>> On 6/26/2012 9:49 PM, Phil Race wrote:
>>>> I don't understand why you call Character.toChars() if you've just 
>>>> determined
>>>> you don't need to ?
>>>>
>>>> ie what was wrong with
>>>>
>>>> data = ( n >>> 16 == 0) ? {mapNumericReference((char) n)} : 
>>>> Character.toChars(n);
>>>>
>>>> ?
>>>>
>>>> In the case of an invalid supplementary pair, maybe it would be 
>>>> safer to  return { ' ' } ?
>>>>
>>>>
>>>> One thing I see in the parsing code that is not new or changed 
>>>> here, that
>>>> may bear examination, is that there's a loop that keeps on reading 
>>>> so long
>>>> so long there are new digits. I am not sure its wise to keep going 
>>>> once
>>>> you overflow.
>>>>
>>>> -phil.
>>>>
>>>> On 6/26/2012 12:37 AM, Vladislav Karnaukhov wrote:
>>>>> Hello Pavel,
>>>>>
>>>>> I can provide you with the link to 6u19, but this is direct 
>>>>> forward-port and no code changes were made.
>>>>>
>>>>> I'll make changes as you've pointed out in 1) and 2)
>>>>>
>>>>> About 3) - is it a requirement to use "? :" operator? I personally 
>>>>> prefer single-line if-else, but I don't want to argue over code 
>>>>> style, and surely I'll follow code design practices.
>>>>>
>>>>> Regards,
>>>>> - Vlad
>>>>>
>>>>> On 6/25/2012 6:43 PM, Pavel Porvatov wrote:
>>>>>> Hi Vladislav,
>>>>>>
>>>>>> Do you have a link to the fix for 6u19?
>>>>>>
>>>>>> I didn't investigate the fix deeply, but
>>>>>>
>>>>>> 1.
>>>>>>  private final int MAX_BMP_BOUND = 65535;
>>>>>> should be static (otherwise variable name should be in lower case)
>>>>>>
>>>>>> 2. Add a space in single line comments
>>>>>>
>>>>>> 3.
>>>>>> +                    char data[];
>>>>>> +                    if (n <= MAX_BMP_BOUND) {
>>>>>> +                        data = 
>>>>>> Character.toChars(mapNumericReference((char) n));
>>>>>> +                    } else {
>>>>>> +                        data = Character.toChars(n);
>>>>>> +                    }
>>>>>> +
>>>>>>                  return data;
>>>>>>
>>>>>> can be written in one line via "? :" operator and looks more 
>>>>>> readable for me
>>>>>>
>>>>>> Thanks, Pavel
>>>>>>> Hello,
>>>>>>>
>>>>>>> please review the fix for 6836089: Swing HTML parser can't 
>>>>>>> properly decode codepoints outside the Unicode Plane 0 into a 
>>>>>>> surrogate pair. This is a forward port from JDK6 (fixed 
>>>>>>> escalated issue, fix integrated) to JDK7.
>>>>>>>
>>>>>>> The issue is a defect in Swing HTML Parser: if the codepoint is 
>>>>>>> outside BMP (Unicode Plain 0), Parser incorrectly decodes 
>>>>>>> codepoint into surrogate pair. The fix is to use 
>>>>>>> Character.toChars() method if codepoint value is greater than 
>>>>>>> upper bound of BMP.
>>>>>>>
>>>>>>> Webrev: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.00/
>>>>>>> Bug description: 
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6836089
>>>>>>>
>>>>>>> Regards,
>>>>>>> - Vlad
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>




More information about the swing-dev mailing list