Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint
Xueming Shen
Xueming.Shen at Sun.COM
Thu Mar 11 21:24:44 UTC 2010
Martin, Ulf
The following bugs/RFEs have been filed:
6934265 Add public method Character.isBMPCodePoint
6934268 Better implementation of Character.isValidCodePoint and
isSupplementaryCodePoint()
6934270 Remove javac warnings from Character.java
6934271 Better handling of longer utf-8 sequences
Masayoshi, Alan, would you please help review the corresponding CCC for
6934265 at
http://ccc.sfbay.sun.com/6934265
Martin, don't touch the malformed UTF-8 issue for now; an incompatible
change in UTF-8 is an issue.
sherman
Martin Buchholz wrote:
> Ulf, your changes would be easier to get in
> if they were organized as mq patch files that
> could be qimported into an existing mq repo.
>
> I've done that below, which includes a subset of
> your own proposed changes:
>
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/
>
> Sherman (or Alan),
>
> please review and/or file bugs for the above changes.
>
> isBMPCodePoint is a spec addition, requiring additional paperwork.
>
> Sherman, you owe me a response to my now-moldy proposed changes to
> the UTF-8 charset.
>
> The only controversial change would be the change in behavior in
> malformed-utf8, which I can take out.
>
> Martin
>
> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>
>> Sherman,
>>
>> I know, your time ...
>>
>> ... but maybe a sponsor is needed here:
>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132
>>
>> Could you do this?
>>
>> Many thanks,
>>
>> -Ulf
>>
>>
>> On 10.03.2010 19:23, Xueming Shen wrote:
>>
>>> approved.
>>>
>>> I don't have a spare ws right now, so please just push; it's almost
>>> there :-)
>>>
>>> sherman
>>>
>>> Martin Buchholz wrote:
>>>
>>>> Here's the proposed fix for
>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)
>>>>
>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/
>>>>
>>>> I changed the name to isBMPCodePoint in preparation for moving
>>>> it to Character.java.
>>>> (Sherman, perhaps you would like to take on that follow-on task?)
>>>>
>>>> Sherman, please approve.
>>>>
>>>> Martin
>>>>
>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>
>>>>> Very fast, Sherman, many thanks.
>>>>>
>>>>> Could you set the bug to accepted and evaluated, so my patch will have a
>>>>> chance to get into the code base?
>>>>>
>>>>> -Ulf
>>>>>
>>>>>
>>>>> On 03.03.2010 20:11, Xueming Shen wrote:
>>>>>
>>>>>> #6931812
>>>>>>
>>>>>> Martin Buchholz wrote:
>>>>>>
>>>>>>> Sherman, would you like to file bugs for Ulf's improvements?
>>>>>>>
>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>>>
>>>>>>>> On 03.03.2010 09:00, Martin Buchholz wrote:
>>>>>>>>
>>>>>>>>> Keep in mind that supplementary characters are extremely rare.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, but many APIs in the JDK are used rarely.
>>>>>>>> Why should they waste memory footprint or perform badly, particularly if
>>>>>>>> it doesn't cost anything?
>>>>>>>>
>>>>>>> I admire your perfectionism.
>>>>>>>
>>>>>>>
>>>>>>>>> Therefore the existing implementation
>>>>>>>>>
>>>>>>>>> return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>>> && codePoint <= MAX_CODE_POINT;
>>>>>>>>>
>>>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>>>> which is hard to beat.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> 1. Wondering: I think there are TWO comparisons.
>>>>>>>> 2. Those comparisons need to load 32-bit values from machine code,
>>>>>>>> against only 8-bit values in my case.
>>>>>>>>
>>>>>>> It's a good point. In the machine code, shifts are likely to use
>>>>>>> immediate values, and so will be a small win.
>>>>>>>
>>>>>>> int x = codePoint >>> 16;
>>>>>>> return x != 0 && x < 0x11;
>>>>>>>
>>>>>>> (On modern hardware, these optimizations
>>>>>>> are less valuable than they used to be;
>>>>>>> ordinary integer arithmetic is almost free)
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>
>>
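
For readers following the thread, the two forms of the supplementary check discussed above can be put side by side. This is only a minimal sketch, not the code from any of the webrevs: the class name CodePointChecks and the method names are hypothetical, the two constants are copied from the values of the same-named fields in java.lang.Character, and the BMP check at the end is an assumption about what a shift-based isBMPCodePoint could look like.

```java
// Hypothetical holder class; not part of the JDK or of the webrevs in this thread.
final class CodePointChecks {
    // Same values as Character.MIN_SUPPLEMENTARY_CODE_POINT and Character.MAX_CODE_POINT.
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Range form quoted in the thread: up to two comparisons against 32-bit constants.
    // For BMP input the first comparison fails and the second is never evaluated.
    static boolean isSupplementaryRange(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= MAX_CODE_POINT;
    }

    // Shift form from Martin's reply: the upper 16 bits are the plane number,
    // and supplementary code points live in planes 1 through 16 (0x01..0x10).
    static boolean isSupplementaryShift(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    // Assumed shape of a shift-based BMP check: plane 0 means the upper 16 bits are zero.
    // The unsigned shift makes negative inputs yield a non-zero plane, so they are rejected.
    static boolean isBmpSketch(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    public static void main(String[] args) {
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000, 0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int cp : samples) {
            // The two supplementary checks should agree on every int, including invalid ones.
            if (isSupplementaryRange(cp) != isSupplementaryShift(cp)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        System.out.println("range and shift forms agree on all samples");
    }
}
```

The shift form compares against small constants (0 and 0x11) instead of 0x010000 and 0x10FFFF, which is the point raised about immediate operands; whether that is measurable on a given JIT and CPU is not something this sketch demonstrates.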