Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint

Martin Buchholz martinrb at google.com
Thu Mar 11 19:38:54 UTC 2010


Ulf, your changes would be easier to get in
if they were organized as mq patch files that
could be qimported into an existing mq repo.

I've done that below, which includes a subset of
your own proposed changes:

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/

Sherman (or Alan),

please review and/or file bugs for the above changes.

isBMPCodePoint is a spec addition, requiring additional paperwork.

Sherman, you owe me a response to my now-moldy proposed changes to
the UTF-8 charset.

The only controversial change would be the change in behavior in
malformed-utf8, which I can take out.

Martin

On Thu, Mar 11, 2010 at 10:32, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> Sherman,
>
> I know your time is limited ...
>
> ... but a sponsor may be needed here:
> https://bugs.openjdk.java.net/show_bug.cgi?id=100132
>
> Could you do this?
>
> Many thanks,
>
> -Ulf
>
>
> On 10.03.2010 19:23, Xueming Shen wrote:
>>
>> approved.
>>
>> I don't have a spare ws right now, so please just push; it's almost
>> there :-)
>>
>> sherman
>>
>> Martin Buchholz wrote:
>>>
>>> Here's the proposed fix for
>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)
>>>
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/
>>>
>>> I changed the name to isBMPCodePoint in preparation for moving
>>> it to Character.java.
>>> (Sherman, perhaps you would like to take on that follow-on task?)
>>>
>>> Sherman, please approve.
>>>
>>> Martin
>>>
>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>
>>>> Very fast, Sherman, many thanks.
>>>>
>>>> Could you set the bug to accepted and evaluated, so my patch will have a
>>>> chance to get into the code base?
>>>>
>>>> -Ulf
>>>>
>>>>
>>>> On 03.03.2010 20:11, Xueming Shen wrote:
>>>>>
>>>>> #6931812
>>>>>
>>>>> Martin Buchholz wrote:
>>>>>>
>>>>>> Sherman, would you like to file bugs for Ulf's improvements?
>>>>>>
>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>>>
>>>>>>> On 03.03.2010 09:00, Martin Buchholz wrote:
>>>>>>>>
>>>>>>>> Keep in mind that supplementary characters are extremely rare.
>>>>>>>>
>>>>>>> Yes, but many APIs in the JDK are rarely used.
>>>>>>> Why should they waste memory footprint or perform badly,
>>>>>>> particularly when improving them doesn't cost anything?
>>>>>>
>>>>>> I admire your perfectionism.
>>>>>>
>>>>>>>> Therefore the existing implementation
>>>>>>>>
>>>>>>>>  return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>>      && codePoint <= MAX_CODE_POINT;
>>>>>>>>
>>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>>> which is hard to beat.
>>>>>>>>
>>>>>>> 1. I wonder about that: I think there are TWO comparisons.
>>>>>>> 2. Those comparisons need to load 32-bit values from the machine
>>>>>>> code, versus only 8-bit values in my case.
>>>>>>
>>>>>> It's a good point.  In the machine code, shifts are likely to use
>>>>>> immediate values, and so will be a small win.
>>>>>>
>>>>>> int x = codePoint >>> 16;
>>>>>> return x != 0 && x < 0x11;
>>>>>>
>>>>>> (On modern hardware, these optimizations
>>>>>> are less valuable than they used to be;
>>>>>> ordinary integer arithmetic is almost free.)
>>>>>>
>>>>>> Martin
>>
>>
>
>
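
For readers following the quoted thread, here is a minimal sketch that puts the two
variants discussed above side by side and checks that they agree on a few boundary
values. The range check and the shift-based check are taken from the messages; the
class name, method names, and the sample list are illustrative assumptions, not the
code in the webrevs.

```java
// Sketch only: compares the two isSupplementaryCodePoint variants from the thread.
// Class and method names are hypothetical; constants mirror java.lang.Character.
public class SupplementaryCheck {
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Existing form quoted in the thread: two comparisons against 32-bit constants.
    static boolean isSupplementaryRange(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= MAX_CODE_POINT;
    }

    // Shift-based form quoted in the thread: the plane number (upper 16 bits)
    // must be in 1..0x10, so the comparisons use small constants.
    static boolean isSupplementaryShift(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    public static void main(String[] args) {
        // Boundary values, including negative inputs, which the unsigned
        // shift maps to large plane numbers and therefore rejects.
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000,
            0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int cp : samples) {
            if (isSupplementaryRange(cp) != isSupplementaryShift(cp)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        System.out.println("all samples agree");
    }
}
```

The sample list only covers the edges of the range; it says nothing about which
variant is faster, which would need a proper benchmark on the target JIT.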


