Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint

Xueming Shen Xueming.Shen at Sun.COM
Tue Mar 16 06:26:42 UTC 2010


CR 6935172 Created, P4 java/classes_io Optimize bit-twiddling in Bits.java

Can I assume the webrev is

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java

-Sherman

Martin Buchholz wrote:
> OK, next round of review.
>
> I changed my UTF-8 changes to be behavior-preserving,
> removing any hint of controversy, and renamed the patch
> to "utf8-twiddling".
>
> I got Ulf in my head, and can't stop micro-optimizing.
> I added a new micro-optimizing patch for Bits.java.
> Please file a bug.
>
> 6934268: Better implementation of Character.isValidCodePoint and
> isSupplementaryCodePoint()
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint
> 6934265: Add public method Character.isBMPCodePoint
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint
> 6934270: Remove javac warnings from Character.java
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings
> 6934271: Better handling of longer utf-8 sequences
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling
> 6666666: Optimize bit-twiddling in Bits.java
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java
>
> Now I need to go off to my micro-optimizers-anonymous meeting.
>
> Martin
>
> On Thu, Mar 11, 2010 at 13:24, Xueming Shen <Xueming.Shen at sun.com> wrote:
>   
>> Martin, Ulf
>>
>> The following bugs/RFEs have been filed.
>>
>> 6934265 Add public method Character.isBMPCodePoint
>> 6934268 Better implementation of Character.isValidCodePoint and
>> isSupplementaryCodePoint()
>> 6934270: Remove javac warnings from Character.java
>> 6934271: Better handling of longer utf-8 sequences
>>
>> Masayoshi, Alan, would you please help review the corresponding CCC for
>> 6934265 at
>> http://ccc.sfbay.sun.com/6934265
>>
>> Martin, please don't touch the malformed UTF-8 issue for now; an
>> incompatible change in UTF-8 is an issue.
>>
>> sherman
>>
>> Martin Buchholz wrote:
>>     
>>> Ulf, your changes would be easier to get in
>>> if they were organized as mq patch files that
>>> could be qimported into an existing mq repo.
>>>
>>> I've done that below, which includes a subset of
>>> your own proposed changes:
>>>
>>>
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/
>>>
>>> Sherman (or Alan),
>>>
>>> please review and/or file bugs for the above changes.
>>>
>>> isBMPCodePoint is a spec addition, requiring additional paperwork.
>>>
>>> Sherman, you owe me a response to my now-moldy proposed changes to
>>> the UTF-8 charset.
>>>
>>> The only controversial change would be the change in behavior in
>>> malformed-utf8, which I can take out.
>>>
>>> Martin
>>>
>>> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>
>>>       
>>>> Sherman,
>>>>
>>>> I know your time is limited ...
>>>>
>>>> ... but a sponsor may be needed here:
>>>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132
>>>>
>>>> Could you do this?
>>>>
>>>> Many thanks,
>>>>
>>>> -Ulf
>>>>
>>>>
>>>> Am 10.03.2010 19:23, schrieb Xueming Shen:
>>>>
>>>>         
>>>>> approved.
>>>>>
>>>>> I don't have a spare workspace right now, so please just push; it's
>>>>> almost there :-)
>>>>>
>>>>> sherman
>>>>>
>>>>> Martin Buchholz wrote:
>>>>>
>>>>>           
>>>>>> Here's the proposed fix for
>>>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)
>>>>>>
>>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/
>>>>>>
>>>>>> I changed the name to isBMPCodePoint in preparation for moving
>>>>>> it to Character.java.
>>>>>> (Sherman, perhaps you would like to take on that follow-on task?)
>>>>>>
>>>>>> Sherman, please approve.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>>
>>>>>>             
>>>>>>> Very fast, Sherman, many thanks.
>>>>>>>
>>>>>>> Could you set the bug to accepted and evaluated, so my patch will
>>>>>>> have a chance to get into the code base?
>>>>>>>
>>>>>>> -Ulf
>>>>>>>
>>>>>>>
>>>>>>> Am 03.03.2010 20:11, schrieb Xueming Shen:
>>>>>>>
>>>>>>>               
>>>>>>>> #6931812
>>>>>>>>
>>>>>>>> Martin Buchholz wrote:
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sherman, would you like to file bugs for Ulf's improvements?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz:
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Keep in mind that supplementary characters are extremely rare.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> Yes, but many APIs in the JDK are rarely used.
>>>>>>>>>> Why should they waste memory footprint or perform badly,
>>>>>>>>>> particularly if improving them doesn't cost anything?
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> I admire your perfectionism.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>>> Therefore the existing implementation
>>>>>>>>>>>
>>>>>>>>>>>  return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>>>>>      && codePoint <= MAX_CODE_POINT;
>>>>>>>>>>>
>>>>>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>>>>>> which is hard to beat.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> 1. I wonder: I think there are TWO comparisons.
>>>>>>>>>> 2. Those comparisons need to load 32-bit values from the machine
>>>>>>>>>> code, versus only 8-bit values in my case.
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>> It's a good point.  In the machine code, shifts are likely to use
>>>>>>>>> immediate values, and so will be a small win.
>>>>>>>>>
>>>>>>>>> int x = codePoint >>> 16;
>>>>>>>>> return x != 0 && x < 0x11;
>>>>>>>>>
>>>>>>>>> (On modern hardware, these optimizations
>>>>>>>>> are less valuable than they used to be;
>>>>>>>>> ordinary integer arithmetic is almost free)
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>                   
>>>>>           
>>>>         
>>     
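
For readers following the two forms discussed at the bottom of the thread,
here is a minimal sketch (not taken from any of the webrevs) that puts the
range-check form and the shift form side by side and checks that they agree.
The class name SupplementaryCheck and both method names are made up for
illustration; only the two return expressions come from the quoted messages,
with the constants taken from java.lang.Character.

```java
public class SupplementaryCheck {

    // Range-check form quoted in the thread: two comparisons against
    // 32-bit constants (0x10000 and 0x10FFFF).
    public static boolean isSupplementaryRange(int codePoint) {
        return codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= Character.MAX_CODE_POINT;
    }

    // Shift form suggested in the thread: after dropping the low 16 bits,
    // supplementary code points are exactly those whose plane number is
    // in 1..0x10. The unsigned shift makes negative inputs large, so they
    // fail the "< 0x11" test.
    public static boolean isSupplementaryShift(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    public static void main(String[] args) {
        // Check every value around the valid range, plus the int extremes.
        for (int cp = -0x20000; cp <= 0x120000; cp++) {
            check(cp);
        }
        check(Integer.MIN_VALUE);
        check(Integer.MAX_VALUE);
        System.out.println("both forms agree on all tested values");
    }

    private static void check(int cp) {
        if (isSupplementaryRange(cp) != isSupplementaryShift(cp)) {
            throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
        }
    }
}
```

This only demonstrates that the two expressions are equivalent; whether the
shift form is measurably faster depends on the JIT and the hardware, as noted
in the thread.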



