Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint
Martin Buchholz
martinrb at google.com
Fri Mar 12 23:29:53 UTC 2010
OK, next round of review.
I changed my UTF-8 changes to be behavior-preserving,
removing any hint of controversy, and renamed the patch
to "utf8-twiddling".
I got Ulf in my head, and can't stop micro-optimizing.
I added a new micro-optimizing patch for Bits.java.
Please file a bug.
6934268: Better implementation of Character.isValidCodePoint and
isSupplementaryCodePoint()
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint
6934265: Add public method Character.isBMPCodePoint
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint
6934270: Remove javac warnings from Character.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings
6934271: Better handling of longer utf-8 sequences
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling
6666666: Optimize bit-twiddling in Bits.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java
Now I need to go off to my micro-optimizers-anonymous meeting.
Martin
On Thu, Mar 11, 2010 at 13:24, Xueming Shen <Xueming.Shen at sun.com> wrote:
> Martin, Ulf
>
> The following bugs/RFEs have been filed.
>
> 6934265 Add public method Character.isBMPCodePoint
> 6934268 Better implementation of Character.isValidCodePoint and
> isSupplementaryCodePoint()
> 6934270: Remove javac warnings from Character.java
> 6934271: Better handling of longer utf-8 sequences
>
> Masayoshi, Alan would you please help review the corresponding CCC for
> 6934265 at
> http://ccc.sfbay.sun.com/6934265
>
> Martin, don't touch the utf-8 malformed issue for now; an incompatible
> change in UTF-8 is an issue.
>
> sherman
>
> Martin Buchholz wrote:
>>
>> Ulf, your changes would be easier to get in
>> if they were organized as mq patch files that
>> could be qimported into an existing mq repo.
>>
>> I've done that below, which includes a subset of
>> your own proposed changes:
>>
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/
>>
>> Sherman (or Alan),
>>
>> please review and/or file bugs for the above changes.
>>
>> isBMPCodePoint is a spec addition, requiring additional paperwork.
>>
>> Sherman, you owe me a response to my now-moldy proposed changes to
>> the UTF-8 charset.
>>
>> The only controversial change would be the change in behavior in
>> malformed-utf8, which I can take out.
>>
>> Martin
>>
>> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>
>>>
>>> Sherman,
>>>
>>> I know, your time ...
>>>
>>> ... but maybe someone is needed for sponsor here:
>>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132
>>>
>>> Could you do this?
>>>
>>> Much thanks,
>>>
>>> -Ulf
>>>
>>>
>>> Am 10.03.2010 19:23, schrieb Xueming Shen:
>>>
>>>>
>>>> approved.
>>>>
>>>> I don't have a spare ws right now, so please just push; it's almost
>>>> there :-)
>>>>
>>>> sherman
>>>>
>>>> Martin Buchholz wrote:
>>>>
>>>>>
>>>>> Here's the proposed fix for
>>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)
>>>>>
>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/
>>>>>
>>>>> I changed the name to isBMPCodePoint in preparation for moving
>>>>> it to Character.java.
>>>>> (Sherman, perhaps you would like to take on that followon task?)
>>>>>
>>>>> Sherman, please approve.
>>>>>
>>>>> Martin
>>>>>
>>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>
>>>>>>
>>>>>> Very fast Sherman, much thanks.
>>>>>>
>>>>>> Could you set the bug to accepted and evaluated, so my patch will have
>>>>>> a chance to get into the code base?
>>>>>>
>>>>>> -Ulf
>>>>>>
>>>>>>
>>>>>> Am 03.03.2010 20:11, schrieb Xueming Shen:
>>>>>>
>>>>>>>
>>>>>>> #6931812
>>>>>>>
>>>>>>> Martin Buchholz wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Sherman, would you like to file bugs for Ulf's improvements?
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Am 03.03.2010 09:00, schrieb Martin Buchholz:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Keep in mind that supplementary characters are extremely rare.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, but many APIs in the JDK are used rarely.
>>>>>>>>> Why should they waste memory footprint or perform badly, particularly
>>>>>>>>> if fixing that doesn't cost anything?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I admire your perfectionism.
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Therefore the existing implementation
>>>>>>>>>>
>>>>>>>>>> return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>>>> && codePoint <= MAX_CODE_POINT;
>>>>>>>>>>
>>>>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>>>>> which is hard to beat.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Wondering: I think there are TWO comparisons.
>>>>>>>>> 2. Those comparisons need to load 32-bit values from machine code,
>>>>>>>>> against only 8-bit values in my case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's a good point. In the machine code, shifts are likely to use
>>>>>>>> immediate values, and so will be a small win.
>>>>>>>>
>>>>>>>> int x = codePoint >>> 16;
>>>>>>>> return x != 0 && x < 0x11;
>>>>>>>>
>>>>>>>> (On modern hardware, these optimizations
>>>>>>>> are less valuable than they used to be;
>>>>>>>> ordinary integer arithmetic is almost free.)
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>
>>>>
>>>
>>>
>
>
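
For readers following the thread, here is a minimal sketch that puts the two forms discussed above side by side: the existing range check and the shift-based variant. The class name, the method names and the sample values are hypothetical, chosen only for illustration, and the constants are redeclared locally with the same values as those in java.lang.Character. This is not the code from any of the webrevs.

```java
public class SupplementaryCheck {
    // Same values as the constants in java.lang.Character, redeclared
    // here so the sketch is self-contained.
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Range-check form quoted in the thread: for a BMP code point the
    // first comparison fails and the second is never evaluated.
    static boolean isSupplementaryByRange(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= MAX_CODE_POINT;
    }

    // Shift-based form quoted in the thread: the unsigned shift leaves
    // the plane number, which is compared against small constants.
    // Negative inputs shift to values of 0x8000 or more and are rejected.
    static boolean isSupplementaryByShift(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    public static void main(String[] args) {
        // Hypothetical sample values around each boundary.
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000, 0x1F600,
            0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int cp : samples) {
            if (isSupplementaryByRange(cp) != isSupplementaryByShift(cp)) {
                throw new AssertionError("mismatch at " + cp);
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

The two methods are meant to return the same result for every int. Whether the shift form is measurably faster depends on the JIT and the hardware, as noted in the thread.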