review request for 6798511/6860431: Include functionality of Surrogate in Character

Xueming Shen Xueming.Shen at Sun.COM
Fri Mar 19 19:56:46 UTC 2010


Martin Buchholz wrote:
> I renamed my patch file from isSupplementaryCodePoint to isValidCodePoint.
>
> 6934268: Better implementation of Character.isValidCodePoint
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint
>   

It's fine. But if I was you I would not "optimize" it.

> 6934265: Add public method Character.isBMPCodePoint
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint
>   
Looks fine. I will let you know when the CCC is approved.

> 6934270: Remove javac warnings from Character.java
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings
>   
Looks fine.

> 6934271: Better handling of longer utf-8 sequences
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling
>   
Looks good, though the code style looks really really...strange:-)


> 6935172: Optimize bit-twiddling in Bits.java
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java
>   
Looks fine. I was surprised the javac compiler really generates the code 
for expr + 0 and  expr << 0.
I kinda remember the gcc compiler cat optimize this kind situation to 
just expr (If my memory is correct,
or maybe that was kinda of optimization I was planning to do in one of 
my projects :-) ).

-Sherman


> Martin
>
> On Tue, Mar 16, 2010 at 15:35, Xueming Shen <Xueming.Shen at sun.com> wrote:
>   
>> Martin Buchholz wrote:
>>     
>>> On Tue, Mar 16, 2010 at 13:06, Xueming Shen <Xueming.Shen at sun.com> wrote:
>>>
>>>       
>>>> Martin Buchholz wrote:
>>>>
>>>>         
>>>>> Therefore the existing implementation
>>>>>
>>>>>           
>>>>>>>  return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>           &&  codePoint<= MAX_CODE_POINT;
>>>>>>>
>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>> which is hard to beat.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> 1. Wondering: I think there are TWO comparisons.
>>>>>> 2. Those comparisons need to load 32 bit values from machine code,
>>>>>> against
>>>>>> only 8 bit values in my case.
>>>>>>
>>>>>>
>>>>>>             
>>>>> It's a good point.  In the machine code, shifts are likely to use
>>>>> immediate values, and so will be a small win.
>>>>>
>>>>> int x = codePoint >>> 16;
>>>>> return x != 0 && x < 0x11;
>>>>>
>>>>> (On modern hardware, these optimizations
>>>>> are less valuable than they used to be;
>>>>> ordinary integer arithmetic is almost free)
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> I'm not convinced if the proposed code is really better...a "small win".
>>>>
>>>>         
>>> The primary theory here is that branches are expensive,
>>> and we are reducing them by one.
>>>
>>>
>>>       
>> There are still two branches in new impl, if you count the "ifeq" and
>> "if_icmpge"(?)
>>
>> We are trying to "optimize" this piece of code with the assumption that the
>> new impl MIGHT help certain vm (hotspot?)
>> to optimize certain use scenario (some consecutive usages), if the compiler
>> and/or the vm are both smart enough at certain
>> point, with no supporting benchmark data?
>>
>> My concern is that the reality might be that this optimization might even
>> hurt the BMP use
>> case (the majority of the possible real world use scenarios) with a 10%
>> bigger bytecode size.
>>
>> -Sherman
>>
>>
>>
>> public class Character extends java.lang.Object {
>>  public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536;
>>
>>  public static final int MAX_CODE_POINT = 1114111;
>>
>>  public Character();
>>   Code:
>>      0: aload_0             1: invokespecial #1                  // Method
>> java/lang/Object."<init>":()V
>>      4: return
>>  public static boolean isSupplementaryCodePoint(int);
>>   Code:
>>      0: iload_0             1: ldc           #2                  // int
>> 65536
>>      3: if_icmplt     16
>>      6: iload_0             7: ldc           #3                  // int
>> 1114111
>>      9: if_icmpgt     16
>>     12: iconst_1           13: goto          17
>>     16: iconst_0           17: ireturn
>>  public static boolean isSupplementaryCodePoint_new(int);
>>   Code:
>>      0: iload_0             1: bipush        16
>>      3: iushr               4: istore_1
>>      5: iload_1             6: ifeq          19
>>      9: iload_1            10: bipush        17
>>     12: if_icmpge     19
>>     15: iconst_1           16: goto          20
>>     19: iconst_0           20: ireturn       }
>>
>>
>>
>>
>>
>>
>>     




More information about the core-libs-dev mailing list