review request for 6798511/6860431: Include functionality of Surrogate in Character

Ulf Zibis Ulf.Zibis at gmx.de
Mon Mar 22 13:57:46 UTC 2010


On 21.03.2010 17:23, Martin Buchholz wrote:
> On Sun, Mar 21, 2010 at 04:28, Ulf Zibis<Ulf.Zibis at gmx.de>  wrote:
>    
>> On Sat, Mar 20, 2010 at 17:13, Ulf Zibis<Ulf.Zibis at gmx.de>  wrote:
>>      
>    
>> I don't think it's a performance problem in the real world.
>>
>>
>> Hm, if someone uses:
>>       if (Character.isBMPCodePoint(codePoint))
>>           ...;
>>       else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint()
>>           ...;
>>       else
>>           ...;
>> he will lose up to 50 % performance, as you can see in my benchmark of
>> isSuppCPAlaMartin().
>>      
> Only if their data is full of supplementary characters.
>    

Yes, but we don't know anything about the purpose of the code written out 
there in the world, so why not provide the best performance, or at least 
give a hint in the docs, if it doesn't cost anything?
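
To make the scenario concrete, here is a minimal sketch of the three-way 
branch discussed above. The class name CodePointKind and the helper bodies 
are assumptions for illustration only, not the code under review; the 
supplementary test is written as a plain two-sided range compare.

```java
// Hypothetical sketch: names and bodies are assumptions, not the reviewed code.
final class CodePointKind {
    static final int MIN_SUPPLEMENTARY = 0x10000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // BMP: plane 0, i.e. the upper 16 bits of the int are all zero.
    static boolean isBMP(int cp) {
        return cp >>> 16 == 0;
    }

    // Supplementary: U+10000..U+10FFFF, checked with two compares.
    static boolean isSupplementary(int cp) {
        return cp >= MIN_SUPPLEMENTARY && cp <= MAX_CODE_POINT;
    }

    // The if / else if / else chain from the discussion.
    static String classify(int cp) {
        if (isBMP(cp)) {
            return "BMP";
        } else if (isSupplementary(cp)) {
            return "supplementary";
        } else {
            return "invalid";
        }
    }
}
```

For example, CodePointKind.classify(0x1F600) takes the second branch. The 
second test is only reached for non-BMP input, so its cost matters only 
for data that contains supplementary characters.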

>    
>> We don't usually put such performance information in the javadoc.
>>
>>
>> In class StringBuilder:
>> "Where possible, it is recommended that this class be used in preference to
>> StringBuffer as it will be faster under most implementations."
>>
>> java.util.List:
>> Note that these operations may execute in time proportional to the index
>> value for some implementations (the LinkedList class, for example).
>>
>> ByteBuffer#get(byte[],int,int):
>> In other words, an invocation of this method of the form
>> src.get(dst, off, len) has exactly the same effect as the loop
>>
>>       for (int i = off; i < off + len; i++)
>>           dst[i] = src.get();
>>
>> except that it first checks that there are sufficient bytes in this buffer
>> and it is potentially much more efficient.
>>      
> In the above, performance is a raison d'être of the API,
> which real users should consider when choosing an API.
>    

Oh, on parle français. Je l'aime beaucoup.

>    
>> Anyway, even if isSupplementaryCodePoint() is used in isolation, my code will
>> help JIT to use 2-byte shifted addressing and a shorter 2-byte immediate value
>> for the compare, but yes, JIT should be able to catch that without this
>> help. But for that case, we could stay with the old implementations too for
>> isBMPCodePoint and isValidCodePoint.
>>      
> Again, performance with BMP characters is infinitely more important
> than performance with supplementary characters.
>    

You are right. But I can't see any reason why the fast supplementary 
version would harm the BMP performance.
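
A rough sketch of a shift-based supplementary test, set against the plain 
range compare. The class SuppCheck and both method bodies are assumptions 
for illustration and may differ from the version actually proposed. The 
BMP test is not touched at all, so the BMP path stays as it is.

```java
// Hypothetical sketch: not the proposed implementation, only an illustration.
final class SuppCheck {
    // Range-compare form: U+10000..U+10FFFF.
    static boolean byRange(int cp) {
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    // Shift-based form: the plane number (upper 16 bits) must be 1..16.
    // plane - 1 is then 0..15, so its unsigned shift by 4 is zero;
    // plane 0 gives -1, which shifts to a non-zero value.
    static boolean byPlane(int cp) {
        int plane = cp >>> 16;
        return ((plane - 1) >>> 4) == 0;
    }

    // Returns true if both forms give the same answer for every sample.
    static boolean agree(int... samples) {
        for (int cp : samples) {
            if (byRange(cp) != byPlane(cp)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling SuppCheck.agree(0, 0xFFFF, 0x10000, 0x10FFFF, 0x110000, -1) compares 
the two forms at the range boundaries.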

-Ulf





More information about the core-libs-dev mailing list