review request for 6798511/6860431: Include functionality of Surrogate in Character
Rémi Forax
forax at univ-mlv.fr
Mon Mar 22 14:33:30 UTC 2010
On 22/03/2010 14:57, Ulf Zibis wrote:
> On 21.03.2010 17:23, Martin Buchholz wrote:
>> On Sun, Mar 21, 2010 at 04:28, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
>>> On Sat, Mar 20, 2010 at 17:13, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
>>> I don't think it's a performance problem in the real world.
>>>
>>>
>>> Hm, if someone uses:
>>>     if (Character.isBMPCodePoint(codePoint))
>>>         ...;
>>>     else if (Character.isSupplementaryCodePoint(codePoint)) // instead of isValidCodePoint()
>>>         ...;
>>>     else
>>>         ...;
>>> he will lose up to 50% performance, as you can see in my benchmark of
>>> isSuppCPAlaMartin().
>> Only if their data is full of supplementary characters.
>
> Yes, but we don't know anything about the purpose of the code written
> out there in the world, so why not provide the best performance, or at
> least give a hint in the docs, if it doesn't cost anything?
>
>>> We don't usually put such performance information in the javadoc.
>>>
>>>
>>> In class StringBuilder:
>>> "Where possible, it is recommended that this class be used in
>>> preference to StringBuffer as it will be faster under most
>>> implementations."
>>>
>>> java.util.List:
>>> Note that these operations may execute in time proportional to the
>>> index value for some implementations (the LinkedList class, for
>>> example).
>>>
>>> ByteBuffer#get(byte[],int,int):
>>> In other words, an invocation of this method of the form
>>> src.get(dst, off, len) has exactly the same effect as the loop
>>>
>>>     for (int i = off; i < off + len; i++)
>>>         dst[i] = src.get();
>>>
>>> except that it first checks that there are sufficient bytes in this
>>> buffer and it is potentially much more efficient.
>> In the above, performance is a raison d'être of the API,
>> one that real users should consider when choosing an API.
>
> Oh, on parle français. Je l'aime beaucoup.
Totally off topic, but you mean: "j'aime beaucoup".
"Je l'aime beaucoup" means "I love him/her a lot".
>
>>> Anyway, even if isSupplementaryCodePoint() is used in isolation, my
>>> code will help the JIT to use 2-byte shifted addressing and a shorter
>>> 2-byte immediate value for the compare. But yes, the JIT should be
>>> able to catch that without this help. For that case, though, we could
>>> stay with the old implementations of isBMPCodePoint and
>>> isValidCodePoint too.
>> Again, performance with BMP characters is infinitely more important
>> than performance with supplementary characters.
>
> You are right. But I can't see any reason why the fast supplementary
> version would harm BMP performance.
>
> -Ulf
>
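For readers following the thread, here is a minimal sketch of the two styles of check being compared: a plain range comparison and a variant that shifts the code point right by 16 bits and compares the plane number. The class and method names (CodePointChecks, isBmpByShift, and so on) are hypothetical and chosen only for illustration; this is not the code under review and not necessarily what the JDK ships. It assumes Java 7 or later, where the BMP check is available as Character.isBmpCodePoint (the thread spells the proposed name isBMPCodePoint).

```java
final class CodePointChecks {
    // Hypothetical helpers for illustration only; not the JDK implementation.

    // BMP code points are exactly those whose upper 16 bits are zero.
    static boolean isBmpByShift(int cp) {
        return cp >>> 16 == 0;
    }

    // Range-comparison style: two compares against large constants.
    static boolean isSupplementaryByRange(int cp) {
        return cp >= Character.MIN_SUPPLEMENTARY_CODE_POINT
            && cp <= Character.MAX_CODE_POINT;
    }

    // Shift style: compare the plane number (1..16) against small constants.
    static boolean isSupplementaryByShift(int cp) {
        int plane = cp >>> 16;
        return plane != 0 && plane < ((Character.MAX_CODE_POINT + 1) >>> 16);
    }

    // Valid code points lie in planes 0..16; negative values shift to a
    // large positive plane number and are rejected.
    static boolean isValidByShift(int cp) {
        return (cp >>> 16) < ((Character.MAX_CODE_POINT + 1) >>> 16);
    }

    public static void main(String[] args) {
        int[] samples = { -1, 0x41, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF, 0x110000 };
        for (int cp : samples) {
            // Compare each hypothetical helper with the java.lang.Character method.
            boolean agree =
                   isBmpByShift(cp) == Character.isBmpCodePoint(cp)
                && isSupplementaryByRange(cp) == Character.isSupplementaryCodePoint(cp)
                && isSupplementaryByShift(cp) == Character.isSupplementaryCodePoint(cp)
                && isValidByShift(cp) == Character.isValidCodePoint(cp);
            System.out.println(Integer.toHexString(cp)
                + " agrees with java.lang.Character: " + agree);
        }
    }
}
```

Whether the shift style is actually faster depends on the JIT and the input data, which is the point being debated above; the sketch only shows that the formulations are equivalent.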
Rémi