review request for 6798511/6860431: Include functionality of Surrogate in Character
Martin Buchholz
martinrb at google.com
Wed Mar 3 08:00:05 UTC 2010
On Tue, Mar 2, 2010 at 15:34, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> Am 26.08.2009 20:02, schrieb Xueming Shen:
>>
>> For example, the isBMP(int), it might be convenient, but it can be easily
>> archived by the one line code
>>
>> (int)(char)codePoint == codePoint;
>>
>> or more readable form
>>
>> codePoint < Character.MIN_SUPPLEMENTARY_COE_POINT;
>>
>
> In class sun.nio.cs.Surrogate we have:
> public static boolean isBMP(int uc) {
> return (int) (char) uc == uc;
> }
>
> 1.) It's enough to have:
> return (char)uc == uc;
> better:
> assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
> return (char)uc == uc;
> // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
>
> 2.) Above code is compiled to (needs 16 bytes of machine code):
> 0x00b87ad8: mov %ebx,%ebp
> 0x00b87ada: and $0xffff,%ebp
> 0x00b87ae0: cmp %ebx,%ebp
> 0x00b87ae2: jne 0x00b87c52
> 0x00b87ae8:
>
> We could code:
> assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
> return (uc >> 16) == 0;
> // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
I agree that
return (uc >> 16) == 0;
is marginally better than my
return (int) (char) uc == uc;
(although I think the redundant cast to int
makes the code more readable).
I approve such a change to isBMPCodePoint()
and inclusion of such a method in Character.
> is compiled to (needs only 9 bytes of machine code):
> 0x00b87aac: mov %ebx,%ecx
> 0x00b87aae: sar $0x10,%ecx
> 0x00b87ab1: test %ecx,%ecx
> 0x00b87ab3: je 0x00b87acb
> 0x00b87ab5:
>
> 1.) If we have:
> public static boolean isSupplementaryCodePoint(int codePoint) {
> assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
> (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
> return (codePoint >> 16) != 0
> && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
> // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
> // && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
> }
Keep in mind that supplementary characters are extremely rare.
Therefore the existing implementation
return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
&& codePoint <= MAX_CODE_POINT;
will almost always perform just one comparison against a constant,
which is hard to beat.
I'm not sure whether your code above gets the right answer for negative input.
Perhaps you need to do
(codePoint >>> 16) < ...
Martin
> and:
> if (Surrogate.isBMP(uc))
> ...;
> else if (Character.isSupplementaryCodePoint(uc))
> ...;
> else
> ...;
>
> we get (needs only 18 bytes of machine code):
> 0x00b87aac: mov %ebx,%ecx
> 0x00b87aae: sar $0x10,%ecx
> 0x00b87ab1: test %ecx,%ecx
> 0x00b87ab3: je 0x00b87acb
> 0x00b87ab5: cmp $0x11,%ecx
> 0x00b87ab8: jge 0x00b87ce6
> 0x00b87abe:
>
>
> -Ulf
>
>
>
>
>
More information about the core-libs-dev
mailing list