Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint

Martin Buchholz martinrb at google.com
Fri Mar 12 01:46:37 UTC 2010


On Thu, Mar 11, 2010 at 13:14, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> On 11.03.2010 20:38, Martin Buchholz wrote:
>>
>> Ulf, your changes would be easier to get in
>> if they were organized as mq patch files that
>> could be qimported into an existing mq repo.
>>
>
> To be honest, I never heard about mq. Can you point me to some docs please?

http://mercurial.selenic.com/wiki/MqExtension
http://hgbook.red-bean.com/read/managing-change-with-mercurial-queues.html

>> I've done that below, which includes a subset of
>> your own proposed changes:
>>
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
>>
>
> - Maybe better:  "... using a single {@code char}".
> - Why don't you like using the new isBMPCodePoint() for
> isSupplementaryCodePoint() and toUpperCaseCharArray() ?
> - Same shift magic would enhance isISOControl(),

I propose the following small improvement on your own
version of isISOControl:

    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x0000 && codePoint <= 0x001F) ||
        //     (codePoint >= 0x007F && codePoint <= 0x009F);
        return codePoint <= 0x009F &&
            (codePoint >= 0x007F || (codePoint >>> 5 == 0));
    }

This way, any code point above 0x009F is rejected after only one comparison.
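
A quick sanity check, as a minimal sketch only: the class below compares the optimized expression against the two-range form from the comment. The class name IsoControlCheck and its method names are made up for illustration and are not part of the webrev.

```java
final class IsoControlCheck {
    // Straightforward two-range form, as written in the comment above.
    static boolean reference(int codePoint) {
        return (codePoint >= 0x0000 && codePoint <= 0x001F) ||
               (codePoint >= 0x007F && codePoint <= 0x009F);
    }

    // Optimized form proposed in this message.
    static boolean optimized(int codePoint) {
        return codePoint <= 0x009F &&
            (codePoint >= 0x007F || (codePoint >>> 5 == 0));
    }

    // Counts the inputs in [from, to] on which the two forms disagree.
    static int countMismatches(int from, int to) {
        int mismatches = 0;
        for (int cp = from; cp <= to; cp++) {
            if (reference(cp) != optimized(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Negative ints are not valid code points, but they exercise the
        // unsigned shift: a negative value never satisfies (cp >>> 5 == 0).
        System.out.println(countMismatches(-0x10000, 0x10FFFF));
    }
}
```

Running main should print 0, i.e. no disagreement over the whole code point range plus a band of negative values.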

> isHighSurrogate(), isLowSurrogate(), in particular if the latter occur
> consecutively.
>  8-bit shift + compare would allow HotSpot to compile to smart 1-byte
> immediate op-codes.

Alright, you've talked me into it,
I can't resist your love of micro-optimizations.
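
For readers following along, here is a rough sketch of what "shift + compare" range checks can look like. It is an illustration under stated assumptions, not the code in the webrevs: the class name ShiftChecks is hypothetical, and the surrogate tests use a 10-bit shift (each surrogate block spans 1024 values), which is not necessarily the 8-bit variant suggested above.

```java
final class ShiftChecks {
    // A code point is in the BMP exactly when all bits above the low 16
    // are zero. The unsigned shift also rejects negative inputs.
    static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    // High surrogates U+D800..U+DBFF all share the same top 6 bits.
    static boolean isHighSurrogate(char ch) {
        return ch >>> 10 == 0xD800 >>> 10;
    }

    // Low surrogates U+DC00..U+DFFF all share the same top 6 bits.
    static boolean isLowSurrogate(char ch) {
        return ch >>> 10 == 0xDC00 >>> 10;
    }

    public static void main(String[] args) {
        System.out.println(isBmpCodePoint(0x10000));
        System.out.println(isHighSurrogate('\uD800'));
        System.out.println(isLowSurrogate('\uDFFF'));
    }
}
```

Each check replaces a two-sided range comparison with one shift and one equality test. Whether that is actually faster after JIT compilation would need measuring.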

More later.

Martin



More information about the core-libs-dev mailing list