Review patches isBMPCodePoint/2/3

Martin Buchholz martinrb at google.com
Thu Mar 25 23:47:19 UTC 2010


On Thu, Mar 25, 2010 at 15:19, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> Am 25.03.2010 22:47, schrieb Martin Buchholz:
>>
>> Here's another minor performance tweak to
>>
>>     public String(int[] codePoints, int offset, int count) {
>>
>> that optimizes for BMP.
>>
>>         // Pass 1: Compute precise size of char[]
>>         int n = count;
>>         for (int i = offset; i<  end; i++) {
>>             int c = codePoints[i];
>>             if (Character.isBMPCodePoint(c))
>>                 ;
>>             else if (Character.isSupplementaryCodePoint(c))
>>                 n++;
>>             else throw new IllegalArgumentException(Integer.toString(c));
>>         }
>>
>
> Yes, this is a valuable pattern, you found out.
> I think, it could look smarter/more clear:
>
>            if (Character.isBMPCodePoint(c))
>                continue;
>            if (Character.isSupplementaryCodePoint(c))
>                n++;
>            else
>                throw new IllegalArgumentException(Integer.toString(c));
>
> And this would be faster, as isSupplementaryCodePoint is not optimized for
> following isBMPCodePoint:
>
>            if (Character.isBMPCodePoint(c))
>                continue;
>            if (!Character.isValidCodePoint(c))
>                throw new IllegalArgumentException(Integer.toString(c));
>            n++;

Done.

> Before you go to the meeting, maybe scan the JDK for similar use cases,

Sorry, that's your job.

> before I get addicted too, and don't forget to define c as final.

I see no reason to declare c as final here.

> It's enough, that I'm addicted from:
>        // fill backwards for VM performance reasons, reduces register
> pressure, faster compare against 0
>        for (int i = end; n > 0; ) {
>            int c = codePoints[--i];
>            if (Character.isBMPCodePoint(c))
>                v[--n] = (char)c;
>            else
>                Character.toSurrogates(c, v, n-=2);
>        }

Do you have actual evidence that this is faster?

I can see a different reason why - βουστροφηδόν traversal
is more cache-friendly.

http://en.wikipedia.org/wiki/Boustrophedon
Maybe those ancient Greeks were on to something.

Martin



More information about the core-libs-dev mailing list