review request for 6798511/6860431: Include functionality of Surrogate in Character

Ulf Zibis Ulf.Zibis at gmx.de
Tue Mar 2 23:34:08 UTC 2010


On 26.08.2009 20:02, Xueming Shen wrote:
>
> For example, isBMP(int) might be convenient, but it can easily be
> achieved with one line of code
>
> (int)(char)codePoint == codePoint;
>
> or in the more readable form
>
>    codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;
>

In class sun.nio.cs.Surrogate we have:
     public static boolean isBMP(int uc) {
         return (int) (char) uc == uc;
     }

1.) It's enough to have:
         return (char)uc == uc;
     better:
         assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
         return (char)uc == uc;
         // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
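
The claim that the cast form equals the range check can be tried out on a few boundary values. This is only a minimal sketch; the class name BmpCastCheck and its method names are hypothetical and not part of sun.nio.cs.Surrogate or the JDK:

```java
// Hypothetical helper class, for illustration only.
final class BmpCastCheck {
    // Short form from the mail: the char operand is widened back to int
    // before the comparison, so the explicit (int) cast is redundant.
    static boolean isBmpCast(int uc) {
        return (char) uc == uc;
    }

    // Plain range check that the cast form is meant to replace.
    static boolean isBmpRange(int uc) {
        return uc >= Character.MIN_VALUE && uc <= Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] samples = { Integer.MIN_VALUE, -1, 0, 'A', 0xD800, 0xFFFF,
                          0x10000, 0x10FFFF, 0x110000, Integer.MAX_VALUE };
        for (int uc : samples) {
            if (isBmpCast(uc) != isBmpRange(uc)) {
                throw new AssertionError("mismatch at " + uc);
            }
        }
        System.out.println("cast form agrees with range check on all samples");
    }
}
```

Negative values are rejected by both forms, because (char) uc is never negative once widened to int.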

2.) The above code is compiled to (16 bytes of machine code):
   0x00b87ad8: mov    %ebx,%ebp
   0x00b87ada: and    $0xffff,%ebp
   0x00b87ae0: cmp    %ebx,%ebp
   0x00b87ae2: jne    0x00b87c52
   0x00b87ae8:

     We could instead write:
         assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
         return (uc >> 16) == 0;
         // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE

     which is compiled to (only 9 bytes of machine code):
   0x00b87aac: mov    %ebx,%ecx
   0x00b87aae: sar    $0x10,%ecx
   0x00b87ab1: test   %ecx,%ecx
   0x00b87ab3: je     0x00b87acb
   0x00b87ab5:
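
That the shift form and the cast form accept exactly the same values can be checked with a small sketch like the following. The class BmpShiftCheck and the helper countMismatches are hypothetical names chosen for this illustration:

```java
// Hypothetical helper class, for illustration only.
final class BmpShiftCheck {
    // Shift form: the arithmetic shift keeps the sign, so any negative
    // input yields a negative (non-zero) result and is rejected.
    static boolean isBmpShift(int uc) {
        return (uc >> 16) == 0;
    }

    // Cast form from item 1.
    static boolean isBmpCast(int uc) {
        return (char) uc == uc;
    }

    // Counts the values in [from, to] on which the two forms disagree.
    // A long loop variable avoids overflow when 'to' is Integer.MAX_VALUE.
    static long countMismatches(int from, int to) {
        long mismatches = 0;
        for (long v = from; v <= to; v++) {
            int uc = (int) v;
            if (isBmpShift(uc) != isBmpCast(uc)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Covers the negative side, the whole BMP and the range just above it.
        System.out.println(countMismatches(-0x20000, 0x120000));
    }
}
```

Running countMismatches over the full int range is also feasible, it just takes a few seconds.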

3.) If we have:
     public static boolean isSupplementaryCodePoint(int codePoint) {
         assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
                 (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
         return (codePoint >> 16) != 0
                 && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
         // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
         // && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
     }
and:
         if (Surrogate.isBMP(uc))
             ...;
         else if (Character.isSupplementaryCodePoint(uc))
             ...;
         else
             ...;

     we get (needs only 18 bytes of machine code):
   0x00b87aac: mov    %ebx,%ecx
   0x00b87aae: sar    $0x10,%ecx
   0x00b87ab1: test   %ecx,%ecx
   0x00b87ab3: je     0x00b87acb
   0x00b87ab5: cmp    $0x11,%ecx
   0x00b87ab8: jge    0x00b87ce6
   0x00b87abe:
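
The listing shares one shifted value (the plane number) between both tests. In source form this might look like the sketch below. PlaneClassify, Kind and the three classify methods are hypothetical names, and the public constant Character.MAX_CODE_POINT (0x10FFFF) stands in for MAX_SUPPLEMENTARY_CODE_POINT. One point that may be worth double-checking: with the signed shift a negative input gives a negative plane, which passes the "< 0x11" test, so the sketch also includes a variant using >>> for comparison.

```java
// Hypothetical helper class, for illustration only.
final class PlaneClassify {
    enum Kind { BMP, SUPPLEMENTARY, INVALID }

    // Mirrors the if/else chain above: one signed shift, two compares.
    static Kind classifySigned(int uc) {
        int plane = uc >> 16;
        if (plane == 0) return Kind.BMP;
        if (plane < ((Character.MAX_CODE_POINT + 1) >> 16)) return Kind.SUPPLEMENTARY;
        return Kind.INVALID;
    }

    // Same idea with an unsigned shift: negative inputs get a large
    // positive plane number and fall through to INVALID.
    static Kind classifyUnsigned(int uc) {
        int plane = uc >>> 16;
        if (plane == 0) return Kind.BMP;
        if (plane < ((Character.MAX_CODE_POINT + 1) >>> 16)) return Kind.SUPPLEMENTARY;
        return Kind.INVALID;
    }

    // Reference classification using plain range checks.
    static Kind classifyByRange(int uc) {
        if (uc >= 0 && uc <= 0xFFFF) return Kind.BMP;
        if (uc >= Character.MIN_SUPPLEMENTARY_CODE_POINT
                && uc <= Character.MAX_CODE_POINT) return Kind.SUPPLEMENTARY;
        return Kind.INVALID;
    }

    public static void main(String[] args) {
        int[] samples = { -1, 0, 0xFFFF, 0x10000, 0x10FFFF, 0x110000 };
        for (int uc : samples) {
            System.out.println(Integer.toHexString(uc) + ": signed=" + classifySigned(uc)
                    + " unsigned=" + classifyUnsigned(uc)
                    + " range=" + classifyByRange(uc));
        }
    }
}
```

For all non-negative inputs the three methods agree; they differ only on negative values, where the signed variant reports SUPPLEMENTARY.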


-Ulf