RFR: JDK-8032012, String.toLowerCase/toUpperCase performance improvement

Thu Feb 6 16:42:44 UTC 2014

Hi,

On 06.02.2014 00:57, Xueming Shen wrote:
> On 02/05/2014 03:28 PM, Ulf Zibis wrote:
>> Additionally you could use Character.isSurrogate() and Character.isSupplementaryCodePoint() at
>> appropriate places. Both are better optimized for JIT.
>
> j.l.C.isSupplementaryCodePoint() checks the upper boundary of supp, we probably don't need it
> here, as the returned code point is either an ERROR or a valid Unicode code point.

Sorry, I was in error. I meant using !isBmpCodePoint(), which is
     codePoint >>> 16 != 0
which should be slightly faster than
     codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT
as the latter needs a 32-bit value to be loaded.
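
To illustrate the two checks above, here is a minimal sketch, not code from the patch. The class and method names are made up for this example. It only checks that both forms agree on every valid code point; it says nothing about speed.

```java
// Hypothetical helper class, for illustration only.
final class SupplementaryCheck {
    // Shift-based form: true when any bit above the low 16 is set.
    static boolean viaShift(int codePoint) {
        return codePoint >>> 16 != 0;
    }

    // Comparison form: compares against the constant 0x10000.
    static boolean viaCompare(int codePoint) {
        return codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    // Counts valid code points on which the two forms disagree with each
    // other, or on which the shift form is not the negation of
    // Character.isBmpCodePoint().
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean shift = viaShift(cp);
            if (shift != viaCompare(cp) || shift == Character.isBmpCodePoint(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches on valid code points: " + countMismatches());
        // The two forms differ for negative ints, which matters if a
        // negative sentinel value can reach the check.
        System.out.println("viaShift(-1)=" + viaShift(-1) + ", viaCompare(-1)=" + viaCompare(-1));
    }
}
```

Note that the two forms are only interchangeable for non-negative input: for a negative int the shift form returns true and the comparison returns false.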

> I'm not sure about the j.l.C.isSurrogate(), which takes a char and we have an int here.
> I would expect the javac will inline the constants for me, but I don't know whether jit
> can inline and then optimize away the explicit casting i2c. Not a big deal though.

Why do you use (int) here? You could do the cast later.
Yes, IIRC from my HSdis inspection, i2c is a no-op, and yes, isSurrogate() is more readable.
Additionally, I remember we had a discussion about whether isSurrogate() would be faster with
     (byte)(ch >>> 11) == (byte)(0xD8 >>> 3)
So if isSurrogate() is optimised in the future or even intrinsified, using isSurrogate() should be
better than
     ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE
Please note:
     ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1)
is generally better than:
     ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE
See: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6984886
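
For reference, a small sketch comparing the surrogate checks discussed above over the full char range. The class and method names are hypothetical, and it only demonstrates that the variants give the same answer for a char argument; which one the JIT compiles best is not something this shows.

```java
// Hypothetical helper class, for illustration only.
final class SurrogateCheck {
    // Library call, available since Java 7.
    static boolean viaLibrary(char ch) {
        return Character.isSurrogate(ch);
    }

    // Half-open range form: MIN_SURROGATE <= ch < MAX_SURROGATE + 1.
    static boolean viaRange(char ch) {
        return ch >= Character.MIN_SURROGATE && ch < (Character.MAX_SURROGATE + 1);
    }

    // Shift form: all surrogates 0xD800..0xDFFF share the same top five bits.
    // Only valid for a 16-bit char; an int argument could alias after the byte cast.
    static boolean viaByteShift(char ch) {
        return (byte) (ch >>> 11) == (byte) (0xD8 >>> 3);
    }

    // Counts char values on which the three variants do not all agree.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            boolean expected = viaLibrary(ch);
            if (viaRange(ch) != expected || viaByteShift(ch) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches over all chars: " + countMismatches());
    }
}
```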


-Ulf