RFR: JDK-8032012, String.toLowerCase/toUpperCase performance improvement
Ulf Zibis
Ulf.Zibis at CoSoCo.de
Thu Feb 6 16:42:44 UTC 2014
Hi,
On 06.02.2014 00:57, Xueming Shen wrote:
> On 02/05/2014 03:28 PM, Ulf Zibis wrote:
>> Additionally you could use Character.isSurrogate() and Character.isSupplementaryCodePoint() at
>> appropriate places. Both are better optimized for JIT.
>
> j.l.C.isSupplementaryCodePoint() checks up boundary of supp, we probably don't need it
> here, as the returning code point is either a ERROR or a valid unicode code point.
Sorry, I was mistaken. I meant using !isBmpCodePoint(), which is
codePoint >>> 16 != 0
and should be slightly faster than
codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT
as the latter needs a 32-bit constant to be loaded.
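To make the comparison concrete, here is a minimal sketch of the two forms side by side. The class and method names are hypothetical and not taken from the patch. It only checks that both forms agree on every valid code point; it says nothing about speed. Note that they disagree on negative inputs, which may matter if a negative sentinel value can reach the check.

```java
// Sketch only: class and method names are hypothetical, not from the patch.
final class SupplementaryCheck {
    // Shift form: true when any bit above the low 16 bits is set.
    static boolean viaShift(int codePoint) {
        return codePoint >>> 16 != 0;
    }

    // Comparison form against the constant 0x10000.
    static boolean viaCompare(int codePoint) {
        return codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        // Both forms agree over the whole valid code point range.
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (viaShift(cp) != viaCompare(cp)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        // They differ for negative inputs: the unsigned shift leaves 0xFFFF.
        System.out.println(viaShift(-1) + " " + viaCompare(-1)); // prints "true false"
    }
}
```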
> I'm not sure about the j.l.C.isSurrogate(), which takes a char and we have an int here.
> I would expect the javac will inline the constants for me, but I don't know whether jit
> can inline and then optimize away the explicit casting i2c. Not a big deal though.
Why do you use (int) here? You could do the cast later.
Yes, IIRC from my HSdis inspection, i2c is a no-op, and yes, isSurrogate() is easier to read.
Additionally, I remember we had a discussion about whether isSurrogate() would be faster with
(byte)(ch >>> 11) == (byte)(0xD8 >>> 3)
So if isSurrogate() is optimized or even intrinsified in the future, using isSurrogate() should be
better than
ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE
Please note:
ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1)
is generally better than:
ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE
See: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6984886
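For reference, a small sketch that checks the surrogate tests mentioned in this message against Character.isSurrogate() for every char value. The class and method names are hypothetical. To compare like with like, all three variants here use the full surrogate range (MIN_SURROGATE to MAX_SURROGATE) rather than the high-surrogate constants. The sketch demonstrates equivalence only, not relative performance.

```java
// Sketch only: class and method names are hypothetical, not from the JDK or the patch.
final class SurrogateCheck {
    // Range check with an inclusive upper bound.
    static boolean inclusive(char ch) {
        return ch >= Character.MIN_SURROGATE && ch <= Character.MAX_SURROGATE;
    }

    // Range check with an exclusive upper bound (MAX_SURROGATE + 1).
    static boolean exclusive(char ch) {
        return ch >= Character.MIN_SURROGATE && ch < (Character.MAX_SURROGATE + 1);
    }

    // Shift form: every surrogate shares the top five bits 11011.
    static boolean viaShift(char ch) {
        return (byte) (ch >>> 11) == (byte) (0xD8 >>> 3);
    }

    public static void main(String[] args) {
        // Compare each variant with the library method over all 65536 char values.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            boolean expected = Character.isSurrogate(ch);
            if (inclusive(ch) != expected || exclusive(ch) != expected || viaShift(ch) != expected) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i));
            }
        }
        System.out.println("all variants agree");
    }
}
```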
-Ulf