RFR: JDK-8032012, String.toLowerCase/toUpperCase performance improvement
Remi Forax
forax at univ-mlv.fr
Wed Feb 5 21:00:32 UTC 2014
On 02/05/2014 07:43 PM, Xueming Shen wrote:
> On 02/05/2014 11:09 AM, Paul Sandoz wrote:
>> On Feb 5, 2014, at 6:58 PM, Xueming Shen<xueming.shen at oracle.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Let's try to wrap it up, otherwise I may drop the ball somewhere :-)
>>>
>>> On 01/22/2014 07:20 AM, Paul Sandoz wrote:
>>>>
>>>> if (lang == "tr" || lang == "az" || lang == "lt") {
>>>> // locale dependent
>>>> return toLowerCaseEx(result, firstUpper, locale, true);
>>>> }
>>>> // otherwise false is passed to subsequent calls to toLowerCaseEx
>>>>
>>>> ?
>>>>
>>> toLowerCaseEx will also be invoked later (in your other suggestion,
>>> below), and that call needs a "localeDependent" argument.
>>>
>> But is not the second call to toLowerCaseEx always invoked with a
>> value of false?
>>
>> String lang = locale.getLanguage();
>> final boolean localeDependent = (lang == "tr" || lang == "az" || lang == "lt");
>> if (localeDependent) {
>>     return toLowerCaseEx(result, firstUpper, locale, localeDependent); // <-- localeDependent is true
>> }
>> for (int i = firstUpper; i < len; i++) {
>>     int c = (int)value[i];
>>     if (c >= Character.MIN_HIGH_SURROGATE && c <= Character.MAX_HIGH_SURROGATE ||
>>         c == '\u03A3' || // GREEK CAPITAL LETTER SIGMA
>>         (c = Character.toLowerCase(c)) >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
>>         return toLowerCaseEx(result, i, locale, localeDependent); // <-- localeDependent is false
>>     }
>>     result[i] = (char)c;
>> }
>> return new String(result, true);
>>
>>
>
> You are absolutely right :-) I will update as suggested.
>
> -sherman
Hi Sherman,
The code can be faster if the first loop calls toLowerCaseEx when it sees a
surrogate instead of modifying srcCount, because in that case the JIT
will see that the loop increment is constant.
Note that with the current code, the performance of toLowerCase is great as
long as it has never been called with a string that contains a surrogate, but
once toLowerCase has been called with a string that contains a surrogate,
later calls will be slower than they were, even for strings that never
contain a surrogate. Calling toLowerCaseEx
will make toLowerCase on a string with a surrogate slower but will make
toLowerCase faster in the common case.
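A rough sketch of the shape I have in mind. This is not the webrev code:
lowerFast and lowerSlow are hypothetical names, and the sketch ignores the
locale-dependent languages and the Greek sigma case. The point is only that
the hot loop always advances by one char and leaves for a separate method as
soon as it sees a surrogate.

```java
final class LowerCaseSketch {

    // Fast path (hypothetical name): one char per iteration, so the loop
    // increment is always 1. Any surrogate makes us bail out to lowerSlow.
    static String lowerFast(String s) {
        final int len = s.length();
        final char[] result = new char[len];
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (Character.isSurrogate(c)) {
                // Rare case: hand the already converted prefix to the slow path.
                return lowerSlow(s, result, i);
            }
            result[i] = Character.toLowerCase(c);
        }
        return new String(result);
    }

    // Slow path (hypothetical name): walks code points, so the increment
    // is 1 or 2 depending on the character.
    static String lowerSlow(String s, char[] prefix, int start) {
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(prefix, 0, start);  // chars the fast path already lowered
        for (int i = start; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With this shape, a string without surrogates never leaves the loop in
lowerFast, and a string with a surrogate pays for the extra call and the
copy of the prefix.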
cheers,
Rémi