RFR: JDK-8032012, String.toLowerCase/toUpperCase performance improvement

Remi Forax forax at univ-mlv.fr
Wed Feb 5 21:00:32 UTC 2014


On 02/05/2014 07:43 PM, Xueming Shen wrote:
> On 02/05/2014 11:09 AM, Paul Sandoz wrote:
>> On Feb 5, 2014, at 6:58 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
>>
>>> Hi,
>>>
>>> Let's try to wrap it up, otherwise I may drop the ball somewhere :-)
>>>
>>> On 01/22/2014 07:20 AM, Paul Sandoz wrote:
>>>>
>>>> if (lang == "tr" || lang == "az" || lang == "lt") {
>>>>    // local dependent
>>>>    return toLowerCaseEx(result, firstUpper, locale, true);
>>>> }
>>>> // otherwise false is passed to subsequent calls to toLowerCaseEx
>>>>
>>>> ?
>>>>
>>> toLowerCaseEx will also be invoked later (in your other suggestion,
>>> next), which needs a "localeDependent".
>>>
>> But is not the second call to toLowerCaseEx always invoked with a 
>> value of false?
>>
>>     String lang = locale.getLanguage();
>>     final boolean localeDependent = (lang == "tr" || lang == "az" || lang == "lt");
>>     if (localeDependent) {
>>         return toLowerCaseEx(result, firstUpper, locale, localeDependent);  // <-- localeDependent is true
>>     }
>>     for (int i = firstUpper; i < len; i++) {
>>         int c = (int)value[i];
>>         if (c >= Character.MIN_HIGH_SURROGATE && c <= Character.MAX_HIGH_SURROGATE ||
>>             c == '\u03A3' ||      // GREEK CAPITAL LETTER SIGMA
>>             (c = Character.toLowerCase(c)) >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
>>             return toLowerCaseEx(result, i, locale, localeDependent);  // <-- localeDependent is false
>>         }
>>         result[i] = (char)c;
>>     }
>>     return new String(result, true);
>>
>>
>
> You are absolutely right :-) I will update as suggested.
>
> -sherman

Hi Sherman,
the code can be faster if the first loop calls toLowerCaseEx when it 
sees a surrogate instead of modifying srcCount, because in that case the 
JIT will see that the loop increment is constant.

Note that with the current code, the performance of toLowerCase is great 
as long as it is never called with a string that contains a surrogate. 
But once toLowerCase has been called with a string that contains a 
surrogate, it will be slower than it was, even for later calls with 
strings that never contain a surrogate. Calling toLowerCaseEx will make 
toLowerCase slower for a string with a surrogate, but faster in the 
common case.
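
To illustrate the shape of this suggestion, here is a minimal sketch. It 
is not the java.lang.String code under review: the names LowerCaseSketch, 
simpleLowerCase and slowPath are made up for the example, and it applies 
only simple one-to-one case mappings (no final-sigma handling, no locale 
rules). The point is only that the hot loop always advances by one char 
and hands off to a separate method as soon as it meets a surrogate.

```java
public final class LowerCaseSketch {
    private LowerCaseSketch() {}

    // Fast path: every element is a single BMP char, so the loop index
    // always advances by exactly 1 and the loop body stays small.
    public static String simpleLowerCase(String s) {
        final int len = s.length();
        final char[] result = new char[len];
        for (int i = 0; i < len; i++) {
            char ch = s.charAt(i);
            if (Character.isSurrogate(ch)) {
                // Leave the hot loop; the variable stride lives elsewhere.
                return slowPath(s, result, i);
            }
            result[i] = Character.toLowerCase(ch);
        }
        return new String(result);
    }

    // Slow path (hypothetical stand-in for toLowerCaseEx): walks code
    // points, so the index advances by 1 or 2 depending on the input.
    private static String slowPath(String s, char[] done, int first) {
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(done, 0, first);           // keep what the fast path produced
        for (int i = first; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);    // 1 for BMP, 2 for a surrogate pair
            sb.appendCodePoint(Character.toLowerCase(cp));
        }
        return sb.toString();
    }
}
```

With this split, a string without surrogates never leaves the first 
loop, while a string containing one pays for the extra call.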

cheers,
Rémi






