RFR: JDK-8032012, String.toLowerCase/toUpperCase performance improvement
Ulf Zibis
Ulf.Zibis at CoSoCo.de
Wed Jan 22 22:10:56 UTC 2014
On 22.01.2014 16:20, Paul Sandoz wrote:
> On Jan 21, 2014, at 11:05 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
>> On 01/20/2014 09:24 AM, Paul Sandoz wrote:
>>> - it would be nice to get rid of the pseudo goto using the "scan" labelled block.
>> webrev has been updated to remove the pseudo goto by checking the "first" against
>> "len" after the loop break.
> Much more readable :-)
I think you should compare the performance of both versions, on modern CPUs as well as on 32-bit CPUs.
>>> - you might be able to optimize by doing (could depend on the answer to the next point):
>>>
>>> int c = (int)value[i];
>>> int lc = Character.toLowerCase(c);
>>> if (.....) { result[i] = (char)lc; } else { return toLowerCaseEx(result, i, locale, localeDependent); }
>>>
>>> - Do you need to check ERROR for the result of toLowerCase?
>>>
>>> 2586 if (c == Character.ERROR ||
>>>
>> Yes, Character.toLowerCase() should never return ERROR (while the package-private
>> Character.toUpperCaseEx() will). In theory there is no need to check whether the return
>> value of Character.toUpperCase(int) > min_supplementary_code_point in our loop,
>> because no BMP character has a supplementary code point as its lower
>> case. But since it's a data-driven mapping table, there is no guarantee that the Unicode
>> data table is not going to change in the "future", so I still keep the check.
In my opinion this check belongs in the JDK's build-time tests, not in runtime code.
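Such a build-time check could look roughly like the sketch below. The class and method names are made up for illustration, and the local ERROR constant only mirrors the value of the package-private Character.ERROR (0xFFFFFFFF), which cannot be referenced from outside java.lang.

```java
public class CaseMappingCheck {
    // Assumption: mirrors the value of the package-private Character.ERROR.
    static final int ERROR = 0xFFFFFFFF;

    /**
     * Returns the first BMP code unit whose lower-case mapping is ERROR
     * or a supplementary code point, or -1 if there is none.
     */
    static int firstBadLowerCase() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            int lc = Character.toLowerCase(c);
            if (lc == ERROR || lc >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstBadLowerCase();
        if (bad >= 0) {
            // Fail the build if the Unicode data ever breaks the assumption.
            throw new AssertionError(
                "Unexpected lower-case mapping for U+" + Integer.toHexString(bad));
        }
        System.out.println("All BMP lower-case mappings stay within the BMP.");
    }
}
```

If a test like this ran against every new Unicode data table, the loop in String could drop the range check.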
> or:
>
> int c = (int)value[i];
> int lc = Character.toLowerCase(c); // is that safe?
> if (c < '\u03A3' || (c < Character.MIN_HIGH_SURROGATE && c != '\u03A3' && lc < Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
> result[i] = (char)lc;
> } else {
> return toLowerCaseEx(result, i, locale, localeDependent);
> }
>
> FWIW I personally find those solutions easier to read, if they are safe w.r.t. Character.toLowerCase and that annoying Greek character.
I would prefer the 3rd version.
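Whether that condition is safe could be checked with a small harness along these lines. This is only a sketch: the class and method names are hypothetical, and it compares the proposed fast path against the public String.toLowerCase(Locale.ROOT) for single BMP characters only, so it says nothing about locale-dependent or context-dependent mappings.

```java
import java.util.Locale;

public class FastPathCheck {
    // Predicate from the quoted proposal, with the '\u03A3' literal written out.
    static boolean takesFastPath(int c, int lc) {
        return c < '\u03A3'
            || (c < Character.MIN_HIGH_SURROGATE
                && c != '\u03A3'
                && lc < Character.MIN_SUPPLEMENTARY_CODE_POINT);
    }

    /** Counts code units where the fast path disagrees with String.toLowerCase. */
    static int countMismatches() {
        int mismatches = 0;
        for (int c = 0; c < Character.MIN_HIGH_SURROGATE; c++) {
            int lc = Character.toLowerCase(c);
            if (!takesFastPath(c, lc)) {
                continue; // would be handed to toLowerCaseEx anyway
            }
            String expected = String.valueOf((char) c).toLowerCase(Locale.ROOT);
            String actual = String.valueOf((char) lc);
            if (!expected.equals(actual)) {
                System.out.println("Mismatch at U+" + Integer.toHexString(c));
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches() + " mismatch(es) found");
    }
}
```

Any code unit it reports would need to be excluded from the fast path, the same way '\u03A3' is.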
-Ulf