RFR: JDK-8032012, , String.toLowerCase/toUpperCase performance improvement

Xueming Shen xueming.shen at oracle.com
Tue Jan 21 22:05:57 UTC 2014


Hi Paul, thanks for reviewing the changeset, comment inlined below.

On 01/20/2014 09:24 AM, Paul Sandoz wrote:
>
> Some quick comments.
>
> In String.toLowerCase:
>
> - it would be nice to get rid of the pseudo goto using the "scan" labelled block.
>

webrev has been updated to remove the pseudo goto by checking the "first" against
"len" after the loop break.

> - there is no need for the "localeDependent" local variable.
>

I just tried to not throw away the result of this "localeDependent", which is still needed
in the toXCaseEx() methods.

> - you might be able to optimize by doing (could depend on the answer to the next point):
>
>   int c = (int)value[i];
>   int lc = Character.toLowerCase(c);
>   if (.....) { result[i] = (char)lc; } else { return toLowerCaseEx(result, i, locale, localeDependent); }
>
> - Do you need to check ERROR for the result of toLowerCase?
>
> 2586             if (c == Character.ERROR ||
>

Yes, Character.toLowerCase() should never return ERROR (while the package private
Character.toUpperCaseEx() will). In theory there is no need to check if the return
value of  Character.toUpperCase(int) > min_supplementary_code_point in our loop,
because there is no bmp character returns a supplementary code point as its lower
case. But since it's a data driven mapping table, there is no guarantee the unicode
data table is not going to change in the "future", so I still keep the check. The webrev
has been updated to use one single "if" inside the loop.

I have a "single if" implementation for the toUpperCase() as well, though don't sure
which one is better/faster :-)

http://cr.openjdk.java.net/~sherman/8032012/

-Sherman




More information about the core-libs-dev mailing list