RFR [9] 8151384: Examine sun.misc.ASCIICaseInsensitiveComparator
Chris Hegarty
chris.hegarty at oracle.com
Mon Mar 7 16:29:42 UTC 2016
sun.misc.ASCIICaseInsensitiveComparator appears to be a specialized
comparator for comparing strings that contain only ASCII characters.
Its main usage seems to be in sorted maps that support the character
set implementation. This is startup/performance sensitive code. It
looks like an "optimized" version of String's public case-insensitive
comparator, for when the strings are known to contain only ASCII
characters. The public String case-insensitive comparator, in some
cases, does both a toUpperCase and a toLowerCase on the characters
being compared.
ASCIICaseInsensitiveComparator is trying to avoid this.
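To illustrate the idea, here is a minimal sketch of an ASCII-only
case-insensitive comparator. It is not the actual sun.misc source; the
class name AsciiCaseInsensitiveSketch and the helper toLower are
hypothetical. It assumes both strings contain only ASCII, so case
folding reduces to a single range check instead of the general
toUpperCase/toLowerCase pair.

```java
import java.util.Comparator;

// Hypothetical sketch, not the sun.misc implementation.
final class AsciiCaseInsensitiveSketch implements Comparator<String> {
    static final Comparator<String> INSTANCE = new AsciiCaseInsensitiveSketch();

    // Folds 'A'..'Z' to 'a'..'z'; any other value is returned unchanged.
    static int toLower(int ch) {
        return (ch >= 'A' && ch <= 'Z') ? ch + 0x20 : ch;
    }

    @Override
    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                // Only fold case when the raw chars differ.
                int d = toLower(c1) - toLower(c2);
                if (d != 0) {
                    return d;
                }
            }
        }
        // Common prefix is equal; the shorter string sorts first.
        return n1 - n2;
    }
}
```

The result is only meaningful for ASCII input; non-ASCII characters are
compared by their raw char values.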
Looking at String.CASE_INSENSITIVE_ORDER it looks like it can be,
somewhat easily, optimized to give similar performance to that of
ASCIICaseInsensitiveComparator without much risk. This will allow
usages of ASCIICaseInsensitiveComparator to be replaced with
String.CASE_INSENSITIVE_ORDER. For one, the internal getChar does not
pay the cost of the bounds checks that charAt (which is used by
ASCIICaseInsensitiveComparator) does.
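As a rough illustration of what such a replacement looks like at a call
site, the sketch below builds a case-insensitive sorted map with
String.CASE_INSENSITIVE_ORDER. The class CharsetAliasMapSketch and its
contents are hypothetical and only stand in for the kind of sorted maps
used by the charset implementation.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical call site, for illustration only.
final class CharsetAliasMapSketch {
    // Maps a few well-known charset aliases to canonical names,
    // ignoring case on lookup.
    static SortedMap<String, String> newAliasMap() {
        SortedMap<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.put("UTF8", "UTF-8");
        m.put("latin1", "ISO-8859-1");
        m.put("ASCII", "US-ASCII");
        return m;
    }
}
```

With this, newAliasMap().get("utf8") and newAliasMap().get("UTF8")
resolve to the same entry.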
The webrev contains specialized versions of compare for when
the coders of the two strings match. Alternatively, this could be pushed
down to String[Latin1|UTF16].
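A rough sketch of what a same-coder specialization could look like for
the Latin1 case follows. This is not the code in the webrev. It assumes
the string contents are available as Latin-1 byte arrays; String's
internal array is not accessible from outside, so the sketch obtains
them via getBytes purely for demonstration. The names
Latin1CompareSketch and compareLatin1CI are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a Latin1-only case-insensitive compare.
final class Latin1CompareSketch {
    // Compares two Latin-1 encoded byte arrays, ignoring case.
    static int compareLatin1CI(byte[] v1, byte[] v2) {
        int min = Math.min(v1.length, v2.length);
        for (int k = 0; k < min; k++) {
            if (v1[k] != v2[k]) {
                // Same fold order as String.CASE_INSENSITIVE_ORDER:
                // upper case first, then lower case if still different.
                char c1 = Character.toUpperCase((char) (v1[k] & 0xff));
                char c2 = Character.toUpperCase((char) (v2[k] & 0xff));
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        return c1 - c2;
                    }
                }
            }
        }
        return v1.length - v2.length;
    }

    // Demo entry point; the getBytes calls allocate and exist only
    // because this sketch cannot reach String's internal array.
    static int compare(String a, String b) {
        return compareLatin1CI(a.getBytes(StandardCharsets.ISO_8859_1),
                               b.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Inside String itself, no copy would be needed, since the comparison
could read the existing byte array directly.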
Webrev & bug:
http://cr.openjdk.java.net/~chegar/8151384/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8151384
Benchmarks and results (based, somewhat, on Aleksey's [1]):
http://cr.openjdk.java.net/~chegar/8151384/bench/
Two micro benchmarks:
1) Compare performance of comparing available charset names
with ASCIICaseInsensitiveComparator and CASE_INSENSITIVE_ORDER.
After the changes, CASE_INSENSITIVE_ORDER marginally
outperforms ASCIICaseInsensitiveComparator.
2) Compare general performance of CASE_INSENSITIVE_ORDER.
The results show improved performance for all cases,
especially when one or more of the strings is UTF16.
Note: this issue does not intend to optimize
String.CASE_INSENSITIVE_ORDER as much as possible,
just to make reasonable changes that improve performance to a point
where it is a reasonable replacement for
ASCIICaseInsensitiveComparator. Further optimization should not be
prevented, or thwarted, by this work.
Note: the usage of ASCIICaseInsensitiveComparator in jar attributes
appears to have been done to avoid the allocation cost of toLowerCase.
This seems acceptable for hashCode, but could be avoided, if necessary.
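If the allocation ever did need to be avoided, one possible approach is
an ASCII-folding hash computed in place. The sketch below is an
assumption about how that might look, not code from the webrev; the
names AsciiLowerHashSketch and lowerCaseHashCode are hypothetical. For
ASCII-only input it yields the same value as hashing the lower-cased
string.

```java
// Hypothetical sketch: hash of the ASCII-lower-cased form of a string,
// computed without allocating a new String.
final class AsciiLowerHashSketch {
    static int lowerCaseHashCode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c += 0x20; // fold ASCII upper case to lower case
            }
            // Same polynomial as String.hashCode.
            h = 31 * h + c;
        }
        return h;
    }
}
```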
-Chris.
[1] http://cr.openjdk.java.net/~shade/density/