RFR [9] 8151384: Examine sun.misc.ASCIICaseInsensitiveComparator

Peter Levart peter.levart at gmail.com
Wed Mar 9 15:58:18 UTC 2016



On 03/09/2016 04:30 PM, Chris Hegarty wrote:
> On 9 Mar 2016, at 14:43, Peter Levart <peter.levart at gmail.com> wrote:
>
>> On 03/09/2016 02:44 PM, Chris Hegarty wrote:
>>> On 9 Mar 2016, at 13:03, Claes Redestad <claes.redestad at oracle.com> wrote:
>>>
>>>> On 2016-03-09 13:17, Peter Levart wrote:
>>>>>> When digging through old history to try to find out why java.util.jar.Attributes
>>>>>> was ever using ASCIICaseInsensitiveComparator, it was not clear that
>>>>>> performance was the motivation.
>>>>> I guess looking up a manifest attribute is not a performance-critical operation, you are right.
>>>> Could this be an old startup optimization, since the first call to String.toLowerCase/toUpperCase will initialize and pull in java.util.Locale and friends? If so it's probably not effective any more.
>>>>
>>>> Coincidentally - due to a recent regression - we're currently spending quite a bit of time parsing manifests of all jar files on the classpath, making ASCIICaseInsensitiveComparator show up prominently in some startup profiles.
>>> Not any more ( it is no longer with us )!!
>>>
>>> Interesting… let me know if you see issues once this change makes its
>>> way into a promoted build, or during your performance investigations.
>>>
>>> BTW. I am not against doing something “smarter” for Attributes.hashCode.
>>> I just didn’t think it was relevant, or performance sensitive, any more.
>>>
>>> -Chris
>> Hi Chris,
>>
>> I have another concern. Let's say Attributes keys are LATIN1. So for comparison, the StringLatin1.compareToCI is used:
>>
>>     public static int compareToCI(byte[] value, byte[] other) {
>>         int len1 = value.length;
>>         int len2 = other.length;
>>         int lim = Math.min(len1, len2);
>>         for (int k = 0; k < lim; k++) {
>>             if (value[k] != other[k]) {
>>                 char c1 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(value, k));
>>                 char c2 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(other, k));
>>                 if (c1 != c2) {
>>                     c1 = (char) CharacterDataLatin1.instance.toLowerCase(c1);
>>                     c2 = (char) CharacterDataLatin1.instance.toLowerCase(c2);
>>                     if (c1 != c2) {
>>                         return c1 - c2;
>>                     }
>>                 }
>>             }
>>         }
>>         return len1 - len2;
>>     }
>>
>> comparing this with Name.hashCode:
>>
>>         public int hashCode() {
>>             if (hashCode == -1) {
>>                 hashCode = name.toLowerCase(Locale.ROOT).hashCode();
>>             }
>>             return hashCode;
>>         }
>>
>>
>> ...is it possible that for some pair of keys, compareToCI would result in 0, but hashCode(s) would differ? For example, the uppercased keys would be the same, but the .toLowerCase(Locale.ROOT) not? Maybe not for LATIN1 keys, but what if one uses non-latin1 keys (StringUTF16.compareToCI is similar)?
> Can this really happen? ASCIICaseInsensitiveComparator was asserting that
> string characters were ASCII, so this situation would have triggered an assert
> with the old code, no?

If assertions were enabled, yes. Otherwise it would have compared 
non-ASCII characters case-sensitively, and so would hashCode, so 
hashCode would always have been consistent with equals(). Now we have 
the following change of behavior:

- no assertion failure on non-ascii characters when Name.equals() is called
- hashCode is not guaranteed to be consistent in such cases (or maybe it 
is, but at the mercy of current Unicode tables).

This would most probably go unnoticed, but wouldn't it be nicer if the 
code guaranteed the consistency of hashCode?
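
To make the concern concrete, here is a minimal sketch (not a proposed
patch) of a hash that folds each char the same way the comparison does.
The class and method names are hypothetical, and it assumes the keys
contain only BMP characters, since it folds char by char. The pair used
in main() is LATIN SMALL LETTER DOTLESS I versus plain "i", which I
believe compare as equal ignoring case but lower-case differently:

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of java.util.jar.
final class CaseFoldHash {

    // Fold a char the way String.compareToIgnoreCase is specified to:
    // upper-case first, then lower-case the result.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Hash over the folded chars, using the same 31-based polynomial as
    // String.hashCode. Assumption: BMP-only input (no surrogate pairs).
    static int foldedHash(String name) {
        int h = 0;
        for (int i = 0; i < name.length(); i++) {
            h = 31 * h + fold(name.charAt(i));
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        String b = "i";

        // Do the two keys compare as equal ignoring case?
        System.out.println(a.compareToIgnoreCase(b) == 0);

        // Do the toLowerCase(Locale.ROOT)-based hashes agree?
        System.out.println(a.toLowerCase(Locale.ROOT).hashCode()
                == b.toLowerCase(Locale.ROOT).hashCode());

        // Do the folded hashes agree?
        System.out.println(foldedHash(a) == foldedHash(b));
    }
}
```

Folding in hashCode with exactly the function the comparison uses makes
the consistency independent of whatever the Unicode tables say.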

Regards, Peter

P.S.

Do you happen to know why String.compareToIgnoreCase / 
CASE_INSENSITIVE_ORDER is defined to compare characters transformed 
through the following function:

Character.toLowerCase(Character.toUpperCase(character))

...and not simply:

Character.toLowerCase(character) or Character.toUpperCase(character)
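
I don't know the historical reason, but the three foldings are at least
not equivalent. A small sketch (hypothetical class name, and assuming
the case mappings I remember for KELVIN SIGN and LATIN SMALL LETTER
DOTLESS I) that shows where each single-step folding disagrees with the
combined one:

```java
// Hypothetical demo class; compares the three candidate foldings.
final class FoldDemo {

    // Equal if the upper-cased chars match.
    static boolean sameUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Equal if the lower-cased chars match.
    static boolean sameLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Equal if toLowerCase(toUpperCase(c)) matches, as in compareToIgnoreCase.
    static boolean sameBoth(char a, char b) {
        return Character.toLowerCase(Character.toUpperCase(a))
                == Character.toLowerCase(Character.toUpperCase(b));
    }

    static void show(String label, char a, char b) {
        System.out.println(label
                + ": upper=" + sameUpper(a, b)
                + " lower=" + sameLower(a, b)
                + " both=" + sameBoth(a, b));
    }

    public static void main(String[] args) {
        // KELVIN SIGN lower-cases to 'k' but has no upper-case mapping.
        show("U+212A vs k", '\u212A', 'k');
        // DOTLESS I upper-cases to 'I' but lower-cases to itself.
        show("U+0131 vs i", '\u0131', 'i');
    }
}
```

If those mappings are right, upper-casing alone misses the first pair
and lower-casing alone misses the second, while the combined folding
treats both pairs as equal.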


>
> -Chris.
>



