RFR [9] 8151384: Examine sun.misc.ASCIICaseInsensitiveComparator

Peter Levart peter.levart at gmail.com
Wed Mar 9 15:37:02 UTC 2016


Hi Chris,

So what do you think of providing a hashCode for 
java.util.jar.Attributes.Name that is obviously consistent with its 
equals method (and not dependent on the good will of Unicode tables) 
*and* also providing it as a public API to serve others, for example:

http://cr.openjdk.java.net/~plevart/jdk9-dev/String.CASE_INSENSITIVE_HASHER/webrev.01/
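
To illustrate the idea (this is only a rough sketch, not the code in the webrev; the class name CaseInsensitiveHash and the methods fold and hash are made up for this example), a hash that is meant to agree with String.compareToIgnoreCase can fold each code point the same way the comparison does, upper-casing first and then lower-casing the result:

```java
public final class CaseInsensitiveHash {
    private CaseInsensitiveHash() {}

    // Hypothetical helper: fold one code point the way the case-insensitive
    // comparison does, upper-case first and then lower-case the result.
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    // Hypothetical helper: polynomial hash over the folded code points.
    // No Locale and no String.toLowerCase call is involved.
    public static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            h = 31 * h + fold(cp);
            i += Character.charCount(cp);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "Main-Class";
        String b = "MAIN-CLASS";
        System.out.println(a.compareToIgnoreCase(b) == 0);
        System.out.println(hash(a) == hash(b));
    }
}
```

Because the hash uses the same per-character folding as the comparison, two strings that compare as equal should also hash the same, without relying on String.toLowerCase(Locale.ROOT) producing matching results.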

Regards, Peter


On 03/09/2016 03:43 PM, Peter Levart wrote:
>
>
> On 03/09/2016 02:44 PM, Chris Hegarty wrote:
>> On 9 Mar 2016, at 13:03, Claes Redestad <claes.redestad at oracle.com> wrote:
>>
>>> On 2016-03-09 13:17, Peter Levart wrote:
>>>>> When digging through old history to try to find out why 
>>>>> java.util.jar.Attributes
>>>>> was ever using ASCIICaseInsensitiveComparator, it was not clear that
>>>>> performance was the motivation.
>>>> I guess looking-up a manifest attribute is not a performance 
>>>> critical operation, you are right.
>>> Could this be an old startup optimization, since first call to 
>>> String.toLowerCase/toUpperCase will initialize and pull in 
>>> java.util.Locale and friends? If so it's probably not effective any 
>>> more.
>>>
>>> Coincidentally - due to a recent regression - we're currently 
>>> spending quite a bit of time parsing manifests of all jar files on 
>>> the classpath, making ASCIICaseInsensitiveComparator show up 
>>> prominently in some startup profiles.
>> Not any more ( it is no longer with us )!!
>>
>> Interesting… let me know if you see issues once this change makes its
>> way into a promoted build, or during your performance investigations.
>>
>> BTW. I am not against doing something “smarter” for Attributes.hashCode.
>> I just didn’t think it was relevant, or performance sensitive, any more.
>>
>> -Chris
>
> Hi Chris,
>
> I have another concern. Let's say Attributes keys are LATIN1. So for 
> comparison, the StringLatin1.compareToCI is used:
>
>     public static int compareToCI(byte[] value, byte[] other) {
>         int len1 = value.length;
>         int len2 = other.length;
>         int lim = Math.min(len1, len2);
>         for (int k = 0; k < lim; k++) {
>             if (value[k] != other[k]) {
>                 char c1 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(value, k));
>                 char c2 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(other, k));
>                 if (c1 != c2) {
>                     c1 = (char) CharacterDataLatin1.instance.toLowerCase(c1);
>                     c2 = (char) CharacterDataLatin1.instance.toLowerCase(c2);
>                     if (c1 != c2) {
>                         return c1 - c2;
>                     }
>                 }
>             }
>         }
>         return len1 - len2;
>     }
>
> comparing this with Name.hashCode:
>
>         public int hashCode() {
>             if (hashCode == -1) {
>                 hashCode = name.toLowerCase(Locale.ROOT).hashCode();
>             }
>             return hashCode;
>         }
>
>
> ...is it possible that for some pair of keys, compareToCI would result 
> in 0, but hashCode(s) would differ? For example, the uppercased keys 
> would be the same, but the .toLowerCase(Locale.ROOT) not? Maybe not 
> for LATIN1 keys, but what if one uses non-latin1 keys 
> (StringUTF16.compareToCI is similar)?
>
> Regards, Peter
>
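
The concern in the quoted message can be checked with a small experiment. The sketch below uses plain String values rather than Attributes.Name, and the class and method names (HashMismatchDemo, equalCI, lowerHash) are made up for illustration. It compares U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) with a plain "i": as far as I can tell the per-character comparison treats them as equal, while String.toLowerCase(Locale.ROOT) applies the full case mapping and lower-cases U+0130 to two chars, so the hashes of the lower-cased strings should differ.

```java
import java.util.Locale;

public final class HashMismatchDemo {
    private HashMismatchDemo() {}

    // Case-insensitive equality as the comparator sees it.
    static boolean equalCI(String a, String b) {
        return a.compareToIgnoreCase(b) == 0;
    }

    // Hash computed the way the quoted Name.hashCode does it.
    static int lowerHash(String s) {
        return s.toLowerCase(Locale.ROOT).hashCode();
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        String plainI = "i";

        System.out.println("compare equal: " + equalCI(dottedI, plainI));
        System.out.println("hashes equal:  "
                + (lowerHash(dottedI) == lowerHash(plainI)));
        System.out.println("lower-cased length: "
                + dottedI.toLowerCase(Locale.ROOT).length());
    }
}
```

If the first line reports true and the second false, that is exactly the kind of pair asked about: the comparison yields 0 but the toLowerCase-based hash codes differ. For pure ASCII keys such as "Main-Class" and "MAIN-CLASS", both checks agree.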



