RFR [9] 8151384: Examine sun.misc.ASCIICaseInsensitiveComparator
Peter Levart
peter.levart at gmail.com
Wed Mar 9 15:37:02 UTC 2016
Hi Chris,
So what do you think of providing a hashCode for
java.util.jar.Attributes.Name that is obviously consistent with its
equals method (and not dependent on the good will of Unicode tables),
*and* also exposing it as a public API that others could use, for example:
http://cr.openjdk.java.net/~plevart/jdk9-dev/String.CASE_INSENSITIVE_HASHER/webrev.01/
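A minimal sketch of what such a hasher could look like. It assumes the equality it must agree with is the char-by-char rule of String.CASE_INSENSITIVE_ORDER (compare the upper-case forms, then the lower-case forms of those). The class and method names are hypothetical and are not the API in the linked webrev. The idea: under that rule two chars are equal exactly when Character.toLowerCase(Character.toUpperCase(c)) agrees for both, so hashing that folded value per char keeps hashCode consistent with equals, instead of relying on String.toLowerCase, whose string-level mapping need not agree char-for-char with the comparator.

```java
// Hypothetical helper, not the API proposed in the linked webrev.
final class CaseInsensitiveHash {

    private CaseInsensitiveHash() {}

    // Fold a char the way a char-by-char case-insensitive comparison does:
    // upper-case first, then lower-case the result.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Same polynomial as String.hashCode, but over folded chars, so any two
    // strings that are equal under the fold-based comparison hash the same.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + fold(s.charAt(i));
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "Manifest-Version";
        String b = "MANIFEST-version";
        String c = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        String d = "i";
        System.out.println(a.equalsIgnoreCase(b) + " " + (hash(a) == hash(b))); // true true
        System.out.println(c.equalsIgnoreCase(d) + " " + (hash(c) == hash(d))); // true true
    }
}
```

The design choice is simply to hash the same per-char key the comparator effectively compares, so consistency follows by construction rather than from properties of the Unicode tables.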
Regards, Peter
On 03/09/2016 03:43 PM, Peter Levart wrote:
>
>
> On 03/09/2016 02:44 PM, Chris Hegarty wrote:
>> On 9 Mar 2016, at 13:03, Claes Redestad <claes.redestad at oracle.com>
>> wrote:
>>
>>> On 2016-03-09 13:17, Peter Levart wrote:
>>>>> When digging through old history to try to find out why
>>>>> java.util.jar.Attributes
>>>>> was ever using ASCIICaseInsensitiveComparator, it was not clear that
>>>>> performance was the motivation.
>>>> I guess looking up a manifest attribute is not a performance-critical
>>>> operation, you are right.
>>> Could this be an old startup optimization, since the first call to
>>> String.toLowerCase/toUpperCase will initialize and pull in
>>> java.util.Locale and friends? If so it's probably not effective any
>>> more.
>>>
>>> Coincidentally - due to a recent regression - we're currently
>>> spending quite a bit of time parsing manifests of all jar files on
>>> the classpath, making ASCIICaseInsensitiveComparator show up
>>> prominently in some startup profiles.
>> Not any more (it is no longer with us)!!
>>
>> Interesting… let me know if you see issues once this change makes its
>> way into a promoted build, or during your performance investigations.
>>
>> BTW. I am not against doing something “smarter” for Attributes.hashCode.
>> I just didn’t think it was relevant, or performance sensitive, any more.
>>
>> -Chris
>
> Hi Chris,
>
> I have another concern. Let's say Attributes keys are LATIN1. So for
> comparison, the StringLatin1.compareToCI is used:
>
> public static int compareToCI(byte[] value, byte[] other) {
>     int len1 = value.length;
>     int len2 = other.length;
>     int lim = Math.min(len1, len2);
>     for (int k = 0; k < lim; k++) {
>         if (value[k] != other[k]) {
>             char c1 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(value, k));
>             char c2 = (char) CharacterDataLatin1.instance.toUpperCase(getChar(other, k));
>             if (c1 != c2) {
>                 c1 = (char) CharacterDataLatin1.instance.toLowerCase(c1);
>                 c2 = (char) CharacterDataLatin1.instance.toLowerCase(c2);
>                 if (c1 != c2) {
>                     return c1 - c2;
>                 }
>             }
>         }
>     }
>     return len1 - len2;
> }
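To experiment outside java.base, the loop above can be transcribed onto public API: String.charAt in place of the internal getChar, and java.lang.Character in place of CharacterDataLatin1 (which should give the same results for Latin-1 input). This is a sketch that mirrors the quoted logic, not the JDK's own code, and the class name is hypothetical. It also makes the relevant property easy to see: two chars are treated as equal exactly when Character.toLowerCase(Character.toUpperCase(c)) agrees for both.

```java
// Sketch: the quoted compareToCI loop transcribed to public API so it can be
// run outside java.base. Hypothetical class; not the JDK implementation.
final class CompareToCI {

    private CompareToCI() {}

    static int compareToCI(String value, String other) {
        int len1 = value.length();
        int len2 = other.length();
        int lim = Math.min(len1, len2);
        for (int k = 0; k < lim; k++) {
            char c1 = value.charAt(k);
            char c2 = other.charAt(k);
            if (c1 != c2) {
                // First compare the upper-case forms...
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    // ...then the lower-case forms of those upper-case forms.
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        return c1 - c2;
                    }
                }
            }
        }
        // All compared chars matched: the shorter string orders first.
        return len1 - len2;
    }

    public static void main(String[] args) {
        System.out.println(compareToCI("Class-Path", "class-path")); // 0
        System.out.println(compareToCI("Main-Class", "Name"));       // -1
    }
}
```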
>
> Comparing this with Name.hashCode:
>
> public int hashCode() {
>     if (hashCode == -1) {
>         hashCode = name.toLowerCase(Locale.ROOT).hashCode();
>     }
>     return hashCode;
> }
>
>
> ...is it possible that, for some pair of keys, compareToCI would return 0
> but their hash codes would differ? For example, could the upper-cased keys
> be equal while their .toLowerCase(Locale.ROOT) forms are not? Maybe not
> for LATIN1 keys, but what if one uses non-Latin-1 keys
> (StringUTF16.compareToCI is similar)?
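One way to probe the question, sketched under the assumption that Name.equals reduces to String.compareToIgnoreCase (which ends up in compareToCI, as quoted above): look for keys that compare equal ignoring case but whose toLowerCase(Locale.ROOT) forms differ. The non-Latin-1 character U+0131 (LATIN SMALL LETTER DOTLESS I) appears to be such a case: its upper-case form is 'I', so it compares equal to "i", yet it is already lower case, so toLowerCase leaves it unchanged and the two hash codes differ. The class and method names below are hypothetical.

```java
import java.util.Locale;

// Sketch: checks whether two keys that compare equal ignoring case can still
// get different hash codes under the quoted Name.hashCode scheme.
final class HashConsistencyProbe {

    private HashConsistencyProbe() {}

    // Mirrors the quoted Name.hashCode, without the caching.
    static int nameHash(String name) {
        return name.toLowerCase(Locale.ROOT).hashCode();
    }

    // True if a and b are equal ignoring case but hash differently,
    // i.e. the equals/hashCode contract would be broken for this pair.
    static boolean inconsistent(String a, String b) {
        return a.compareToIgnoreCase(b) == 0 && nameHash(a) != nameHash(b);
    }

    public static void main(String[] args) {
        String dotless = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        System.out.println(inconsistent("Created-By", "created-by")); // false
        System.out.println(inconsistent(dotless, "i"));                // true
    }
}
```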
>
> Regards, Peter
>