[9] Review request for 8140503: JavaFX "Pair" Hash Code Collisions
Vadim Pakhnushev
vadim.pakhnushev at oracle.com
Mon Nov 23 20:32:00 UTC 2015
How about "Improve javafx.util.Pair hash code computation" ?
Vadim
On 23.11.2015 22:56, Jim Graham wrote:
> Looks fine to me. Perhaps the synopsis on the bug should be changed
> to "... hash code computation causes more collisions than it should"?
>
> ...jim
>
> On 11/23/15 6:11 AM, Vadim Pakhnushev wrote:
>> Guys,
>> Could you please review the updated fix?
>>
>> https://bugs.openjdk.java.net/browse/JDK-8140503
>> http://cr.openjdk.java.net/~vadim/8140503/webrev.01/
>>
>> Thanks,
>> Vadim
>>
>> On 04.11.2015 1:43, Kevin Rushforth wrote:
>>> Inlining it seems like a fine way to go to me, too. As another point
>>> of reference, we use 7/31 in other places in JavaFX.
>>>
>>> -- Kevin
>>>
>>>
>>> Jim Graham wrote:
>>>> Absolutely, this report uncovered an unrelated NPE bug which is
>>>> helpful, and apparently different constants might affect the
>>>> collision probabilities (which were already very unlikely to begin
>>>> with), but the bug report was definitely filed primarily as a
>>>> misunderstanding.
>>>>
>>>> One thing to note - perhaps small numbers are more likely to be
>>>> encountered so the larger the prime, the less likelihood of getting a
>>>> collision. I wouldn't recommend using too large of a prime since the
>>>> probability is already so low even with 13, and larger primes may
>>>> increase the size of the byte code and compiled code since larger
>>>> constants take more complicated instructions to deal with. Josh Bloch
>>>> recommended a base of 17 and a multiplier of 37 in Chapter 3 of
>>>> Effective Java, but he also admitted that the numbers were arbitrary
>>>> except for the multiplier needing to be an odd prime.
>>>>
>>>> One reason 13 might be too low depends on the probability that Pair
>>>> is used in hash cases with lots of very small numbers. Adding in a
>>>> base prime and using a little bit larger prime multiplier quickly
>>>> turns lots of small numbers into massively distributed hashes - even
>>>> with 31 or 17/37.
>>>>
>>>> I kind of feel like using the existing utility is overkill for just 2
>>>> objects since it requires constructing an array object to use it. I'd
>>>> be just as happy (perhaps even more so) if we just upgraded the code
>>>> to check key for null and use a little larger prime, but keep it
>>>> inlined...
>>>>
>>>> ...jim
>>>>
>>>> On 11/3/15 1:43 PM, Alexander Kouznetsov wrote:
>>>>> Moreover, the following two sentences:
>>>>>
>>>>> "However, this is an incorrect way to compute a hash code of two
>>>>> values."
>>>>> "This can lead to hard-to-find bugs anywhere that instances of
>>>>> Pair are
>>>>> used in a data structure like a HashSet or HashTable."
>>>>>
>>>>> seem to indicate misunderstanding of what hashcode is and how it
>>>>> is to
>>>>> be used.
>>>>>
>>>>> Best regards,
>>>>> Alexander Kouznetsov
>>>>> (408) 276-0387
>>>>>
>>>>> On 3 ноя 2015 13:42, Alexander Kouznetsov wrote:
>>>>>> After the fix, you should expect another incident report of
>>>>>>
>>>>>> Objects.hash(1, 0) == Objects.hash(0, 31)
>>>>>>
>>>>>> always true :-)
>>>>>>
>>>>>> I'd rather file another bug on key == null causing NPE and closing
>>>>>> this one as incomplete or not an issue.
>>>>>>
>>>>>> Best regards,
>>>>>> Alexander Kouznetsov
>>>>>> (408) 276-0387
>>>>>>
>>>>>> On 3 ноя 2015 12:07, Vadim Pakhnushev wrote:
>>>>>>> Hmm, yeah, the actual difference is in the prime number only
>>>>>>> (that is
>>>>>>> changing the algorithm only doesn't improve anything), so the only
>>>>>>> remaining reason to fix this is that Objects.hash guards against
>>>>>>> null
>>>>>>> values (and I forgot to mention it in the review).
>>>>>>> The key in Pair could actually be null and in this case hashCode
>>>>>>> will
>>>>>>> throw NPE.
>>>>>>>
>>>>>>> Vadim
>>>>>>>
>>>>>>> On 03.11.2015 23:01, Vadim Pakhnushev wrote:
>>>>>>>> Well, not exactly... Previously it was 13*hash(a) + hash(b) and
>>>>>>>> now
>>>>>>>> it's 31*(31 + hash(a)) + hash(b).
>>>>>>>> And apparently it improves the quality somehow. I did a test with
>>>>>>>> 100^4 combinations and collision probability dropped by the factor
>>>>>>>> of 3 from 0.065% to 0.022%.
>>>>>>>> Not really impressive, but still, and it uses well-defined utility
>>>>>>>> method.
>>>>>>>> Yeah, I know it's not really a bug since you don't want to rely on
>>>>>>>> the hashCode at all...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vadim
>>>>>>>>
>>>>>>>> On 03.11.2015 22:35, Jim Graham wrote:
>>>>>>>>> All this does is change the prime constant used to produce the
>>>>>>>>> hash
>>>>>>>>> value.
>>>>>>>>>
>>>>>>>>> Objects.hash(a, b) uses 31*hash(a) + hash(b) instead of the
>>>>>>>>> 13*hash(a) + hash(b) that the embedded implementation uses.
>>>>>>>>>
>>>>>>>>> I don't really think this is a bug. The fact that Integer objects
>>>>>>>>> make it easy to reverse engineer and compute collisions of any
>>>>>>>>> reasonable hash combination computation don't mean that the
>>>>>>>>> technique has a bug, it just means that the submitter can read
>>>>>>>>> the
>>>>>>>>> code and think of a counter-example.
>>>>>>>>>
>>>>>>>>> If there are practical problems being caused for some particular
>>>>>>>>> and popular use case by the use of this particular constant "13",
>>>>>>>>> then we need to understand those issues and come up with a more
>>>>>>>>> comprehensive solution than to simply hand off to another
>>>>>>>>> mechanism
>>>>>>>>> which uses the same procedure with a different prime constant...
>>>>>>>>>
>>>>>>>>> ...jim
>>>>>>>>>
>>>>>>>>> On 11/3/15 3:06 AM, Vadim Pakhnushev wrote:
>>>>>>>>>> Hi Chien,
>>>>>>>>>>
>>>>>>>>>> Could you please review the fix:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8140503
>>>>>>>>>> http://cr.openjdk.java.net/~vadim/8140503/webrev.00/
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vadim
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>
More information about the openjfx-dev
mailing list