[9] Review request for 8140503: JavaFX "Pair" Hash Code Collisions

Jim Graham james.graham at oracle.com
Mon Nov 23 19:56:11 UTC 2015


Looks fine to me.  Perhaps the synopsis on the bug should be changed to 
"... hash code computation causes more collisions than it should"?

			...jim

On 11/23/15 6:11 AM, Vadim Pakhnushev wrote:
> Guys,
> Could you please review the updated fix?
>
> https://bugs.openjdk.java.net/browse/JDK-8140503
> http://cr.openjdk.java.net/~vadim/8140503/webrev.01/
>
> Thanks,
> Vadim
>
> On 04.11.2015 1:43, Kevin Rushforth wrote:
>> Inlining it seems like a fine way to go to me, too. As another point
>> of reference, we use 7/31 in other places in JavaFX.
>>
>> -- Kevin
>>
>>
>> Jim Graham wrote:
>>> Absolutely, this report uncovered an unrelated NPE bug which is
>>> helpful, and apparently different constants might affect the
>>> collision probabilities (which were already very unlikely to begin
>>> with), but the bug report was definitely filed primarily as a
>>> misunderstanding.
>>>
>>> One thing to note - perhaps small numbers are more likely to be
>>> encountered so the larger the prime, the less likelihood of getting a
>>> collision. I wouldn't recommend using too large of a prime since the
>>> probability is already so low even with 13, and larger primes may
>>> increase the size of the byte code and compiled code since larger
>>> constants take more complicated instructions to deal with. Josh Bloch
>>> recommended a base of 17 and a multiplier of 37 in Chapter 3 of
>>> Effective Java, but he also admitted that the numbers were arbitrary
>>> except for the multiplier needing to be an odd prime.
>>>
>>> One reason 13 might be too low depends on the probability that Pair
>>> is used in hash cases with lots of very small numbers. Adding in a
>>> base prime and using a little bit larger prime multiplier quickly
>>> turns lots of small numbers into massively distributed hashes - even
>>> with 31 or 17/37.
>>>
>>> I kind of feel like using the existing utility is overkill for just 2
>>> objects since it requires constructing an array object to use it. I'd
>>> be just as happy (perhaps even more so) if we just upgraded the code
>>> to check key for null and use a little larger prime, but keep it
>>> inlined...
>>>
>>> ...jim
>>>
>>> On 11/3/15 1:43 PM, Alexander Kouznetsov wrote:
>>>> Moreover, the following two sentences:
>>>>
>>>> "However, this is an incorrect way to compute a hash code of two
>>>> values."
>>>> "This can lead to hard-to-find bugs anywhere that instances of Pair are
>>>> used in a data structure like a HashSet or HashTable."
>>>>
>>>> seem to indicate misunderstanding of what hashcode is and how it is to
>>>> be used.
>>>>
>>>> Best regards,
>>>> Alexander Kouznetsov
>>>> (408) 276-0387
>>>>
>>>> On 3 ноя 2015 13:42, Alexander Kouznetsov wrote:
>>>>> After the fix, you should expect another incident report of
>>>>>
>>>>> Objects.hash(1, 0) == Objects.hash(0, 31)
>>>>>
>>>>> always true :-)
>>>>>
>>>>> I'd rather file another bug on key == null causing NPE and closing
>>>>> this one as incomplete or not an issue.
>>>>>
>>>>> Best regards,
>>>>> Alexander Kouznetsov
>>>>> (408) 276-0387
>>>>>
>>>>> On 3 ноя 2015 12:07, Vadim Pakhnushev wrote:
>>>>>> Hmm, yeah, the actual difference is in the prime number only (that is
>>>>>> changing the algorithm only doesn't improve anything), so the only
>>>>>> remaining reason to fix this is that Objects.hash guards against null
>>>>>> values (and I forgot to mention it in the review).
>>>>>> The key in Pair could actually be null and in this case hashCode will
>>>>>> throw NPE.
>>>>>>
>>>>>> Vadim
>>>>>>
>>>>>> On 03.11.2015 23:01, Vadim Pakhnushev wrote:
>>>>>>> Well, not exactly... Previously it was 13*hash(a) + hash(b) and now
>>>>>>> it's 31*(31 + hash(a)) + hash(b).
>>>>>>> And apparently it improves the quality somehow. I did a test with
>>>>>>> 100^4 combinations and collision probability dropped by the factor
>>>>>>> of 3 from 0.065% to 0.022%.
>>>>>>> Not really impressive, but still, and it uses well-defined utility
>>>>>>> method.
>>>>>>> Yeah, I know it's not really a bug since you don't want to rely on
>>>>>>> the hashCode at all...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vadim
>>>>>>>
>>>>>>> On 03.11.2015 22:35, Jim Graham wrote:
>>>>>>>> All this does is change the prime constant used to produce the hash
>>>>>>>> value.
>>>>>>>>
>>>>>>>> Objects.hash(a, b) uses 31*hash(a) + hash(b) instead of the
>>>>>>>> 13*hash(a) + hash(b) that the embedded implementation uses.
>>>>>>>>
>>>>>>>> I don't really think this is a bug. The fact that Integer objects
>>>>>>>> make it easy to reverse engineer and compute collisions of any
>>>>>>>> reasonable hash combination computation don't mean that the
>>>>>>>> technique has a bug, it just means that the submitter can read the
>>>>>>>> code and think of a counter-example.
>>>>>>>>
>>>>>>>> If there are practical problems being caused for some particular
>>>>>>>> and popular use case by the use of this particular constant "13",
>>>>>>>> then we need to understand those issues and come up with a more
>>>>>>>> comprehensive solution than to simply hand off to another mechanism
>>>>>>>> which uses the same procedure with a different prime constant...
>>>>>>>>
>>>>>>>> ...jim
>>>>>>>>
>>>>>>>> On 11/3/15 3:06 AM, Vadim Pakhnushev wrote:
>>>>>>>>> Hi Chien,
>>>>>>>>>
>>>>>>>>> Could you please review the fix:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8140503
>>>>>>>>> http://cr.openjdk.java.net/~vadim/8140503/webrev.00/
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vadim
>>>>>>>
>>>>>>
>>>>>
>>>>
>


More information about the openjfx-dev mailing list