[9] Review request for 8140503: JavaFX "Pair" Hash Code Collisions

Vadim Pakhnushev vadim.pakhnushev at oracle.com
Mon Nov 23 14:11:06 UTC 2015


Guys,
Could you please review the updated fix?

https://bugs.openjdk.java.net/browse/JDK-8140503
http://cr.openjdk.java.net/~vadim/8140503/webrev.01/

Thanks,
Vadim

On 04.11.2015 1:43, Kevin Rushforth wrote:
> Inlining it seems like a fine way to go to me, too. As another point 
> of reference, we use 7/31 in other places in JavaFX.
>
> -- Kevin
>
>
> Jim Graham wrote:
>> Absolutely, this report uncovered an unrelated NPE bug which is 
>> helpful, and apparently different constants might affect the 
>> collision probabilities (which were already very unlikely to begin 
>> with), but the bug report was definitely filed primarily as a 
>> misunderstanding.
>>
>> One thing to note - perhaps small numbers are more likely to be 
>> encountered so the larger the prime, the less likelihood of getting a 
>> collision. I wouldn't recommend using too large of a prime since the 
>> probability is already so low even with 13, and larger primes may 
>> increase the size of the byte code and compiled code since larger 
>> constants take more complicated instructions to deal with. Josh Bloch 
>> recommended a base of 17 and a multiplier of 37 in Chapter 3 of 
>> Effective Java, but he also admitted that the numbers were arbitrary 
>> except for the multiplier needing to be an odd prime.
>>
>> One reason 13 might be too low depends on the probability that Pair 
>> is used in hash cases with lots of very small numbers. Adding in a 
>> base prime and using a little bit larger prime multiplier quickly 
>> turns lots of small numbers into massively distributed hashes - even 
>> with 31 or 17/37.
>>
>> I kind of feel like using the existing utility is overkill for just 2 
>> objects since it requires constructing an array object to use it. I'd 
>> be just as happy (perhaps even more so) if we just upgraded the code 
>> to check key for null and use a little larger prime, but keep it 
>> inlined...
>>
>> ...jim
>>
>> On 11/3/15 1:43 PM, Alexander Kouznetsov wrote:
>>> Moreover, the following two sentences:
>>>
>>> "However, this is an incorrect way to compute a hash code of two 
>>> values."
>>> "This can lead to hard-to-find bugs anywhere that instances of Pair are
>>> used in a data structure like a HashSet or HashTable."
>>>
>>> seem to indicate misunderstanding of what hashcode is and how it is to
>>> be used.
>>>
>>> Best regards,
>>> Alexander Kouznetsov
>>> (408) 276-0387
>>>
>>> On 3 ноя 2015 13:42, Alexander Kouznetsov wrote:
>>>> After the fix, you should expect another incident report of
>>>>
>>>> Objects.hash(1, 0) == Objects.hash(0, 31)
>>>>
>>>> always true :-)
>>>>
>>>> I'd rather file another bug on key == null causing NPE and closing
>>>> this one as incomplete or not an issue.
>>>>
>>>> Best regards,
>>>> Alexander Kouznetsov
>>>> (408) 276-0387
>>>>
>>>> On 3 ноя 2015 12:07, Vadim Pakhnushev wrote:
>>>>> Hmm, yeah, the actual difference is in the prime number only (that is
>>>>> changing the algorithm only doesn't improve anything), so the only
>>>>> remaining reason to fix this is that Objects.hash guards against null
>>>>> values (and I forgot to mention it in the review).
>>>>> The key in Pair could actually be null and in this case hashCode will
>>>>> throw NPE.
>>>>>
>>>>> Vadim
>>>>>
>>>>> On 03.11.2015 23:01, Vadim Pakhnushev wrote:
>>>>>> Well, not exactly... Previously it was 13*hash(a) + hash(b) and now
>>>>>> it's 31*(31 + hash(a)) + hash(b).
>>>>>> And apparently it improves the quality somehow. I did a test with
>>>>>> 100^4 combinations and collision probability dropped by the factor
>>>>>> of 3 from 0.065% to 0.022%.
>>>>>> Not really impressive, but still, and it uses well-defined utility
>>>>>> method.
>>>>>> Yeah, I know it's not really a bug since you don't want to rely on
>>>>>> the hashCode at all...
>>>>>>
>>>>>> Thanks,
>>>>>> Vadim
>>>>>>
>>>>>> On 03.11.2015 22:35, Jim Graham wrote:
>>>>>>> All this does is change the prime constant used to produce the hash
>>>>>>> value.
>>>>>>>
>>>>>>> Objects.hash(a, b) uses 31*hash(a) + hash(b) instead of the
>>>>>>> 13*hash(a) + hash(b) that the embedded implementation uses.
>>>>>>>
>>>>>>> I don't really think this is a bug. The fact that Integer objects
>>>>>>> make it easy to reverse engineer and compute collisions of any
>>>>>>> reasonable hash combination computation don't mean that the
>>>>>>> technique has a bug, it just means that the submitter can read the
>>>>>>> code and think of a counter-example.
>>>>>>>
>>>>>>> If there are practical problems being caused for some particular
>>>>>>> and popular use case by the use of this particular constant "13",
>>>>>>> then we need to understand those issues and come up with a more
>>>>>>> comprehensive solution than to simply hand off to another mechanism
>>>>>>> which uses the same procedure with a different prime constant...
>>>>>>>
>>>>>>> ...jim
>>>>>>>
>>>>>>> On 11/3/15 3:06 AM, Vadim Pakhnushev wrote:
>>>>>>>> Hi Chien,
>>>>>>>>
>>>>>>>> Could you please review the fix:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8140503
>>>>>>>> http://cr.openjdk.java.net/~vadim/8140503/webrev.00/
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vadim
>>>>>>
>>>>>
>>>>
>>>



More information about the openjfx-dev mailing list