RFR: 8308803: Improve java/util/UUID/UUIDTest.java [v2]
Aleksey Shipilev
shade at openjdk.org
Wed May 31 14:07:56 UTC 2023
On Wed, 31 May 2023 13:11:08 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
>> This is not a practical concern, IMO. By spec, `UUID.randomUUID` is generated using a cryptographically strong pseudo-random number generator, with >120 bits of randomness, so a collision is extremely unlikely. The collision math involves the birthday paradox, but the Wikipedia article on UUIDs fortunately already gives us approximate solutions: https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
>>
>> Quote: "Thus, the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion."
>>
>> In other words, finding a collision in this test with 1M UUIDs points, with very high probability, to an implementation issue rather than a test bug. Put yet another way: if a unit test with 1M UUIDs is able to find a collision, that is a strong signal that many production systems that assume an extremely low collision probability are in for subtle misbehavior.
>
> My point was that it's probably not practical to test (more than once).
> If it fails, it will be treated just as you propose and disregarded, and in the meantime it consumes test cycles in each of the test contexts. Either provide more information about the conditions under which it failed or remove it.
Sorry, I have trouble following the argument here.
Let me reiterate: the probability of a bona fide collision is so vanishingly low that a test failure here is a strong signal that something is wrong with the implementation. We can put more guidance in the test comments, like "This is extremely unlikely to happen. If you see this failing, it most likely points to an implementation bug rather than a chance collision."
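For scale, here is the standard birthday-bound approximation for the 1M UUIDs in this test. It assumes the 122 random bits of a version-4 UUID are uniformly and independently distributed, which is an idealization rather than something the test itself checks:

$$
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{122}}\right) \approx \frac{n^2}{2^{123}},
\qquad
p(10^6) \approx \frac{10^{12}}{2^{123}} \approx 9.4 \times 10^{-26}
$$

Plugging in $n = 1.03 \times 10^{14}$ (103 trillion) gives $p \approx 10^{-9}$, consistent with the Wikipedia figure quoted above.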
What I expect to happen when that test fails is that it prompts an investigation with multiple stress tests to get a better estimate of the actual collision rate. If we actually see a collision, it is likely caused by a much higher-probability error somewhere in the code. In fact, if this test is _actually noisy_ to the point that it becomes a testing problem, that alone tells us the actual collision rate is many orders of magnitude higher than the math predicts, which is an even _stronger_ signal that random UUIDs are seriously broken for practical use.
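To make the "multiple stress tests" idea concrete, here is a minimal sketch of such a harness. It is a standalone program, not code from `UUIDTest.java`; the class `UUIDCollisionStress` and its methods are hypothetical names. It repeats a 1M-UUID uniqueness check over several trials and prints the observed duplicate count next to the birthday-bound estimate n^2/2^123, which again assumes 122 uniformly random bits:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical stress harness, not part of the JDK test suite.
public class UUIDCollisionStress {

    // Birthday approximation for n version-4 UUIDs: p ~= n^2 / 2^123.
    // Assumes 122 uniformly random bits per UUID.
    static double expectedCollisionProbability(long n) {
        return ((double) n * (double) n) / Math.pow(2, 123);
    }

    // Generates n random UUIDs and counts how many were already seen.
    static int countDuplicates(int n) {
        Set<UUID> seen = new HashSet<>(n * 2);
        int duplicates = 0;
        for (int i = 0; i < n; i++) {
            if (!seen.add(UUID.randomUUID())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Optional arguments: number of trials, UUIDs per trial.
        int trials = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 1_000_000;

        long totalDuplicates = 0;
        for (int t = 0; t < trials; t++) {
            totalDuplicates += countDuplicates(n);
        }

        System.out.printf("trials=%d n=%d duplicates=%d%n", trials, n, totalDuplicates);
        System.out.printf("expected P(any collision) per trial ~= %.3e%n",
                expectedCollisionProbability(n));
    }
}
```

With the single-file source launcher (JDK 11+), this could be run as `java UUIDCollisionStress.java 100 1000000`. Since the expected probability per trial is on the order of 1e-25, any nonzero duplicate count at this scale would be worth a closer look at the generator rather than being written off as bad luck.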
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14134#discussion_r1211776452
More information about the core-libs-dev mailing list