RFR: 8334015: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562 [v16]

Tue Sep 30 13:08:52 UTC 2025

On Mon, 29 Sep 2025 15:08:14 GMT, Kieran Farrell <kfarrell at openjdk.org> wrote:

>> With the recent approval of UUIDv7 (https://datatracker.ietf.org/doc/rfc9562/), this PR aims to add a new static method UUID.timestampUUID() which constructs and returns a UUID in support of the new time generated UUID version. 
>> 
>> The specification requires embedding the current timestamp in milliseconds in bits 0–47. The version number goes in bits 48–51, and bits 52–63 are available for sub-millisecond precision or for pseudorandom data. The variant is set in bits 64–65. The remaining bits 66–127 are free to use for more pseudorandom data or to employ a counter-based approach for increased time precision (https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-7).
>> 
>> The choice of implementation comes down to balancing the ability to distinguish UUIDs created less than 1 ms apart against performance. A test simulating a high-concurrency environment, with 4 threads generating 10000 UUIDv7 values in parallel, measured the collision rate of each implementation (the number of times the time-based portion of the UUID was not unique and entries could not be distinguished by time) and yielded the following results:
>> 
>> 
>> - random-byte-only - 99.8%
>> - higher-precision - 3.5%
>> - counter-based - 0%
>> 
>> 
>> Performance tests show the expected decrease in performance for the counter-based implementation, due to the introduction of synchronization:
>> 
>> - random-byte-only   143.487 ± 10.932 ns/op
>> - higher-precision   149.651 ±  8.438 ns/op
>> - counter-based      245.036 ±  2.943 ns/op
>> 
>> The best balance here might be to employ a higher-precision implementation as the large increase in time sensitivity comes at a very slight performance cost.
>
> Kieran Farrell has updated the pull request incrementally with one additional commit since the last revision:
> 
>   missing semicolon

Adding support for UUID v7 also includes **sorting correctly**, IMO.

This has always been incorrect in the JDK as I see it, but back in the days of UUIDv1 to v4 nobody really cared that much how a UUID would sort. Enter UUID v7 and sorting is now important to get right.

So what is the problem? The existing `UUID.compareTo()` method compares the two longs (nothing wrong with that), but it compares them as SIGNED values, whereas what is needed is an UNSIGNED comparison.
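To illustrate the difference, here is a minimal sketch using two hand-picked values that differ in the top bit of the most significant long. The class and helper names (`SignedVsUnsignedDemo`, `signedOrder`, `unsignedOrder`) are made up for this example, and the result of the first check assumes a JDK where `compareTo()` uses the signed comparison described above.

```java
import java.util.UUID;

public class SignedVsUnsignedDemo {

    // Order as reported by the existing UUID.compareTo().
    static int signedOrder(UUID a, UUID b) {
        return a.compareTo(b);
    }

    // Order when both halves are treated as unsigned 64-bit values.
    static int unsignedOrder(UUID a, UUID b) {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return cmp != 0
                ? cmp
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID low  = new UUID(0x7fffffffffffffffL, 0L); // 7fffffff-ffff-ffff-0000-000000000000
        UUID high = new UUID(0x8000000000000000L, 0L); // 80000000-0000-0000-0000-000000000000

        // As a signed long, 0x8000... is negative, so compareTo() puts "high" first.
        System.out.println(signedOrder(low, high) > 0);                    // true
        // The unsigned comparison puts "low" first ...
        System.out.println(unsignedOrder(low, high) < 0);                  // true
        // ... which matches the order of the textual representation.
        System.out.println(low.toString().compareTo(high.toString()) < 0); // true
    }
}
```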

The problem was recognized years ago in [JDK-7025832](https://bugs.openjdk.org/browse/JDK-7025832), but the proposed change was rejected due to concerns over backward compatibility.

The problem - when UUID v7 is introduced - is that it becomes apparent that the JDK does not sort UUIDs in the same way as databases do, or indeed any other language. Previously, this was less of a concern because there was less of a reason to sort UUIDs.

To be specific, what you expect - and what both the old RFC-4122 spec and the newer RFC-9562 state in their own words - is that UUIDs should be lexicographically sorted, i.e. as if by comparing two arrays of bytes (len=16) where each byte is a value 0-255 (as opposed to a value -128 to 127). An implementation could be:

public int compareToLexi(UUID val) {
    // Compare the high 64 bits as unsigned; fall back to the low 64 bits on a tie.
    int cmp = Long.compareUnsigned(this.mostSigBits, val.mostSigBits);
    return cmp != 0 ? cmp : Long.compareUnsigned(this.leastSigBits, val.leastSigBits);
}

This would be exactly equivalent to a method which compares byte arrays as described above.
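That equivalence can be spot-checked with a small sketch like the one below. It assumes the 16 bytes are laid out big-endian (most significant long first), and it uses a standalone static helper in place of the instance method, since the `mostSigBits` and `leastSigBits` fields are private. `LexiCompareCheck`, `compareLexi`, `toBytes` and `compareBytes` are names invented for this example; `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;

public class LexiCompareCheck {

    // Same logic as compareToLexi(), written against the public accessors.
    static int compareLexi(UUID a, UUID b) {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return cmp != 0
                ? cmp
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    // 16-byte big-endian form: most significant long first (ByteBuffer's default order).
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Lexicographic comparison treating each byte as a value 0-255.
    static int compareBytes(UUID a, UUID b) {
        return Arrays.compareUnsigned(toBytes(a), toBytes(b));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so the run is repeatable
        for (int i = 0; i < 100_000; i++) {
            // Arbitrary bit patterns, not necessarily valid UUID versions.
            UUID a = new UUID(rnd.nextLong(), rnd.nextLong());
            UUID b = new UUID(rnd.nextLong(), rnd.nextLong());
            // Only the sign of the result is meaningful, so compare signs.
            if (Integer.signum(compareLexi(a, b)) != Integer.signum(compareBytes(a, b))) {
                throw new AssertionError(a + " vs " + b);
            }
        }
        System.out.println("compareLexi agrees with unsigned byte-array comparison");
    }
}
```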

I am not suggesting changing the existing `compareTo()` logic. But at the very least this legacy problem should be highlighted somewhere in the Javadoc. Addressing this, at least with a comment, would be part of a proper UUIDv7 implementation.

My 2c.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25303#issuecomment-3352041251