UUID.compareTo broken?

Fri Apr 11 06:54:45 UTC 2014

On 04/10/2014 08:21 PM, Steven Schlansker wrote:
> On Apr 9, 2014, at 2:21 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>
>> On Apr 8, 2014, at 9:15 PM, Mike Duigou <mike.duigou at oracle.com> wrote:
>>
>> That seems a terribly broken usage of UUID for 128 bit numbers or a pair of signed 64 bit numbers :-)
>>
>> Part of me thinks we should not be supporting such broken usage. Might be worth getting some usage data from grepcode or maven central.
>>
> I’m guilty of doing this at a point in the past.  We used it to intermix multiple sources of data - a few that used ‘long’ IDs and a few that used ‘real’ UUIDs.  We took prefixes that had flags never generated by UUID libraries (the “reserved for compatibility with Microsoft” ones IIRC) and slapped those with a long to make pseudo-UUIDs.  That way everything was a UUID and we were guaranteed to never see collisions within our dataset.
>
> We never expected them to give sensible meanings to the various getters e.g. timestamp() but we did expect the UUID class to work generally.
> We never relied on any particular ordering.
>
>>>>> We could provide static methods to return appropriate comparators for version 1 and version 2 UUIDs--I've actually written them before for other projects.
>>>>>
>>>> It would be nice to just have one compareTo that does the right thing based of the UUID types being compared.
>>> If it were up to me only the time and DCE UUIDs would be comparable, there's no ordering relationship for other versions.
>> I think it is fine for random UUIDs to be comparable with each other.
>>
>>
>>> The comparators I've considered adding would only allow comparisons within the same version.
>> Yes, although for a general comparator the primary sort key could be the version value.
> +1 to a “sort first by version then version-specific ordering” — it gives you the best of both worlds IMO.
>
> I think the natural ordering for UUIDs must be able to create an ordering over all possible UUIDs, no matter the version or even if it is valid or not.  If you read UUIDs from an external source you have no way to understand what version they are or aren’t.  Imagine a process that loads data from a database or text file into a TreeMap - it would be awful if a change in the UUID generation scheme on the far side caused the Comparator you used to no longer function.
>

Hi,

Code that relies on UUIDs to have a "natural" order, say 
"chronological", is relying on being given the particular type of UUIDs 
that have the time built-in. When given mixed-type or non-time-based 
UUIDs, such code will break. The purpose of UUID schemes is generating 
globally unique identifiers, not interpreting them. Various types exist 
just because it's practical to generate UUIDs differently in different 
contexts. Programs should not try to extract business information from 
UUIDs (except probably for hunting down virus authors: 
http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a 
good general-purpose compareTo() method should try to discourage such 
usage for example by reversing the bits of the UUID value before 
comparing. This would also make algorithms that order UUIDs for 
facilitating quick access (B-Tree, Red-Black-Tree, ...) happier.

Try the following benchmark (with options: -wi 5 -i 5 -t 1 -f 1):

public class TreeMapInsertionBench {

     private static final int N = 100000;

     @GenerateMicroBenchmark()
     @BenchmarkMode(Mode.AverageTime)
     public static void testTreeMapSequentialInsert(BlackHole bh) {
         Map<Integer, Boolean> map = new TreeMap<>();
         for (int i = 0; i < N; i++) {
             map.put(new Integer(i), Boolean.TRUE);
         }
         bh.consume(map);
     }

     @GenerateMicroBenchmark()
     @BenchmarkMode(Mode.AverageTime)
     public static void testTreeMapReversedBitsInsert(BlackHole bh) {
         Map<Integer, Boolean> map = new TreeMap<>();
         for (int i = 0; i < N; i++) {
             map.put(new Integer(Integer.reverse(i)), Boolean.TRUE);
         }
         bh.consume(map);
     }
}

...and the results will be:

Benchmark Mode   Samples         Mean   Mean error    Units
o.s.TreeMapInsertionBench.testTreeMapReversedBitsInsert avgt         
5        8.208        0.100    ms/op
o.s.TreeMapInsertionBench.testTreeMapSequentialInsert avgt         
5       10.498        1.164    ms/op

That's just an opinion.

Regards, Peter