Improving performance and reducing object allocations of java.util.UUID to/from string

Sun Jan 20 00:54:03 UTC 2013

Thank you for the feedback!

On Jan 10, 2013, at 4:50 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:

> On 01/09/2013 09:51 PM, Steven Schlansker wrote:
>> Hello again,
>> 
>> I sent this email a week ago and have received no replies.  Is there
>> any step I have missed necessary to contribute to the JDK libraries?
> 
> I think the crucial part is OCA, as per:
> http://openjdk.java.net/contribute/

I have now taken care of this :-)
The OCA is under the company name "Ness Computing" and was emailed yesterday.

> 
>> I am very interested in making your lives easier, so please let me
>> know if I am in the wrong place or are otherwise misguided.
> 
> You are at the correct place.
> 
> On the first glance, the change looks good for the start. A few comments
> though:
> a) Do you need the masks before or-ing with most/leastSigBits?

So, they seem to exist to satisfy a few Apache Harmony test cases:
http://svn.apache.org/viewvc/harmony/enhanced/java/branches/java6/classlib/modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/util/UUIDTest.java?revision=929252

For example,
uuid = UUID.fromString("123456789-0-0-0-0");
Assert.assertEquals(0x2345678900000000L, uuid.getMostSignificantBits());
Assert.assertEquals(0x0L, uuid.getLeastSignificantBits());

Without the masking, the values can smear together.  It would be possible to avoid converting extra characters, but I believe that the mask is probably cheaper than an if to check for the presence of extra characters.

Were I writing the code from scratch I would throw an IllegalArgumentException on such a UUID but I assume that is not acceptable for compatibility reasons.

> b) Is there a more standard (and still performant) way to do the hex
> conversion? Look around JDK source, I think there should be something
> else needing the same kind of conversion.

I didn't see any candidates.  I thought of at least the following places to look:  Long, String, Formatter, BitSet, and did some general browsing around java.lang and java.util with no luck.  The Long methods are in java.lang and so could not be used without making them public.  I assume that having private helper methods for my class is preferable to introducing new public methods.  It's entirely possible I missed something, but I'm not seeing it.

> c) I'd go for making utility methods a bit more generic. For one, I
> would rather make decodeHex(String str, int start, int end), and
> encodeHex(char[] dest, int offset, int value).

I can do this if you feel strongly, but I believe that this compromises the readability of the code.  For example, right now you can see:
        long mostSigBits = decode(str, dashPos, 0);
        mostSigBits <<= 16;
        mostSigBits |= decode(str, dashPos, 1);
…

whereas to make it use a "more general" helper it would have to look like this:
        long mostSigBits = decode(str, dashPos[0]+1, dashPos[1]);
        mostSigBits <<= 16;
        mostSigBits |= decode(str, dashPos[1], dashPos[2]+1);

This is an increase from one "magic value" per invocation to three.

It's not a huge difference either way, but I'm not sure that your suggestion is better?
I suppose I could make it non-private helper method available to others, although I am not sure if this is considered a good practice in the JDK code.  Let me know if you think I should do this.

> 
> Microbenchmark glitches:
> a) % is the integer division, and at the scale of the operations you
> are measuring, it could incur significant costs; the usual practice is
> having power-of-2 size, and then (i % size) -> (i & (size - 1)).
I fixed this, but it made no meaningful change to the numbers.
> b) Not sure if you want to stick with random UUIDs for comparisons.
> While the law of large numbers is on your side, 1000 random UUIDs might
> be not random enough.
I don't believe that this has any bearing on the result.  I've run the trials many, many times and the number rarely changes by even 1% from the values that I posted beforehand.  If you think this is an actual problem, I can come up with an alternate methodology.  I don't think it's a problem.

Thanks again for the review.  Please let me know if you agree or believe I am mistaken in my analysis above.