Optimizing UUID#fromString(String)

Sat Jan 27 20:24:27 UTC 2018

Hi Jon,

This looks promising, based on the cited performance improvements.

Can you review a similar thread and its improvements from 2013 to see
whether there are secondary considerations that have already been raised?
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-January/013494.html

Please include the patch in the body of the email or attach it to the
email (with a .txt extension) to meet OpenJDK IP requirements.

Compatibility requirements would rule out tightening the accepted
format, so a fallback to a fully compatible parse would be needed.
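
As a rough sketch of the shape that could take (this is not Jon's patch
and not proposed JDK code; the class and method names are made up for
illustration): use the fixed-position path only for 36-character input
with dashes at the expected offsets, and hand everything else to the
existing parser. Here UUID.fromString stands in for that existing parser.

```java
import java.util.UUID;

// Illustrative sketch only; all names here are hypothetical.
final class UuidParseSketch {

    // Fast path for canonical 8-4-4-4-12 strings, with a fallback to the
    // existing (looser) parser for anything else.
    static UUID parse(String s) {
        if (s.length() == 36) {
            UUID fast = parseCanonical(s);
            if (fast != null) {
                return fast;
            }
        }
        // Assumption: the current JDK parser is the compatibility fallback.
        return UUID.fromString(s);
    }

    // Returns null if the string is not in canonical form, so that the
    // caller can fall back instead of rejecting the input.
    private static UUID parseCanonical(String s) {
        long msb = 0;
        long lsb = 0;
        int digits = 0;
        for (int i = 0; i < 36; i++) {
            char c = s.charAt(i);
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                if (c != '-') {
                    return null;
                }
                continue;
            }
            int v = hexValue(c);
            if (v < 0) {
                return null;
            }
            // The first 16 hex digits fill the most significant long,
            // the remaining 16 fill the least significant long.
            if (digits < 16) {
                msb = (msb << 4) | v;
            } else {
                lsb = (lsb << 4) | v;
            }
            digits++;
        }
        return new UUID(msb, lsb);
    }

    // Value of a single hex digit, or -1 if the character is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

With something along these lines, a short form such as "1-2-3-4-5" or a
36-character string with a misplaced dash still goes through the
existing parser, so the set of accepted inputs should stay the same.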

Thanks, Roger

On 1/27/2018 11:05 AM, Jon Chambers wrote:
> Hello!
>
> I've recently had reason to take a look at performance around parsing and
> stringifying UUIDs. In exploring the space, I identified some opportunities
> to optimize the implementation of UUID#fromString as it currently exists (
> http://hg.openjdk.java.net/jdk/jdk/file/fd237da7a113/src/java.base/share/classes/java/util/UUID.java#l196
> ).
>
> Because UUID strings are of a known structure and length (32 hexadecimal
> digits and four dashes) and because UUIDs are exactly 128 bits in length,
> we know exactly how each character in a UUID string maps to bits in the
> parsed UUID. We always know, for example, that the first character in a
> UUID string maps to the four highest bits in the UUID, the second character
> maps to the four bits below that, and so on.
>
> With that knowledge, we can cut out a lot of the generality and
> bounds-checking we'd normally expect of a string-to-number parser. I've
> built an implementation with that in mind:
> https://github.com/jchambers/fast-uuid/blob/master/src/main/java/com/eatthepath/uuid/FastUUID.java#L108.
> In benchmarks (
> https://github.com/jchambers/fast-uuid/blob/master/benchmark/src/main/java/com/eatthepath/UUIDBenchmark.java#L55-L63),
> this implementation is about six times faster than the current JDK
> implementation (9.0.4+11) and 14 times faster than the implementation in
> 1.8.
>
> The experimental implementation is more strict about UUID format (the
> current JDK implementation allows for variable-length blocks of hex digits
> between dashes while the experimental one doesn't), and I'll defer to you
> folks as to whether its handling of technically-malformed UUID strings is
> acceptable. As discussed via Twitter (
> https://twitter.com/cl4es/status/956308599277486080), we might consider
> using the fixed-length parsing approach if we know the UUID string is
> exactly 36 characters long and fall back to the looser parser otherwise. I
> also recognize that this is partially reinventing the wheel when it comes
> to parsing hex strings, and the tradeoff between consistency and
> performance is certainly worthy of consideration.
>
> Regardless, I wanted to call this optimization opportunity to your
> attention, and would be happy to offer a proper patch if this seems like a
> worthwhile change.
>
> Cheers, and thank you for your consideration!
>
> -Jon