Optimizing UUID#fromString(String)
Jon Chambers
jon.chambers at gmail.com
Sat Jan 27 16:05:09 UTC 2018
Hello!
I've recently had reason to take a look at performance around parsing and
stringifying UUIDs. In exploring the space, I identified some opportunities
to optimize the implementation of UUID#fromString as it currently exists (
http://hg.openjdk.java.net/jdk/jdk/file/fd237da7a113/src/java.base/share/classes/java/util/UUID.java#l196
).
Because UUID strings are of a known structure and length (32 hexadecimal
digits and four dashes) and because UUIDs are exactly 128 bits in length,
we know exactly how each character in a UUID string maps to bits in the
parsed UUID. We always know, for example, that the first character in a
UUID string maps to the four highest bits in the UUID, the second character
maps to the four bits below that, and so on.
With that knowledge, we can cut out a lot of the generality and
bounds-checking we'd normally expect of a string-to-number parser. I've
built an implementation with that in mind:
https://github.com/jchambers/fast-uuid/blob/master/src/main/java/com/eatthepath/uuid/FastUUID.java#L108.
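To make the fixed mapping concrete, here is a minimal sketch of the idea. It is not the FastUUID code linked above; the class name FixedLayoutUuidParser and its helper are hypothetical, and the sketch favors readability over speed. It assumes the canonical 8-4-4-4-12 layout, so the dashes must sit at indices 8, 13, 18 and 23, and every other character contributes exactly four bits.

```java
import java.util.UUID;

// Hypothetical illustration only; not the FastUUID or JDK implementation.
final class FixedLayoutUuidParser {

    private FixedLayoutUuidParser() {}

    /** Parses a strictly formatted 36-character UUID string (8-4-4-4-12). */
    static UUID parse(String s) {
        if (s.length() != 36
                || s.charAt(8) != '-' || s.charAt(13) != '-'
                || s.charAt(18) != '-' || s.charAt(23) != '-') {
            throw new IllegalArgumentException("Not a 36-character UUID string: " + s);
        }

        // Characters 0-17 (minus the dashes at 8 and 13) hold the 16 hex
        // digits of the most significant 64 bits, highest nibble first.
        long msb = 0;
        for (int i = 0; i < 18; i++) {
            if (i != 8 && i != 13) {
                msb = (msb << 4) | hexValue(s, i);
            }
        }

        // Characters 19-35 (minus the dash at 23) hold the 16 hex digits
        // of the least significant 64 bits.
        long lsb = 0;
        for (int i = 19; i < 36; i++) {
            if (i != 23) {
                lsb = (lsb << 4) | hexValue(s, i);
            }
        }

        return new UUID(msb, lsb);
    }

    /** Returns the 4-bit value of the ASCII hex digit at the given index. */
    private static long hexValue(String s, int index) {
        char c = s.charAt(index);
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException(
                "Invalid hex digit at index " + index + ": " + s);
    }
}
```

Because the length and dash positions are checked once up front, the loops need no per-character bounds or overflow checks. A real implementation would likely unroll the loops and use a lookup table for the digit values.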
In benchmarks (
https://github.com/jchambers/fast-uuid/blob/master/benchmark/src/main/java/com/eatthepath/UUIDBenchmark.java#L55-L63),
this implementation is about six times faster than the current JDK
implementation (9.0.4+11) and about 14 times faster than the implementation
in JDK 1.8.
The experimental implementation is stricter about UUID format (the
current JDK implementation allows variable-length blocks of hex digits
between dashes while the experimental one doesn't), and I'll defer to you
folks as to whether its handling of technically malformed UUID strings is
acceptable. As discussed via Twitter (
https://twitter.com/cl4es/status/956308599277486080), we might consider
using the fixed-length parsing approach if we know the UUID string is
exactly 36 characters long and fall back to the looser parser otherwise. I
also recognize that this is partially reinventing the wheel when it comes
to parsing hex strings, and the tradeoff between consistency and
performance is certainly worthy of consideration.
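As a rough sketch of the length-based dispatch described above: the class name LengthDispatchingUuidParser is hypothetical, and the strict fixed-length parser is passed in as a function rather than spelled out, so this shows only the dispatch itself. The lenient path simply delegates to the existing UUID.fromString, which accepts short forms such as "1-2-3-4-5".

```java
import java.util.UUID;
import java.util.function.Function;

// Hypothetical illustration of "fast path for 36 characters, else fall back".
final class LengthDispatchingUuidParser {

    // Assumed to be a strict parser that only handles the 8-4-4-4-12 layout.
    private final Function<String, UUID> strictParser;

    LengthDispatchingUuidParser(Function<String, UUID> strictParser) {
        this.strictParser = strictParser;
    }

    UUID parse(String s) {
        if (s.length() == 36) {
            // Canonical length: use the fixed-position parser.
            return strictParser.apply(s);
        }
        // Any other length: keep the existing, looser behavior.
        return UUID.fromString(s);
    }
}
```

With this shape, "1-2-3-4-5" still parses (to 00000001-0002-0003-0004-000000000005) through the fallback. Whether a 36-character string that fails strict parsing should be rejected outright or also retried on the lenient path is a separate decision that this sketch leaves to the strict parser.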
Regardless, I wanted to call this optimization opportunity to your
attention, and would be happy to offer a proper patch if this seems like a
worthwhile change.
Cheers, and thank you for your consideration!
-Jon