Optimizing UUID#fromString(String)
Claes Redestad
claes.redestad at oracle.com
Mon Jan 29 11:30:05 UTC 2018
Hi,
I've filed two RFEs:
- https://bugs.openjdk.java.net/browse/JDK-8196331 for providing a
fast-path for ASCII/Latin1 input to Character.digit. Patch out
for review
- https://bugs.openjdk.java.net/browse/JDK-8196334 for tracking
the optimizations specific to UUID proposed here
Jon, if you want to provide a patch I'd be happy to sponsor. I have
a few ideas on how to retrofit existing behavior that we can discuss
offline. Also, if you haven't filed an OCA then this might be a good
time to do so[1]
Thanks!
/Claes
[1] http://openjdk.java.net/contribute/
On 2018-01-29 00:19, Claes Redestad wrote:
> Hi Jon,
>
> On 2018-01-27 17:05, Jon Chambers wrote:
>> Regardless, I wanted to call this optimization opportunity to your
>> attention, and would be happy to offer a proper patch if this seems
>> like a
>> worthwhile change.
>
> this does looks promising - and at least points out there might be
> some performance issues lurking here!
>
> Found another compatibility issue, however: The current
> UUID.fromString allow use of any unicode digits parseable into
> digits. Say, Arabic digits (U+0660 - U+0669):
>
> jshell> UUID.fromString("\u0669-\u0669-\u0669-\u0669-\u0669");
> $2 ==> 00000009-0009-0009-0009-000000000009
>
> That you're getting some improvement from avoiding
> Character.digit(..) stands out in a way, so I wondered if there's
> something going on here... and it does seem that this method could
> be optimized by providing a fast-path in the CharacterDataLatin1
> plane...
>
> I experimented with a few solutions, and the best I could find was
> a 256 element byte[] lookup table for digits in CharacterDataLatin1
> (spanning the entire 8-bit space of CharacterDataLatin1 allows us
> to avoid a branch):
>
> http://cr.openjdk.java.net/~redestad/scratch/char.digit.00/
>
> This speeds up your micro testing java.util.UUID
> (UUIDBenchmark.benchmarkUUIDFromString) by ~1.6x. The
> improvement is there on all counters when looking at -prof perfnorm
> (less loads, stores, branches, instructions, better CPI etc)[1], so seems
> like a pretty straightforward thing to consider independently.
>
> Your FastUUID implementation is still quite a bit faster, though, but
> it'd be good to dig around for a bit to see if there are other more
> general improvements like this to be had..
>
> Thanks!
>
> /Claes
>
> [1]
> Baseline:
> Benchmark Mode
> Cnt Score Error Units
> UUIDBenchmark.benchmarkUUIDFromString avgt
> 4 0.505 ± 0.066 us/op
> UUIDBenchmark.benchmarkUUIDFromString:CPI avgt 1.024 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-load-misses
> avgt 3.000 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-loads avgt
> 211.128 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-stores avgt
> 34.894 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-icache-load-misses
> avgt 0.067 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-load-misses avgt
> 0.103 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-loads avgt 0.812 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-store-misses avgt
> 0.004 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-stores avgt 0.054
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:branch-misses avgt
> 13.269 #/op
> UUIDBenchmark.benchmarkUUIDFromString:branches avgt 301.931
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:cycles avgt 1585.057 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-load-misses avgt
> 0.005 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-loads avgt
> 208.788 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-store-misses avgt ≈
> 10⁻⁴ #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-stores avgt
> 30.540 #/op
> UUIDBenchmark.benchmarkUUIDFromString:iTLB-load-misses avgt
> 0.002 #/op
> UUIDBenchmark.benchmarkUUIDFromString:iTLB-loads avgt 0.006
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:instructions avgt
> 1547.568 #/op
>
> Results with
> http://cr.openjdk.java.net/~redestad/scratch/char.digit.00/ -- :
> Benchmark Mode
> Cnt Score Error Units
> UUIDBenchmark.benchmarkUUIDFromString avgt
> 4 0.310 ± 0.022 us/op
> UUIDBenchmark.benchmarkUUIDFromString:CPI avgt 0.832 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-load-misses
> avgt 2.026 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-loads avgt
> 208.227 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-stores avgt
> 30.673 #/op
> UUIDBenchmark.benchmarkUUIDFromString:L1-icache-load-misses
> avgt 0.052 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-load-misses avgt
> 0.042 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-loads avgt 1.027 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-store-misses avgt
> 0.007 #/op
> UUIDBenchmark.benchmarkUUIDFromString:LLC-stores avgt 0.044
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:branch-misses avgt
> 0.040 #/op
> UUIDBenchmark.benchmarkUUIDFromString:branches avgt 255.547
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:cycles avgt 981.466 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-load-misses avgt
> 0.003 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-loads avgt
> 213.797 #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-store-misses avgt ≈
> 10⁻⁴ #/op
> UUIDBenchmark.benchmarkUUIDFromString:dTLB-stores avgt
> 30.888 #/op
> UUIDBenchmark.benchmarkUUIDFromString:iTLB-load-misses avgt
> 0.002 #/op
> UUIDBenchmark.benchmarkUUIDFromString:iTLB-loads avgt 0.002
> #/op
> UUIDBenchmark.benchmarkUUIDFromString:instructions avgt
> 1180.129 #/op
More information about the core-libs-dev
mailing list