RFR: 8308804: Improve UUID.randomUUID performance with bulk/scalable PRNG access
Aleksey Shipilev
shade at openjdk.org
Thu May 25 11:40:28 UTC 2023
UUID is a very important class that is used to track identities of objects in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% of total CPU time, and is frequently a scalability bottleneck due to `SecureRandom` synchronization.
The major issue with the UUID code itself is that it reads from a single `SecureRandom` instance 16 bytes at a time, so the heavily contended `SecureRandom` is hammered with very small requests. This also has a chilling effect on other users of `SecureRandom` when there is heavy UUID generation traffic.
We can improve this by doing bulk reads from the backing `SecureRandom` and possibly striping the reads across many instances of it.
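To make the idea concrete, here is a minimal sketch of bulk reads plus striping. It is not the code from this PR: the class `BulkUuidSource`, the `Stripe` helper, the stripe-selection rule, and both constructor parameters are hypothetical names chosen for illustration. Each stripe owns its own `SecureRandom` and refills a private buffer in one large `nextBytes` call, then carves 16-byte slices out of it.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical illustration only; names and sizes are assumptions, not the PR's code.
final class BulkUuidSource {
    private static final int UUID_BYTES = 16;

    // One stripe: a private SecureRandom plus a buffer that is refilled in bulk.
    private static final class Stripe {
        private final SecureRandom random = new SecureRandom();
        private final byte[] buf;
        private int pos;

        Stripe(int uuidsPerRefill) {
            buf = new byte[uuidsPerRefill * UUID_BYTES];
            pos = buf.length; // start "empty" so the first call triggers a refill
        }

        synchronized UUID next() {
            if (pos == buf.length) {
                random.nextBytes(buf); // one large request instead of many 16-byte ones
                pos = 0;
            }
            long msb = 0;
            long lsb = 0;
            for (int i = 0; i < 8; i++) {
                msb = (msb << 8) | (buf[pos + i] & 0xff);
            }
            for (int i = 8; i < 16; i++) {
                lsb = (lsb << 8) | (buf[pos + i] & 0xff);
            }
            pos += UUID_BYTES;
            msb = (msb & ~0xF000L) | 0x4000L;                        // set version 4
            lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // set IETF variant
            return new UUID(msb, lsb);
        }
    }

    private final Stripe[] stripes;

    BulkUuidSource(int stripeCount, int uuidsPerRefill) {
        stripes = new Stripe[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Stripe(uuidsPerRefill);
        }
    }

    UUID next() {
        // Spread threads across stripes by thread id; contention is per stripe only.
        int idx = (int) (Thread.currentThread().getId() % stripes.length);
        return stripes[idx].next();
    }
}
```

With this shape, threads that land on different stripes never contend with each other, and each stripe touches its `SecureRandom` only once per refill rather than once per UUID.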
Benchmark                Mode  Cnt   Score   Error  Units
### AArch64 (m6g.4xlarge, Graviton, 16 cores)
# Before
UUIDRandomBench.single  thrpt   15   3.545 ± 0.058  ops/us
UUIDRandomBench.max     thrpt   15   1.832 ± 0.059  ops/us ; negative scaling
# After
UUIDRandomBench.single  thrpt   15   4.421 ± 0.047  ops/us
UUIDRandomBench.max     thrpt   15   6.658 ± 0.092  ops/us ; positive scaling, ~1.5x
### x86_64 (c6.8xlarge, Xeon, 18 cores)
# Before
UUIDRandomBench.single  thrpt   15   2.710 ± 0.038  ops/us
UUIDRandomBench.max     thrpt   15   1.880 ± 0.029  ops/us ; negative scaling
# After
UUIDRandomBench.single  thrpt   15   3.099 ± 0.022  ops/us
UUIDRandomBench.max     thrpt   15   3.555 ± 0.062  ops/us ; positive scaling, ~1.2x
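For readers who want a rough feel for the single-thread versus all-threads comparison without setting up JMH, the following is a crude stand-alone sketch. It is not the `UUIDRandomBench` benchmark from this PR, it has none of JMH's warmup or statistical rigor, and the class and method names are hypothetical. It simply counts `UUID.randomUUID()` calls completed by a given number of threads in a fixed time window.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical rough harness; use JMH for numbers you intend to publish.
final class UuidThroughput {
    // Returns aggregate throughput in operations per microsecond.
    static double opsPerMicro(int threads, long millis) throws InterruptedException {
        LongAdder ops = new LongAdder();
        LongAdder sink = new LongAdder(); // consumes results so the calls are not dead code
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(millis);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long local = 0;
                long acc = 0;
                while (System.nanoTime() < deadline) {
                    acc += UUID.randomUUID().getLeastSignificantBits();
                    local++;
                }
                ops.add(local);
                sink.add(acc);
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 10_000, TimeUnit.MILLISECONDS);
        long elapsedNanos = System.nanoTime() - start;
        return ops.sum() / (elapsedNanos / 1_000.0);
    }
}
```

Calling `opsPerMicro(1, 2000)` and then `opsPerMicro(Runtime.getRuntime().availableProcessors(), 2000)` gives a quick indication of whether throughput scales up or down with thread count on a given machine.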
Note that there is still a scalability bottleneck in the current default random (`NativePRNG`), because it synchronizes over a singleton instance. This PR adds a system property to select the implementation, and there we can clearly see the benefit:
Benchmark                Mode  Cnt   Score   Error  Units
### x86_64 (c6.8xlarge, Xeon, 18 cores)
# Before, hacked `new SecureRandom()` to `SecureRandom.getInstance("SHA1PRNG")`
UUIDRandomBench.single  thrpt   15   3.661 ± 0.008  ops/us
UUIDRandomBench.max     thrpt   15   2.400 ± 0.031  ops/us ; faster than NativePRNG, but still negative scalability
# After, -Djava.util.UUID.prngName=SHA1PRNG
UUIDRandomBench.single  thrpt   15   3.522 ± 0.009  ops/us
UUIDRandomBench.max     thrpt   15  50.506 ± 1.734  ops/us ; positive scaling, ~14x
Other scalable random number providers would improve in a similar way. Note that just changing to `SHA1PRNG` right now would not help much, because it would still be very contended. This PR does not change the default PRNG provider, as that would need a larger discussion. It only provides the means to select another one.
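As an illustration of what selecting a provider by name can look like, here is a small sketch. Only the property name `java.util.UUID.prngName` comes from this PR description; the class `PrngSelector`, its methods, and the fallback behavior on an unknown algorithm are assumptions, and the actual patch may handle these cases differently.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper; the lookup and fallback logic are assumptions.
final class PrngSelector {
    // Property name as given in the PR description.
    static final String PROPERTY = "java.util.UUID.prngName";

    // Reads the property and delegates to forName.
    static SecureRandom fromSystemProperty() {
        return forName(System.getProperty(PROPERTY));
    }

    // Returns the named PRNG, or the platform default when the name is absent or unknown.
    static SecureRandom forName(String name) {
        if (name == null || name.isEmpty()) {
            return new SecureRandom();
        }
        try {
            return SecureRandom.getInstance(name);
        } catch (NoSuchAlgorithmException e) {
            return new SecureRandom(); // assumed fallback: platform default
        }
    }
}
```

Running with `-Djava.util.UUID.prngName=SHA1PRNG` would then make `fromSystemProperty()` return a `SHA1PRNG` instance, while an unset property keeps the default.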
Since the buffers are allocated on demand and then retained, there are allocation rate improvements: generating a UUID now takes 80 bytes per op instead of 120 bytes per op. The buffer cache also takes memory. Back-of-the-envelope: for a large 192-core machine that generates UUIDs in all threads, the default settings add up to 768K of additional memory.
Additional testing:
- [x] Updated tests from #14134
- [ ] Linux AArch64 fastdebug `tier1 tier2 tier3`
The new options are not strictly speaking required for this work to be useful, but it would be convenient to have them around for field tuning and diagnostics.
-------------
Commit messages:
- More touchups
- Comment updates
- Runtime options and touchups
- Add benchmark
- Initial work
Changes: https://git.openjdk.org/jdk/pull/14135/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14135&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308804
Stats: 237 lines in 2 files changed: 221 ins; 9 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/14135.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14135/head:pull/14135
PR: https://git.openjdk.org/jdk/pull/14135