RFR: 8308804: Improve UUID.randomUUID performance with bulk/scalable PRNG access [v7]
Brett Okken
duke at openjdk.org
Thu Jun 1 13:02:11 UTC 2023
On Thu, 1 Jun 2023 09:37:29 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> UUID is the very important class that is used to track identities of objects in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% of total CPU time, and is frequently a scalability bottleneck due to `SecureRandom` synchronization.
>>
>> The major issue with UUID code itself is that it reads from the single `SecureRandom` instance by 16 bytes. So the heavily contended `SecureRandom` is bashed with very small requests. This also has a chilling effect on other users of `SecureRandom`, when there is a heavy UUID generation traffic.
>>
>> We can improve this by doing the bulk reads from the backing SecureRandom and possibly striping the reads across many instances of it.
>>
>>
>> Benchmark Mode Cnt Score Error Units
>>
>> ### AArch64 (m6g.4xlarge, Graviton, 16 cores)
>>
>> # Before
>> UUIDRandomBench.single thrpt 15 3.545 ± 0.058 ops/us
>> UUIDRandomBench.max thrpt 15 1.832 ± 0.059 ops/us ; negative scaling
>>
>> # After
>> UUIDRandomBench.single thrpt 15 4.823 ± 0.023 ops/us
>> UUIDRandomBench.max thrpt 15 6.561 ± 0.054 ops/us ; positive scaling, ~1.5x
>>
>> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>>
>> # Before
>> UUIDRandomBench.single thrpt 15 2.710 ± 0.038 ops/us
>> UUIDRandomBench.max thrpt 15 1.880 ± 0.029 ops/us ; negative scaling
>>
>> # After
>> Benchmark Mode Cnt Score Error Units
>> UUIDRandomBench.single thrpt 15 3.109 ± 0.026 ops/us
>> UUIDRandomBench.max thrpt 15 3.561 ± 0.071 ops/us ; positive scaling, ~1.2x
>>
>>
>> Note that there is still a scalability bottleneck in current default random (`NativePRNG`), because it synchronizes over a singleton instance for SHA1 mixer, then the engine itself, etc. -- it is quite a whack-a-mole to figure out the synchronization story there. The scalability fix in current default `SecureRandom` would be much more intrusive and risky, since it would change a core crypto class with unknown bug fanout.
>>
>> Using the bulk reads even when the underlying PRNG is heavily synchronized is still a win. A more scalable PRNG would benefit from this as well. This PR adds a system property to select the PRNG implementation, and there we can clearly see the benefit with more scalable PRNG sources:
>>
>>
>> Benchmark Mode Cnt Score Error Units
>>
>> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>>
>> # Before, hacked `new SecureRandom()` to `SecureRandom.getInstance("SHA1PRNG")`
>> UUIDRandomBench.single thrpt ...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>
> - Revert test changes
> - Merge branch 'master' into JDK-8308804-uuid-buffers
> - Simplify clinit
> - Remove the properties
> - Swap lsb/msb
> - Fine-tune exceptions
> - Handle privileged properties
> - Use ByteArray to convert. Do version/variant preparations straight on locals. Move init out of optimistic lock section.
> - More touchups
> - Comment updates
> - ... and 3 more: https://git.openjdk.org/jdk/compare/4460429d...fd7eaa1a
src/java.base/share/classes/java/util/UUID.java line 234:
> 232: long msb = ByteArray.getLong(buf, 0);
> 233: long lsb = ByteArray.getLong(buf, 8);
> 234: return fromRandom(msb, lsb);
is there any value in moving this outside of the critical section?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1213117598
More information about the core-libs-dev
mailing list