RFR: 8308804: Improve UUID.randomUUID performance with bulk/scalable PRNG access
Aleksey Shipilev
shade at openjdk.org
Fri May 26 08:42:57 UTC 2023
On Fri, 26 May 2023 00:13:57 GMT, Andrei Pangin <apangin at openjdk.org> wrote:
>> UUID is the very important class that is used to track identities of objects in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% of total CPU time, and is frequently a scalability bottleneck due to `SecureRandom` synchronization.
>>
>> The major issue with UUID code itself is that it reads from the single `SecureRandom` instance by 16 bytes. So the heavily contended `SecureRandom` is bashed with very small requests. This also has a chilling effect on other users of `SecureRandom`, when there is a heavy UUID generation traffic.
>>
>> We can improve this by doing the bulk reads from the backing SecureRandom and possibly striping the reads across many instances of it.
>>
>>
>> Benchmark Mode Cnt Score Error Units
>>
>> ### AArch64 (m6g.4xlarge, Graviton, 16 cores)
>>
>> # Before
>> UUIDRandomBench.single thrpt 15 3.545 ± 0.058 ops/us
>> UUIDRandomBench.max thrpt 15 1.832 ± 0.059 ops/us ; negative scaling
>>
>> # After
>> UUIDRandomBench.single thrpt 15 4.421 ± 0.047 ops/us
>> UUIDRandomBench.max thrpt 15 6.658 ± 0.092 ops/us ; positive scaling, ~1.5x
>>
>> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>>
>> # Before
>> UUIDRandomBench.single thrpt 15 2.710 ± 0.038 ops/us
>> UUIDRandomBench.max thrpt 15 1.880 ± 0.029 ops/us ; negative scaling
>>
>> # After
>> Benchmark Mode Cnt Score Error Units
>> UUIDRandomBench.single thrpt 15 3.099 ± 0.022 ops/us
>> UUIDRandomBench.max thrpt 15 3.555 ± 0.062 ops/us ; positive scaling, ~1.2x
>>
>>
>> Note that there is still a scalability bottleneck in current default random (`NativePRNG`), because it synchronizes over a singleton instance for SHA1 mixer, then the engine itself, etc. -- it is quite a whack-a-mole to figure out the synchronization story there. The scalability fix in current default `SecureRandom` would be much more intrusive and risky, since it would change a core crypto class with unknown bug fanout.
>>
>> Using the bulk reads even when the underlying PRNG is heavily synchronized is still a win. A more scalable PRNG would benefit from this as well. This PR adds a system property to select the PRNG implementation, and there we can clearly see the benefit with more scalable PRNG sources:
>>
>>
>> Benchmark Mode Cnt Score Error Units
>>
>> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
>>
>> # Before, hacked `new SecureRandom()` to `SecureRandom.getInstance("SHA1PRNG")`
>> UUIDRandomBench.single thrpt ...
>
> src/java.base/share/classes/java/util/UUID.java line 224:
>
>> 222: // Try to pull the UUID from the current buffer at current position.
>> 223: if (stamp != 0) {
>> 224: int p = (int)VH_POS.getAndAdd(this, UUID_CHUNK);
>
> An atomic update together with an optimistic lock looks non-idiomatic use of StampedLock to me. Won't a simple CAS loop be simpler? E.g. in pseudocode:
>
> while ((p = this.pos) + 16) < buf.length) {
> long msb = getLong(buf, p);
> long lsb = getLong(buf, p + 8);
> if (cas(this.pos, p, p + 16)) {
> return new UUID(msb, lsb);
> }
> }
>
> // refill buffer under lock
Nope, I don't think you cannot do this, because there is a race on concurrently-updating `buf`. The StampedLock (RW lock), protects the buffer-reading threads from seeing the `buf` undergoing the initialization by buffer-writing threads. Atomic pos only arbitrates the concurrent buffer-threads. That atomic does not help to resolve the buf races.
> src/java.base/share/classes/java/util/UUID.java line 260:
>
>> 258: buf[c + 8] &= 0x3f; /* clear variant */
>> 259: buf[c + 8] |= (byte) 0x80; /* set to IETF variant */
>> 260: }
>
> I'm not sure I understand the point about initialization. Why not just setting the required version bits right in UUID constructor without updating the array? This will be faster and simpler IMO.
Right, we can do things straight on locals, without ever touching the heap, let's see...
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206430447
PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206433588
More information about the core-libs-dev
mailing list