[crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v5]

Thu Apr 13 15:01:30 UTC 2023

On Wed, 12 Apr 2023 15:02:05 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource.
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add synchronized context

It follows the same principle, but its use is not interchangeable (and cannot be made so) - if you replaced an existing use of `ReadWriteLock` with this one, it wouldn't work.

I already made a benchmark; it includes a noop baseline (`unsync`), executes the `quick` method as fast as it can in 8 threads, and executes the `slow` method with 10/100 ms think time (single thread):

Benchmark                      (impl)  (pause)   Mode  Cnt           Score           Error  Units
SwitchPointBenchmark.g:quick   unsync       10  thrpt    5  2420777624.146 ± 249641306.573  ops/s
SwitchPointBenchmark.g:slow    unsync       10  thrpt    5          99.248 ±         0.264  ops/s
SwitchPointBenchmark.g:quick   unsync      100  thrpt    5  2244724220.494 ± 328435039.061  ops/s
SwitchPointBenchmark.g:slow    unsync      100  thrpt    5           9.992 ±         0.002  ops/s
SwitchPointBenchmark.g:quick   rwlock       10  thrpt    5     4414608.947 ±   1525681.326  ops/s
SwitchPointBenchmark.g:slow    rwlock       10  thrpt    5          99.191 ±         0.160  ops/s
SwitchPointBenchmark.g:quick   rwlock      100  thrpt    5     4541641.249 ±   3166622.432  ops/s
SwitchPointBenchmark.g:slow    rwlock      100  thrpt    5           9.989 ±         0.003  ops/s
SwitchPointBenchmark.g:quick  rculock       10  thrpt    5   196537498.940 ± 305743615.522  ops/s
SwitchPointBenchmark.g:slow   rculock       10  thrpt    5          94.168 ±         2.159  ops/s
SwitchPointBenchmark.g:quick  rculock      100  thrpt    5   772304327.917 ±  28329265.290  ops/s
SwitchPointBenchmark.g:slow   rculock      100  thrpt    5           9.909 ±         0.025  ops/s
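For readers unfamiliar with the `rwlock` baseline, here is a minimal sketch of what such a variant could look like. The class name `RwLockBaseline`, the `resource` field, and the assumption that `slow` takes the write lock are all hypothetical; the actual benchmark source is not shown in this message.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the `rwlock` benchmark variant; not the actual benchmark code.
public class RwLockBaseline {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int resource; // placeholder for the protected resource

    // Hot path: every call pays for readLock().lock() and unlock().
    public int quick() {
        lock.readLock().lock();
        try {
            return resource;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Rare path (assumed to take the write lock): excludes all readers while it runs.
    public void slow(int newValue) {
        lock.writeLock().lock();
        try {
            resource = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RwLockBaseline baseline = new RwLockBaseline();
        baseline.slow(42);
        System.out.println(baseline.quick());
    }
}
```

Even when uncontended by writers, the read lock updates shared state on every `quick` call, which is the cost the RCU lock is meant to avoid on the read side.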

With a 10 ms think time (which is an extremely high write frequency), the results show more than 20x speedup compared to the `ReentrantReadWriteLock.readLock().lock()` + `unlock()` combo, and roughly 12x slowdown vs. noop. With a 100 ms think time it's an order of magnitude better: > 150x speedup vs. rwlock and < 3x slowdown vs. noop.
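The ratios quoted above can be re-derived from the `quick` scores in the table. The sketch below only divides the reported scores; the class and constant names are made up for illustration, and the large error bars (especially for `rculock` at 10 ms) are ignored.

```java
// Recomputes speedup/slowdown ratios from the `quick` scores (ops/s) reported in the table.
public class SpeedupRatios {
    static final double UNSYNC_10 = 2420777624.146;
    static final double UNSYNC_100 = 2244724220.494;
    static final double RWLOCK_10 = 4414608.947;
    static final double RWLOCK_100 = 4541641.249;
    static final double RCULOCK_10 = 196537498.940;
    static final double RCULOCK_100 = 772304327.917;

    // How many times higher the first throughput is than the second.
    static double ratio(double higher, double lower) {
        return higher / lower;
    }

    public static void main(String[] args) {
        System.out.printf("10 ms:  rculock vs rwlock speedup  = %.1fx%n", ratio(RCULOCK_10, RWLOCK_10));
        System.out.printf("10 ms:  rculock vs unsync slowdown = %.1fx%n", ratio(UNSYNC_10, RCULOCK_10));
        System.out.printf("100 ms: rculock vs rwlock speedup  = %.1fx%n", ratio(RCULOCK_100, RWLOCK_100));
        System.out.printf("100 ms: rculock vs unsync slowdown = %.1fx%n", ratio(UNSYNC_100, RCULOCK_100));
    }
}
```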

I've also run the benchmark with no pause time to see the maximum frequency of synchronization, and it shows about 4.5k syncs/s (it would surely be less with more threads and deeper stacks).

Benchmark                      (impl)  (pause)   Mode  Cnt        Score       Error  Units
SwitchPointBenchmark.g:quick  rculock        0  thrpt    5  1417151.441 ± 72183.322  ops/s
SwitchPointBenchmark.g:slow   rculock        0  thrpt    5     4486.629 ±   201.970  ops/s

Note that these results use a single forked VM and just a few short iterations, but they give some idea of the order of magnitude.

-------------

PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1507121658