[crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v7]

Tue Apr 25 13:31:48 UTC 2023

On Tue, 25 Apr 2023 07:34:51 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource.
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Don't use SwitchPoints

Okay, using just a volatile field does not change much. Anton suggested looking at the case of totally uncontended readers; single-threaded benchmark results (including the 'new' implementation that fakes an empty read-lock) look like this:

Benchmark              (blackhole)   (impl)   Mode  Cnt           Score           Error  Units
InlinedCall.newImpl          false      N/A  thrpt    3  2441232707.809 ± 138092251.983  ops/s
InlinedCall.rcuLocked        false      N/A  thrpt    3  1116925952.834 ± 105484443.430  ops/s
InlinedCall.rwLocked         false      N/A  thrpt    3    59334557.465 ±    267412.636  ops/s
InlinedCall.unsync           false      N/A  thrpt    3  2424408670.896 ± 696453915.897  ops/s
VirtualCall.component        false   unsync  thrpt    3  2439576011.016 ± 194572187.841  ops/s
VirtualCall.component        false   rwlock  thrpt    3    59518346.446 ±     91540.972  ops/s
VirtualCall.component        false  rculock  thrpt    3  1059312172.029 ±  91850879.226  ops/s
VirtualCall.component        false      new  thrpt    3  1845830506.910 ± 256628384.459  ops/s

When I run this with 6 threads I get this:

Benchmark              (blackhole)   (impl)   Mode  Cnt            Score            Error  Units
InlinedCall.newImpl          false      N/A  thrpt    3  12031688167.608 ± 4218322551.532  ops/s
InlinedCall.rcuLocked        false      N/A  thrpt    3   5660447942.467 ±  237218291.900  ops/s
InlinedCall.rwLocked         false      N/A  thrpt    3    341089781.812 ±   65370977.211  ops/s
InlinedCall.unsync           false      N/A  thrpt    3  11648096223.269 ±  282528592.912  ops/s
VirtualCall.component        false   unsync  thrpt    3  11406362088.019 ± 1207457766.872  ops/s
VirtualCall.component        false   rwlock  thrpt    3    331446602.891 ±   10006346.435  ops/s
VirtualCall.component        false  rculock  thrpt    3   5213272454.145 ±  707041721.132  ops/s
VirtualCall.component        false      new  thrpt    3   8796928884.245 ± 2177441524.351  ops/s
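
To make the two tables easier to compare, the `InlinedCall` scores can be reduced to ratios. The snippet below is only a convenience sketch: the class and method names (`SlowdownRatios`, `ratio`) are made up for illustration, and the numbers are copied from the tables above.

```java
/** Illustrative helper; the class and method names are hypothetical. */
final class SlowdownRatios {
    /** How many times faster the first throughput is than the second. */
    static double ratio(double fasterOpsPerSec, double slowerOpsPerSec) {
        return fasterOpsPerSec / slowerOpsPerSec;
    }

    public static void main(String[] args) {
        // InlinedCall scores (ops/s) copied from the tables above.
        double new1 = 2441232707.809, rcu1 = 1116925952.834, rw1 = 59334557.465;
        double new6 = 12031688167.608, rcu6 = 5660447942.467, rw6 = 341089781.812;

        System.out.printf("1 thread:  newImpl/rcuLocked = %.2f, rcuLocked/rwLocked = %.2f%n",
                ratio(new1, rcu1), ratio(rcu1, rw1));
        System.out.printf("6 threads: newImpl/rcuLocked = %.2f, rcuLocked/rwLocked = %.2f%n",
                ratio(new6, rcu6), ratio(rcu6, rw6));
    }
}
```

With these inputs the empty read-lock comes out roughly 2.2x (1 thread) and 2.1x (6 threads) faster than `rcuLocked`, while `rcuLocked` is roughly 19x and 17x faster than `rwLocked`. The error margins in the tables are wide with only 3 iterations, so the ratios are indicative at best.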

I've tried to see why virtual vs. inlined invocation makes no difference for `unsync` while it makes a significant difference for `new`, but I can't really tell after looking at the `perfasm` results. I've also checked with inlining of the entry method disabled, and the result is quite different. It's hard to tell which version would be the 'right' one. I think that the advantage of the RCU lock over the RW lock is clear, and a 2-fold slowdown vs. the empty implementation isn't that bad. `perfasm` attributes most of the weight to those volatile reads.
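
For readers unfamiliar with the general idea, here is a minimal sketch of a read-mostly lock whose read path is a volatile write plus a volatile read, while the writer does all the heavy work. It is not the code from this pull request: the class `SketchReadMostlyLock`, its per-thread `Slot`, and all method names are hypothetical, and the design (one flag per reader thread, a single `writePending` flag) is just one assumed way to get this shape.

```java
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch; names and structure are NOT taken from the pull request. */
final class SketchReadMostlyLock {
    /** One slot per reader thread; only the owning thread writes to it. */
    private static final class Slot {
        volatile boolean reading;
    }

    // Simplification: slots of threads that have exited are never removed.
    private final CopyOnWriteArrayList<Slot> slots = new CopyOnWriteArrayList<>();
    private final ThreadLocal<Slot> mySlot = ThreadLocal.withInitial(() -> {
        Slot s = new Slot();
        slots.add(s);
        return s;
    });
    private volatile boolean writePending;

    /** Not reentrant: do not nest read locks or upgrade to the write lock. */
    void readLock() {
        Slot s = mySlot.get();
        boolean interrupted = false;
        for (;;) {
            s.reading = true;          // volatile write: announce this reader
            if (!writePending) break;  // volatile read: fast path, no writer
            s.reading = false;         // back off so the writer can proceed
            synchronized (this) {
                while (writePending) {
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        interrupted = true; // keep waiting, restore the flag later
                    }
                }
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    void readUnlock() {
        mySlot.get().reading = false;
    }

    /** Heavyweight path: blocks new readers, then waits for every active one. */
    synchronized void writeLock() {
        boolean interrupted = false;
        while (writePending) {         // only one writer at a time
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        writePending = true;
        for (Slot s : slots) {
            while (s.reading) Thread.onSpinWait(); // wait for this reader to leave
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    synchronized void writeUnlock() {
        writePending = false;
        notifyAll();                   // wake readers and writers that backed off
    }
}
```

The sketch relies on both flags being volatile: a reader that does not see `writePending` must have published `reading = true` before the writer scans the slots, so the writer waits for it. Whether the read path in the actual patch looks like this, and how its cost compares, is not something this sketch can show.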

-------------

PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1521792720