[crac] RFR: RCU Lock - RW lock with very lightweight read- and heavyweight write-locking [v7]
Radim Vansa
duke at openjdk.org
Tue Apr 25 13:31:48 UTC 2023
On Tue, 25 Apr 2023 07:34:51 GMT, Radim Vansa <duke at openjdk.org> wrote:
>> This implementation is suitable for uses where the write-locking happens very rarely (if at all), as in the case of CRaC checkpoint, and we don't want to slow down regular access to the protected resource.
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
>
> Don't use SwitchPoints
Okay, using just a volatile field does not change much. Anton suggested having a look at the case of totally uncontended readers; single-threaded benchmark results (including the 'new' implementation that fakes an empty read-lock) look like this:
Benchmark              (blackhole)   (impl)   Mode  Cnt           Score           Error  Units
InlinedCall.newImpl          false      N/A  thrpt    3  2441232707.809 ± 138092251.983  ops/s
InlinedCall.rcuLocked        false      N/A  thrpt    3  1116925952.834 ± 105484443.430  ops/s
InlinedCall.rwLocked         false      N/A  thrpt    3    59334557.465 ±    267412.636  ops/s
InlinedCall.unsync           false      N/A  thrpt    3  2424408670.896 ± 696453915.897  ops/s
VirtualCall.component        false   unsync  thrpt    3  2439576011.016 ± 194572187.841  ops/s
VirtualCall.component        false   rwlock  thrpt    3    59518346.446 ±     91540.972  ops/s
VirtualCall.component        false  rculock  thrpt    3  1059312172.029 ±  91850879.226  ops/s
VirtualCall.component        false      new  thrpt    3  1845830506.910 ± 256628384.459  ops/s
When I run this with 6 threads I get this:
Benchmark              (blackhole)   (impl)   Mode  Cnt            Score            Error  Units
InlinedCall.newImpl          false      N/A  thrpt    3  12031688167.608 ± 4218322551.532  ops/s
InlinedCall.rcuLocked        false      N/A  thrpt    3   5660447942.467 ±  237218291.900  ops/s
InlinedCall.rwLocked         false      N/A  thrpt    3    341089781.812 ±   65370977.211  ops/s
InlinedCall.unsync           false      N/A  thrpt    3  11648096223.269 ±  282528592.912  ops/s
VirtualCall.component        false   unsync  thrpt    3  11406362088.019 ± 1207457766.872  ops/s
VirtualCall.component        false   rwlock  thrpt    3    331446602.891 ±   10006346.435  ops/s
VirtualCall.component        false  rculock  thrpt    3   5213272454.145 ±  707041721.132  ops/s
VirtualCall.component        false      new  thrpt    3   8796928884.245 ± 2177441524.351  ops/s
I've tried to find out why `unsync` performs the same whether it is invoked virtually or inlined while `new` shows a significant difference, but I can't really tell after looking into the `perfasm` results. I've also checked with inlining of the entry method disabled, and the result is quite different. It's hard to tell which version would be the 'right' one. I think the advantage of the RCU lock over the RW lock is clear, and a roughly twofold slowdown vs. the empty implementation isn't that bad. `perfasm` attributes most of the weight to those volatile reads.
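
For illustration only, here is a minimal sketch of a read-mostly lock whose uncontended read path consists of a few volatile accesses and nothing else, which is the kind of cost `perfasm` would attribute to volatile reads. This is not the code from the pull request: the class `ReadMostlyLock`, its `Slot` helper, and the `read`/`write` methods are hypothetical names, and the design (one flag per reader thread plus one writer flag) is an assumption chosen for brevity.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical read-mostly lock for illustration; NOT the pull request's RCU lock. */
final class ReadMostlyLock {
    /** One slot per reader thread; only the owning thread ever writes it. */
    private static final class Slot {
        volatile boolean reading;
    }

    private final CopyOnWriteArrayList<Slot> slots = new CopyOnWriteArrayList<>();
    private final ThreadLocal<Slot> local = ThreadLocal.withInitial(() -> {
        Slot s = new Slot();
        slots.add(s); // slots of finished threads are never removed in this sketch
        return s;
    });
    private final Object writerMutex = new Object();
    private volatile boolean writerActive;

    /** Runs body under the read lock. Not reentrant; body must not call write(). */
    <T> T read(Supplier<T> body) {
        Slot s = local.get();
        for (;;) {
            s.reading = true;          // volatile write: announce this reader
            if (!writerActive) break;  // volatile read: fast path when no writer is active
            s.reading = false;         // back off so the writer can proceed
            while (writerActive) Thread.onSpinWait();
        }
        try {
            return body.get();
        } finally {
            s.reading = false;         // volatile write: leave the read section
        }
    }

    /** Runs action exclusively: heavyweight, it waits for every registered reader. */
    void write(Runnable action) {
        synchronized (writerMutex) {   // one writer at a time
            writerActive = true;       // new readers will now back off
            try {
                for (Slot s : slots) {
                    while (s.reading) Thread.onSpinWait(); // wait for readers in flight
                }
                action.run();
            } finally {
                writerActive = false;  // release the waiting readers
            }
        }
    }
}
```

A reader would wrap its access as `lock.read(() -> resource.get())` and a rare writer as `lock.write(() -> resource.reset())`, where `resource` stands for whatever is being protected. The sketch spins instead of parking and makes the writer scan all reader slots, so it only makes sense when writes are very rare.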
-------------
PR Comment: https://git.openjdk.org/crac/pull/58#issuecomment-1521792720