RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]
Maurizio Cimadamore
mcimadamore at openjdk.org
Mon Jul 15 12:02:51 UTC 2024
On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
> I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.
>
> Current:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    5.938 ±  0.058  us/op
> ```
>
> I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   22.664 ±  0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   45.351 ±  1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10   16.671 ±  0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    5.969 ±  0.089  us/op
> ```
>
> When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:
>
> ```
> Benchmark                                        Mode  Cnt    Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ±  1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ±  0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ±  3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ±  0.109  us/op
> ```
>
> I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.
Interesting benchmark. What is the baseline here? E.g. can we also compare against the same benchmark using a confined arena to do the closing?
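
To make the question concrete, here is a rough sketch of the comparison I have in mind. It is not the PR's `ConcurrentClose` benchmark and does not use JMH; the class name `ArenaCloseBaseline` and the helper `avgCloseNanos` are made up for illustration, and it assumes JDK 22 or later, where the FFM API is final. It only times the create/allocate/close cycle for each arena kind on a single thread, so it says nothing about the effect on concurrently accessing threads.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.function.Supplier;

// Hypothetical helper, not part of the PR: times arena create/close cycles.
public class ArenaCloseBaseline {

    // Average nanoseconds per create/allocate/close cycle for the given arena kind.
    static double avgCloseNanos(Supplier<Arena> factory, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // try-with-resources closes the arena at the end of each iteration
            try (Arena arena = factory.get()) {
                MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
                seg.set(ValueLayout.JAVA_INT, 0, i);
            }
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        int iterations = 10_000;
        // Untimed warm-up pass so both variants have a chance to be compiled.
        avgCloseNanos(Arena::ofShared, iterations);
        avgCloseNanos(Arena::ofConfined, iterations);

        double shared = avgCloseNanos(Arena::ofShared, iterations);
        double confined = avgCloseNanos(Arena::ofConfined, iterations);
        System.out.printf("shared:   %.1f ns/op%n", shared);
        System.out.printf("confined: %.1f ns/op%n", confined);
    }
}
```

A confined arena can only be closed by its owner thread and needs no handshake with other threads, so its number would give a lower bound to compare the `closing` row against. A proper comparison would still need the accessor threads running alongside, as in the JMH benchmark.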
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857