RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v3]

Mon Jul 15 12:02:51 UTC 2024

On Mon, 15 Jul 2024 11:47:43 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

> I've updated the benchmark to run with 3 separate threads: 1 thread that is just creating and closing shared arenas in a loop, 1 that is accessing memory using the FFM API, and 1 that is accessing a `byte[]`.
> 
> Current:
> 
> ```
> Benchmark                                        Mode  Cnt   Score    Error  Units
> ConcurrentClose.sharedClose                      avgt   10  50.093 ±  6.200  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  46.269 ±  0.786  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  98.072 ± 19.061  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.938 ±  0.058  us/op
> ```
> 
> I do see a pretty big difference on the memory segment accessing thread when I remove deoptimization altogether:
> 
> ```
> Benchmark                                        Mode  Cnt   Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10  22.664 ± 0.409  us/op
> ConcurrentClose.sharedClose:closing              avgt   10  45.351 ± 1.554  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  16.671 ± 0.251  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10   5.969 ± 0.089  us/op
> ```
> 
> When I remove the `has_scoped_access()` check before the deopt, I expect the `otherAccess` thread to be affected, but the effect isn't nearly as big as with the FFM thread. I think this is likely due to the `otherAccess` benchmark being less sensitive to optimization (i.e. it already runs fairly fast in the interpreter). I also tried using `MethodHandles::arrayElementGetter` for the access, but the numbers I got were pretty much the same:
> 
> ```
> Benchmark                                        Mode  Cnt    Score   Error  Units
> ConcurrentClose.sharedClose                      avgt   10   52.745 ± 1.071  us/op
> ConcurrentClose.sharedClose:closing              avgt   10   46.670 ± 0.453  us/op
> ConcurrentClose.sharedClose:memorySegmentAccess  avgt   10  102.663 ± 3.430  us/op
> ConcurrentClose.sharedClose:otherAccess          avgt   10    8.901 ± 0.109  us/op
> ```
> 
> I think, to really test the effect of the `has_scoped_access` check, we need to look at a more realistic scenario.

Interesting benchmark. What is the baseline here? E.g. can we also compare against the same benchmark using a confined arena to do the closing?
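
To make the question concrete, here is a rough sketch of the comparison I have in mind. It is not the JMH benchmark from the PR: the class name `ArenaCloseBaseline` and its helper methods are made up for illustration, it assumes the final FFM API (JDK 22 or later), and it runs on a single thread, so it only shows the relative cost of create/close for the two arena kinds and says nothing about the effect on concurrently running threads.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.function.Supplier;

// Hypothetical illustration only; a real measurement should use JMH.
public class ArenaCloseBaseline {

    // Average nanoseconds per create/touch/close cycle for arenas from the given factory.
    static double avgCloseNanos(Supplier<Arena> factory, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // try-with-resources closes the arena at the end of each iteration
            try (Arena arena = factory.get()) {
                MemorySegment segment = arena.allocate(64);
                segment.set(ValueLayout.JAVA_BYTE, 0, (byte) i);
                sink += segment.get(ValueLayout.JAVA_BYTE, 0);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the accesses from being optimized away
        }
        return (double) elapsed / iterations;
    }

    // Confined arena: closed by its owner thread, no cross-thread handshake is needed.
    static double confinedAvgNanos(int iterations) {
        return avgCloseNanos(Arena::ofConfined, iterations);
    }

    // Shared arena: close has to coordinate with other threads.
    static double sharedAvgNanos(int iterations) {
        return avgCloseNanos(Arena::ofShared, iterations);
    }

    public static void main(String[] args) {
        int iterations = 10_000;
        // Crude warm-up so both paths get compiled before timing
        confinedAvgNanos(iterations);
        sharedAvgNanos(iterations);
        System.out.printf("confined: %.1f ns/op%n", confinedAvgNanos(iterations));
        System.out.printf("shared:   %.1f ns/op%n", sharedAvgNanos(iterations));
    }
}
```

In the actual benchmark, the equivalent change would be to have the `closing` thread use `Arena.ofConfined()` while the two access threads stay as they are; that would give a reference point for how much of the `memorySegmentAccess` slowdown comes from the shared close itself.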

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228335857