New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal

Tue May 7 11:07:29 UTC 2024

On 07/05/2024 11:20, Maurizio Cimadamore wrote:

> We will try to improve this area over time, but please note that the 
> way you are using arenas is not very idiomatic. 

To be clear, the big cost you are seeing with shared arena is not the 
creation per se, but the closing, which is an expensive (but 
deterministic) operation.
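
To illustrate what "deterministic" means here, the following is a minimal sketch (not taken from the benchmark; the class and method names are made up for illustration, and it assumes the finalized java.lang.foreign API of JDK 22 or later). For both confined and shared arenas, the memory is gone by the time close() returns. The difference is only in how much work close() has to do to guarantee that:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration class, not part of the JDK or of the benchmark.
public class ArenaCloseSketch {

    // Allocates a small segment in the given arena, touches it, and closes
    // the arena. Returns true if the segment is no longer alive right after
    // close() has returned.
    static boolean freedOnClose(Arena arena) {
        MemorySegment segment = arena.allocate(64);
        segment.set(ValueLayout.JAVA_INT, 0, 42);
        arena.close(); // deterministic: the memory is released here
        return !segment.scope().isAlive();
    }

    // Confined: only the owner thread can access the segment, so close()
    // does not need to coordinate with other threads.
    static boolean confinedFreedOnClose() {
        return freedOnClose(Arena.ofConfined());
    }

    // Shared: any thread may be accessing the segment, so close() must first
    // make sure no other thread is in the middle of an access.
    static boolean sharedFreedOnClose() {
        return freedOnClose(Arena.ofShared());
    }

    public static void main(String[] args) {
        System.out.println("confined freed on close: " + confinedFreedOnClose());
        System.out.println("shared freed on close:   " + sharedFreedOnClose());
    }
}
```

In code that creates and closes many short-lived arenas on a single thread, a confined arena avoids the cross-thread coordination entirely.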

Btw, as I said, we do have a benchmark similar to the one you provided:

https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/foreign/MemorySessionClose.java

Here are the results on my machine:

Benchmark                                    (mode)  Mode  Cnt     Score     Error  Units
MemorySessionClose.confined_close              NONE  avgt   30     0.049 ±   0.001  us/op
MemorySessionClose.confined_close            MEMORY  avgt   30     0.052 ±   0.003  us/op
MemorySessionClose.confined_close           THREADS  avgt   30     0.055 ±   0.001  us/op
MemorySessionClose.implicit_close              NONE  avgt   30     1.221 ±   0.689  us/op
MemorySessionClose.implicit_close            MEMORY  avgt   30   139.873 ± 295.502  us/op
MemorySessionClose.implicit_close           THREADS  avgt   30     1.224 ±   0.385  us/op
MemorySessionClose.implicit_close_systemgc     NONE  avgt   30    32.905 ±   4.904  us/op
MemorySessionClose.implicit_close_systemgc   MEMORY  avgt   30  1132.976 ±  62.264  us/op
MemorySessionClose.implicit_close_systemgc  THREADS  avgt   30    30.159 ±   4.360  us/op
MemorySessionClose.shared_close                NONE  avgt   30     6.823 ±   0.215  us/op
MemorySessionClose.shared_close              MEMORY  avgt   30     7.113 ±   0.306  us/op
MemorySessionClose.shared_close             THREADS  avgt   30    10.883 ±   0.503  us/op

The benchmark has two stress modes:

  * MEMORY - which creates a lot of small arrays to put more strain on GC
  * THREADS - which creates a lot of additional dummy threads, to check
    that the shared arena handshake isn’t affected too much

As you can see, confined is the fastest here. Implicit looks better 
than shared at first, but it shows some very bad spikes, especially 
when the MEMORY stress mode is enabled.

Shared arena seems slower, but it is also very predictable. Note that 
these numbers do not reveal the full extent of how bad implicit 
actually is. On my system, if I run top in a different window, I can 
see resident memory peaking at some 20G when using the implicit scheme, 
which doesn’t happen at all when using either confined or shared (this 
is due to the fact that, when using implicit, there is always a 
non-deterministic delay between the time when a segment becomes 
unreachable and the time the memory is truly deallocated).
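
The contrast with the implicit scheme can be sketched as follows (again a hypothetical illustration class, assuming the finalized java.lang.foreign API of JDK 22 or later, and not the benchmark code). An automatic arena cannot be closed explicitly, so the program has no way to say "free this now". The memory is only released at some unspecified point after the garbage collector notices that the segment is unreachable:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical illustration class, not part of the JDK or of the benchmark.
public class ImplicitArenaSketch {

    // Returns true if the automatic arena refuses an explicit close() and
    // the segment is still alive afterwards.
    static boolean autoArenaRejectsClose() {
        Arena auto = Arena.ofAuto();
        MemorySegment segment = auto.allocate(64);
        try {
            auto.close(); // not supported for automatic arenas
            return false;
        } catch (UnsupportedOperationException expected) {
            // The segment stays alive for as long as it is reachable.
            // Deallocation happens later, at a time chosen by the GC.
            return segment.scope().isAlive();
        }
    }

    public static void main(String[] args) {
        System.out.println("auto arena rejects close: " + autoArenaRejectsClose());
    }
}
```

Because nothing in the program triggers the release, off-heap memory that is already unreachable can pile up between GC cycles, which is consistent with the resident-memory peak described above.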

All this to say: it can be very hard to measure the performance of 
memory allocation in a complete fashion. It’s like an elephant with 
many sides, and when looking at synthetic benchmarks it can sometimes 
be easy to forget about some of them.

Cheers
Maurizio
