New candidate JEP: 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue May 7 11:07:29 UTC 2024
On 07/05/2024 11:20, Maurizio Cimadamore wrote:
> We will try to improve this area over time, but please note that the
> way you are using arenas is not very idiomatic.
To be clear, the big cost you are seeing with shared arena is not the
creation per se, but the closing, which is an expensive (but
deterministic) operation.
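As a rough illustration, assuming JDK 22 or later (where java.lang.foreign is final): a shared arena's close cost is paid once per arena, not once per segment. Grouping allocations that share a lifetime into one arena therefore amortizes that cost. The class and method names below are hypothetical. The per-allocation variant is only an assumed example of a non-idiomatic pattern, not the code discussed in this thread.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch: amortizing the shared-arena close cost (JDK 22+).
class ArenaAmortization {

    // Idiomatic: many small segments in ONE shared arena, closed once.
    static long sumWithOneSharedArena(int count) {
        long sum = 0;
        try (Arena arena = Arena.ofShared()) {
            for (int i = 0; i < count; i++) {
                MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
                seg.set(ValueLayout.JAVA_LONG, 0, i);
                sum += seg.get(ValueLayout.JAVA_LONG, 0);
            }
        } // single (expensive but deterministic) close: all segments freed together
        return sum;
    }

    // Assumed anti-pattern for comparison: one shared arena per allocation,
    // so the close cost is paid on every iteration.
    static long sumWithArenaPerSegment(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            try (Arena arena = Arena.ofShared()) {
                MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
                seg.set(ValueLayout.JAVA_LONG, 0, i);
                sum += seg.get(ValueLayout.JAVA_LONG, 0);
            } // close on every iteration
        }
        return sum;
    }
}
```

Both methods compute the same result. Only the number of arena closes differs. Sharing one arena is appropriate only when the segments really do share a lifetime. If they are only ever accessed by one thread, Arena.ofConfined() avoids the handshake altogether. In the benchmark results in this message, confined close is over 100x cheaper than shared close.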
Btw, as I said, we do have a benchmark similar to the one you provided:
https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/foreign/MemorySessionClose.java
Here are the results on my machine:

Benchmark                                    (mode)  Mode  Cnt     Score     Error  Units
MemorySessionClose.confined_close              NONE  avgt   30     0.049 ±   0.001  us/op
MemorySessionClose.confined_close            MEMORY  avgt   30     0.052 ±   0.003  us/op
MemorySessionClose.confined_close           THREADS  avgt   30     0.055 ±   0.001  us/op
MemorySessionClose.implicit_close              NONE  avgt   30     1.221 ±   0.689  us/op
MemorySessionClose.implicit_close            MEMORY  avgt   30   139.873 ± 295.502  us/op
MemorySessionClose.implicit_close           THREADS  avgt   30     1.224 ±   0.385  us/op
MemorySessionClose.implicit_close_systemgc     NONE  avgt   30    32.905 ±   4.904  us/op
MemorySessionClose.implicit_close_systemgc   MEMORY  avgt   30  1132.976 ±  62.264  us/op
MemorySessionClose.implicit_close_systemgc  THREADS  avgt   30    30.159 ±   4.360  us/op
MemorySessionClose.shared_close                NONE  avgt   30     6.823 ±   0.215  us/op
MemorySessionClose.shared_close              MEMORY  avgt   30     7.113 ±   0.306  us/op
MemorySessionClose.shared_close             THREADS  avgt   30    10.883 ±   0.503  us/op

The benchmark has two stress modes:
* MEMORY - which creates a lot of small arrays to put more strain on the GC
* THREADS - which creates a lot of additional dummy threads, to check
that the shared arena handshake isn’t affected too much
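To make the two stress modes concrete, here is a crude, hypothetical sketch (assuming JDK 22+) of what each might look like, plus a naive timing loop. It is not the MemorySessionClose code. The names, the 16-byte array size, and the latch-based idle threads are all assumptions. A hand-rolled System.nanoTime loop also lacks JMH's warmup and error estimates, so its numbers should not be compared with the table above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the MEMORY and THREADS stress modes (not the JDK benchmark).
class CloseStressSketch {
    // Written to a volatile field so the allocations cannot be optimized away.
    static volatile Object sink;

    // Stand-in for MEMORY mode: churn many small arrays to add GC pressure.
    static void memoryStress(int arrays) {
        for (int i = 0; i < arrays; i++) {
            sink = new byte[16];
        }
    }

    // Stand-in for THREADS mode: start idle daemon threads that wait on `stop`,
    // so a shared-arena close has more threads around when it handshakes.
    static List<Thread> startDummyThreads(int n, CountDownLatch stop) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    stop.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    // Naive average of create + allocate + close for a shared arena, in nanoseconds.
    static double avgSharedCycleNanos(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Arena arena = Arena.ofShared();
            arena.allocate(ValueLayout.JAVA_INT);
            arena.close();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }
}
```

You could call avgSharedCycleNanos before and after startDummyThreads to see whether the extra threads slow down the handshake, then release them with stop.countDown(). Treat any difference as indicative only.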
As you can see, confined is the fastest here. Implicit initially seems
better than shared, but it shows some very bad spikes, especially when
MEMORY mode is enabled (note the ± 295.502 us/op error on a 139.873 us/op score).
Shared arena seems slower, but it is also very predictable. Note that
these numbers do not reveal the full extent of how bad implicit
actually is. On my system, if I run |top| in a different window, I can
see resident memory peaking at around 20 GB when using the implicit scheme,
which doesn’t happen at all when using either confined or shared (this is
because, when using implicit, there is always a
non-deterministic delay between the time a segment becomes
unreachable and the time its memory is actually deallocated).
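The difference can be sketched as follows, assuming JDK 22+. With Arena.ofAuto(), the native memory is released only after the GC determines the segment is unreachable, at an unspecified later time. A confined (or shared) arena frees it when close() runs. The class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical sketch: implicit (automatic) vs. deterministic deallocation (JDK 22+).
class ImplicitVsExplicit {

    // Implicit arena: the segment becomes unreachable when this method returns,
    // but its native memory may not be freed until the GC notices, possibly much later.
    static long touchAuto(long bytes) {
        MemorySegment seg = Arena.ofAuto().allocate(bytes);
        seg.fill((byte) 1);
        return seg.byteSize();
    }

    // Confined arena: native memory is freed when close() runs at the end of
    // the try-with-resources block.
    static long touchConfined(long bytes) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(bytes);
            seg.fill((byte) 1);
            return seg.byteSize();
        }
    }

    // After close(), the segment's scope is no longer alive: deallocation already happened.
    static boolean confinedAliveAfterClose() {
        Arena arena = Arena.ofConfined();
        MemorySegment seg = arena.allocate(64);
        arena.close();
        return seg.scope().isAlive();
    }
}
```

Calling touchAuto in a tight loop with large sizes is the kind of pattern that can let resident memory grow well beyond the live data set. touchConfined keeps it bounded by what is live at any moment.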
All this to say: it can be very hard to measure the performance of memory
allocation comprehensively. It’s like an elephant with many sides,
and when looking at synthetic benchmarks it is sometimes easy to
forget about some of them.
Cheers
Maurizio