RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]

Mon Jul 15 08:56:54 UTC 2024

On Sun, 14 Jul 2024 11:01:58 GMT, Uwe Schindler <uschindler at openjdk.org> wrote:

> I have one problem with the benchmark: I think it is not measuring the whole setup in a way that is our workload: The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments. So basically, the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of effect, but it should recover fast.

IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things:

1. calling the `close()` method itself is slow (this is what the benchmark effectively measures)
2. throughput of unrelated threads is affected (I think this is what Lucene is seeing)

Addressing (2) is more important than addressing (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic, but expensive, as the javadoc itself admits).

For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2).
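
To make meaning (1) concrete, here is a minimal sketch, assuming JDK 22 or later where `java.lang.foreign` is final. The class `SharedCloseSketch` and the method `closeAndProbe` are hypothetical names used only for illustration. It times a single `close()` call on a shared arena and then checks that a later access is rejected, which is the deterministic part of the contract. It is not a benchmark, and it says nothing about (2): measuring the effect on unrelated threads needs a proper harness.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class SharedCloseSketch {
    // Hypothetical helper: returns { nanoseconds spent in close(), 1 if access after close was rejected else 0 }.
    static long[] closeAndProbe() {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
        segment.set(ValueLayout.JAVA_INT, 0, 42);

        // Segments of a shared arena may be accessed from any thread.
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = segment.get(ValueLayout.JAVA_INT, 0));
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        if (seen[0] != 42) {
            throw new AssertionError("reader thread did not see the stored value");
        }

        // Meaning (1): the latency of the close() call itself.
        long start = System.nanoTime();
        arena.close();
        long closeNanos = System.nanoTime() - start;

        // Deterministic close: once close() returns, the segment is no longer accessible.
        long rejected = 0;
        try {
            segment.get(ValueLayout.JAVA_INT, 0);
        } catch (IllegalStateException e) {
            rejected = 1;
        }
        return new long[] { closeNanos, rejected };
    }

    public static void main(String[] args) {
        long[] result = closeAndProbe();
        System.out.println("close() took " + result[0] + " ns");
        System.out.println("access after close rejected: " + (result[1] == 1));
    }
}
```

A single timing like this is noisy and only bounds the cost paid by the closing thread; deferring the free/unmap would shrink that number without removing the cross-thread synchronization discussed above.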

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228002760