RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v2]

Mon Jul 15 09:04:54 UTC 2024

On Mon, 15 Jul 2024 08:54:11 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > I have one problem with the benchmark: I think it does not measure the whole setup in a way that matches our workload. The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments, so the throughput of those threads should not be affected at all. Threads that are currently reading from a memory segment may be affected a bit, but they should recover quickly.
> 
> IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things:
> 
> 1. calling the `close()` method itself is slow (this is what the benchmark effectively measures)
> 2. throughput of unrelated threads is affected (I think this is what Lucene is seeing)
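Meaning (1) can be observed on its own by timing only the `close()` call on the closing thread. A minimal sketch, assuming JDK 22 or later (where `java.lang.foreign` is final); `CloseLatency` and `timeSharedClose` are hypothetical names, and single `System.nanoTime()` samples are only a rough indicator, not a substitute for a JMH benchmark. It says nothing about meaning (2):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CloseLatency {
    // Measures only meaning (1): the time spent inside Arena.close() on the closing thread.
    public static long timeSharedClose(long bytes) {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(bytes);
        segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 1); // touch the segment once
        long start = System.nanoTime();
        arena.close(); // closing a shared arena is the expensive operation under discussion
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("close() took " + timeSharedClose(4096) + " ns");
        }
    }
}
```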
> 
> Addressing (2) is more important than addressing (1) (in the sense that, if you sign up for a shared arena close, you know it's going to be deterministic but expensive, as the javadoc itself admits).

I fully agree, we mixed up two different issues. The problem is that the benchmark measures both (1) and (2) in each thread. To see the effect of this change, the benchmark should have three groups of threads: one that only closes arenas, a second that consumes scoped memory, and a third doing completely unrelated work.
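A rough sketch of the three thread groups, assuming JDK 22 or later. `ThreeGroupHarness`, its counters, and the one-thread-per-group layout are hypothetical; this is not the PR's benchmark, and a real measurement would use JMH with several threads per group. The interesting number is the unrelated group's iteration count, compared across JDK builds with and without the change:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

public class ThreeGroupHarness {
    // Sink for computed values so the JIT cannot drop the loops entirely.
    static volatile long sink;

    // Runs the three groups for `millis` ms and returns
    // {arena closes, full segment scans, unrelated loop iterations}.
    public static long[] run(long millis) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean();
        LongAdder closes = new LongAdder();
        LongAdder scans = new LongAdder();
        LongAdder unrelatedOps = new LongAdder();

        try (Arena longLived = Arena.ofShared()) {
            // Long-lived shared segment, read by group 2 and not closed during the run.
            MemorySegment data = longLived.allocate(1024, 8);

            // Group 1: only creates and closes short-lived shared arenas.
            Thread closer = new Thread(() -> {
                while (!stop.get()) {
                    Arena arena = Arena.ofShared();
                    arena.allocate(64);
                    arena.close();
                    closes.increment();
                }
            });
            // Group 2: reads from a memory segment, so it may be caught by a close.
            Thread reader = new Thread(() -> {
                long sum = 0;
                while (!stop.get()) {
                    for (long off = 0; off < data.byteSize(); off += Long.BYTES) {
                        sum += data.get(ValueLayout.JAVA_LONG, off);
                    }
                    scans.increment();
                }
                sink = sum;
            });
            // Group 3: plain arithmetic, unrelated to memory segments.
            Thread unrelated = new Thread(() -> {
                long x = 1;
                while (!stop.get()) {
                    x = x * 6364136223846793005L + 1442695040888963407L;
                    unrelatedOps.increment();
                }
                sink = x;
            });

            closer.start();
            reader.start();
            unrelated.start();
            Thread.sleep(millis);
            stop.set(true);
            closer.join();
            reader.join();
            unrelated.join();
        }
        return new long[] {closes.sum(), scans.sum(), unrelatedOps.sum()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(1000);
        System.out.printf("closes=%d scans=%d unrelatedOps=%d%n", r[0], r[1], r[2]);
    }
}
```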

> For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2).
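The guarantee that all threads agree on the segment state is visible from plain Java: once `close()` on a shared arena has returned, any thread that touches one of its segments gets an `IllegalStateException`. The handshake that provides this happens inside the JVM and cannot be shown in user code; this minimal sketch (JDK 22 or later, hypothetical class name) only demonstrates the observable result, which a deferred free/unmap would still have to preserve:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

public class SharedCloseVisibility {
    // Returns what another thread observes when it accesses a segment
    // after the owning shared arena has been closed.
    public static Throwable accessAfterClose() throws InterruptedException {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(8, 8);
        arena.close(); // after this returns, the segment is no longer accessible from any thread

        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread other = new Thread(() -> {
            try {
                segment.get(ValueLayout.JAVA_LONG, 0);
            } catch (Throwable t) {
                seen.set(t);
            }
        });
        other.start();
        other.join();
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(accessAfterClose()); // an IllegalStateException
    }
}
```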

This is indeed unrelated; it is just an idea I had, too. In Apache Lucene we mostly want the shared arena to be closed as soon as possible. We don't need it to be closed by the time the `close()` call returns (we don't care about that), but we also can't wait until the GC closes the arena, possibly after hours or even days. The GC does not release it because the Arena is a small, long-lived instance and there is no memory pressure.

So for us it would be best to trigger the close and then continue with other work.

Of course we can do that in a separate thread (this is my idea for improving closes in Lucene). The only problem is that Lucene does not have its own thread pools, so it would be the caller's responsibility to close our indexes in a separate thread (and only a single one).
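Closing in a separate, single thread owned by the caller can be sketched as a small helper. `BackgroundCloser` is hypothetical, not a Lucene or JDK API; it accepts any `AutoCloseable` (which `java.lang.foreign.Arena` is), returns immediately, and runs every close on one daemon thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public final class BackgroundCloser implements AutoCloseable {
    // A single daemon thread, so all scheduled closes run one after another.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "background-closer");
        t.setDaemon(true);
        return t;
    });

    // Schedules resource.close() and returns immediately; the caller can move on.
    public Future<?> closeLater(AutoCloseable resource) {
        return executor.submit(() -> {
            resource.close();
            return null;
        });
    }

    // Stops accepting new work; closes that were already scheduled still run.
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        try (BackgroundCloser closer = new BackgroundCloser()) {
            Future<?> done = closer.closeLater(() -> System.out.println("closed in background"));
            System.out.println("caller continues");
            done.get();
        }
    }
}
```

With FFM, a caller would pass an `Arena` to `closeLater` and keep working; the returned `Future` can be ignored or awaited. Keeping a single thread serializes the closes, as suggested above, at the cost of a queue if many closes arrive at once.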

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228018619