Strange problem with deoptimization on highly concurrent thread-local handshake reported to Apache Lucene
Jorn Vernee
jorn.vernee at oracle.com
Mon Jul 1 21:29:36 UTC 2024
Hello Uwe,
I've read the various GitHub threads, and I think I have a good idea of
what's going on. Just to recap how shared arena closure works:
- We set the arena's isAlive = false (more or less)
- We submit a handshake from the closing thread to all other threads
- During that handshake, we check whether each thread is accessing the
arena we are trying to close.
- Unfortunately, this requires deoptimizing the top-most frame of a
thread, due to JDK-8290892 [1]. Note that this is a one-off
unpacking of the top-most frame, after which the method finishes running
in the interpreter and then goes back to executing compiled code.
- If we find a thread accessing the same arena, we make it throw an
exception, to bail out of the access.
- After the handshake finishes, we know that either: 1) no other thread
was accessing the arena, and they will see the up-to-date isAlive =
false, because the handshake also works as a means of synchronization;
or 2) if they were accessing the arena, they will get an exception.
- After that it's safe to free the memory.
So, if shared arenas are closed very frequently, as seems to be the case
in the problematic scenarios, we're likely overwhelming all other
threads with handshakes and deoptimizations.
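To make that concrete, here is a minimal, self-contained sketch (class and
variable names are just for illustration) of what the steps above mean for a
reader thread: once another thread closes the shared arena, the in-flight
access bails out with an IllegalStateException.

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    public class SharedCloseSketch {
        public static void main(String[] args) throws InterruptedException {
            Arena shared = Arena.ofShared();
            MemorySegment segment = shared.allocate(1024);

            // Reader thread: keeps accessing the segment until the arena is closed.
            Thread reader = new Thread(() -> {
                long sum = 0;
                try {
                    while (true) {
                        for (long i = 0; i < segment.byteSize(); i++) {
                            sum += segment.get(ValueLayout.JAVA_BYTE, i);
                        }
                    }
                } catch (IllegalStateException e) {
                    // The closing thread's handshake found this thread inside an
                    // access (or a later access saw isAlive == false), so the
                    // access bails out with an exception, as described above.
                    System.out.println("reader stopped after sum=" + sum + ": " + e);
                }
            });
            reader.start();

            Thread.sleep(100);
            shared.close();   // submits the handshake described above
            reader.join();
        }
    }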
Addressing JDK-8290892 would mean that we only need to deoptimize
threads that are actually accessing the arena that is being closed.
Another possible interim improvement could be to only deoptimize threads
that are actually in the middle of accessing a memory segment, not just
all threads. That's relatively low-hanging fruit that I noticed recently,
but haven't had time to look into yet. I've filed:
https://bugs.openjdk.org/browse/JDK-8335480
This would of course still leave the handshakes, which also require a
bunch of VM-internal synchronization between threads. Shared arenas are
really only meant for long-lived use. So, either way, you may want
to consider ways of grouping resources into the same shared arena, to
reduce the total number of closures needed, or giving users the option
of indicating that they only need to access a resource from a single
thread, and then switching to a confined arena behind the scenes (which
is much cheaper to close, as you've found).
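As a rough sketch of those two mitigations (the helper and its names are made
up for illustration, not an existing API):

    import java.lang.foreign.Arena;

    final class ArenaStrategies {
        // Option 1: one shared arena per *group* of resources (e.g. all files of
        // one index), so closing the group needs a single close()/handshake
        // instead of one per file.
        static Arena groupArena() {
            return Arena.ofShared();
        }

        // Option 2: if the caller indicates the resource is only ever accessed
        // from a single thread, use a confined arena; its close() does not need
        // to handshake with (and deoptimize) other threads.
        static Arena arenaFor(boolean singleThreadedAccess) {
            return singleThreadedAccess ? Arena.ofConfined() : Arena.ofShared();
        }
    }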
Jorn
[1]: https://bugs.openjdk.org/browse/JDK-8290892
On 1-7-2024 13:52, Uwe Schindler wrote:
>
> Hi Panama people, hello Maurizio,
>
> sending this again to the mailing list; before, we only had a private
> discussion with Maurizio. Maybe someone else has an idea or might figure
> out what the problem is. We are not yet ready to open an issue against
> Java 19 through 22.
>
> There were several issues reported by users of Apache Lucene (a
> wrongly-written benchmark and also Solr users) about bad performance
> in highly concurrent environments. What was actually found out is that,
> when you have many threads closing shared arenas, under some
> circumstances this causes all "reader threads" (those accessing a
> MemorySegment, no matter which arena they use) to suddenly deoptimize.
> This causes immense slowdowns during Lucene searches.
>
> Luckily one of our committers found a workaround, and we are
> working on a benchmark showing the issue. But first let me
> explain what happens:
>
> * Lucene opens MemorySegments with shared Arenas (one per file)
> and accesses them from multiple threads. Basically, for each index
> file we have a shared arena which is closed when the file is closed.
> * There are many shared arenas (one per index file)!!!
> * If you close a shared arena normally you see no large delay on the
> thread calling the close and also no real effect on any thread
> that reads from other MemorySegments
> * But under certain circumstances ALL reading threads accessing any
> MemorySegment slow down dramatically! So once you close one of our
> Arenas, all other MemorySegments using a different shared arena
> suddenly get deoptimized (we have HotSpot logs showing this). Of
> course, the MemorySegment belonging to the closed arena is no
> longer used, and this one should in reality be the only affected one
> (throwing an IllegalStateException).
> * The problem seems to occur mainly when multiple Arenas are closed
> in highly concurrent environments. This is why we did not see the
> issue before.
> * If we put a global lock around all calls to Arena#close() the
> issues seem to go away.
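A minimal sketch of that last workaround, assuming a single process-wide lock
object (the helper below is hypothetical, not actual Lucene code):

    import java.lang.foreign.Arena;

    final class SerializedClose {
        // Hypothetical helper: route every Arena#close() through one global
        // lock so that close handshakes never run concurrently.
        private static final Object CLOSE_LOCK = new Object();

        static void close(Arena arena) {
            synchronized (CLOSE_LOCK) {
                arena.close();
            }
        }
    }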
>
> I plan to write a benchmark showing this issue. Do you have an idea
> what could go wrong? To me it looks like a race in the thread-local
> handshakes which may cause some crazy HotSpot behaviour, causing
> deoptimization of all threads concurrently accessing MemorySegments
> once Arena#close() is called in highly concurrent environments.
>
> This is the main issue where the observation is tracked:
> https://github.com/apache/lucene/issues/13325
>
> These are issues opened:
>
> * https://github.com/dacapobench/dacapobench/issues/264 (the issue
> on this benchmark was of course that they were opening/closing too
> often, but actually it just showed the problem, so it was very
> helpful). Funny detail: Alexey Shipilev opened the issue!
> * This was the comment showing the issue at huge installations of
> Apache Solr 9.7:
> https://github.com/apache/lucene/pull/13146#pullrequestreview-2089347714
> (David Smiley also talked to me at berlinbuzzwords). They had to
> disable MemorySegment usage in Apache Solr/Lucene in their
> environment. They have machines with thousands of indexes (and
> therefore tens of thousands of Arenas) open at the same time, and the close
> rate is very very high!
>
> Uwe
>
> --
> Uwe Schindler
> uschindler at apache.org
> ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
> Bremen, Germany
> https://lucene.apache.org/
> https://solr.apache.org/