<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hello Uwe,<br>
<br>
I've read the various github threads, and I think I have a good
idea of what's going on. Just to recap how shared arena closure
works:<br>
<br>
- We set the arena's isAlive = false (more or less)<br>
- We submit a handshake from the closing thread to all other
threads<br>
- During that handshake, we check whether each thread is accessing
the arena we are trying to close.<br>
- Unfortunately, this requires deoptimizing the top-most frame
of each thread, due to JDK-8290892 [1]. Note that this is a
one-off unpacking of the top-most frame: after it, the method
finishes running in the interpreter, and then goes back to
executing compiled code.<br>
- If we find a thread accessing the same arena, we make it throw
an exception, to bail out of the access.<br>
- After the handshake finishes, we know that either: 1) no other
thread was accessing the arena, and they will all see the up-to-date
isAlive = false, because the handshake also works as a means of
synchronization; or 2) if they were accessing the arena, they will
get an exception.<br>
- After that it's safe to free the memory.</p>
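<p>To make those steps a bit more concrete, here is a deliberately
simplified, stand-alone model of the close protocol in plain Java. It is
not the actual JDK implementation (the real logic lives in the VM's
handshake machinery and in jdk.internal.misc.ScopedMemoryAccess); the
class and method names below are made up purely for illustration.</p>
<pre>// Simplified model of the shared-arena close protocol described above.
// NOT the real JDK code; names and structure are illustrative only.
final class SharedScopeModel {

    private volatile boolean isAlive = true;

    // Called at the start of every memory access. In practice this check is
    // inlined and hoisted by the JIT, which is why a handshake (and a
    // deoptimization) is needed to make a close visible to running code.
    void checkValidState() {
        if (!isAlive) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        // Step 1: mark the arena's scope as no longer alive.
        isAlive = false;
        // Steps 2/3: the VM handshakes all other threads. Any thread caught
        // in the middle of an access on this scope is made to throw; every
        // other thread is now guaranteed to observe isAlive == false on its
        // next checkValidState() call.
        handshakeAllThreads();
        // Step 4: only after the handshake is it safe to free the memory.
        freeMemory();
    }

    private void handshakeAllThreads() { /* VM-internal; no Java equivalent */ }

    private void freeMemory()          { /* e.g. unmap / free the backing memory */ }
}</pre>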
<p>So, if shared arenas are closed very frequently, as seems to be
the case in the problematic scenarios, we're likely overwhelming
all other threads with handshakes and deoptimizations.</p>
<p>Addressing JDK-8290892 would mean that we only need to deoptimize
threads that are actually accessing the arena that is being
closed. Another possible interim improvement could be to only
deoptimize threads that are actually in the middle of accessing a
memory segment, not just all threads. That's a relatively
low-hanging fruit I noticed recently, but haven't had time to look
into yet. I've filed: <a class="moz-txt-link-freetext" href="https://bugs.openjdk.org/browse/JDK-8335480">https://bugs.openjdk.org/browse/JDK-8335480</a><br>
<br>
This would of course still leave the handshakes, which also
require a bunch of VM-internal synchronization between threads.
Shared arenas are really only meant for long-lived use. So,
either way, you may want to consider ways of grouping resources
into the same shared arena, to reduce the total number of closures
needed, or giving users the option of indicating that they only
need to access a resource from a single thread, and then switching
to a confined arena behind the scenes (which is much cheaper to
close, as you've found).</p>
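<p>For what it's worth, here is a rough sketch of what those two
mitigations could look like on the client side, assuming the finalized
Java 22 java.lang.foreign API. Arena.ofConfined(), Arena.ofShared() and
FileChannel.map(mode, offset, size, arena) are real API; the wrapper
class, its name, and the singleThreaded flag are hypothetical and only
meant to illustrate the idea.</p>
<pre>// Hypothetical sketch: group several mapped files under one arena, and let
// the caller opt into a confined arena when only one thread will ever read.
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileGroup implements AutoCloseable {

    private final Arena arena;

    MappedFileGroup(boolean singleThreaded) {
        // A confined arena is much cheaper to close (no handshake), but may
        // only be accessed from the thread that created it.
        this.arena = singleThreaded ? Arena.ofConfined() : Arena.ofShared();
    }

    // Mapping many files into the same arena means one close() for the whole
    // group instead of one shared-arena close per file.
    MemorySegment map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);
        }
    }

    @Override
    public void close() {
        arena.close(); // invalidates and unmaps every segment in the group
    }
}</pre>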
<p>Jorn<br>
</p>
<p>[1]: <a class="moz-txt-link-freetext" href="https://bugs.openjdk.org/browse/JDK-8290892">https://bugs.openjdk.org/browse/JDK-8290892</a></p>
<div class="moz-cite-prefix">On 1-7-2024 13:52, Uwe Schindler wrote:<br>
</div>
<blockquote type="cite" cite="mid:355ef243-cca9-436f-b367-700f77e2ad74@apache.org">
<p>Hi Panama people, hello Maurizio,</p>
<p>I'm sending this again to the mailing list; so far we only had a
private discussion with Maurizio. Maybe someone else has an idea or can
figure out what the problem is. We are not yet ready to open an
issue against Java 19 through 22.<br>
</p>
<p>There were several issues reported by users of Apache Lucene (a
wrongly-written benchmark, and also Solr users) about bad
performance in highly concurrent environments. What was
found out is that when many threads close shared
arenas, under some circumstances all "reader threads"
(those accessing a MemorySegment, no matter which arena they use)
suddenly deoptimize. This causes immense slowdowns during Lucene
searches.</p>
<p>Luckily one of our committers found a workaround, and we are
working on a benchmark showing the issue. But first
let me explain what happens:</p>
<ul>
<li>Lucene opens MemorySegments with a shared arena (one per
file) and accesses them from multiple threads. Basically, for
each index file we have a shared arena, which is closed when
the file is closed.</li>
<li>There are many shared arenas (one per index file)!!!</li>
<li>If you close a shared arena, normally you see no large delay
on the thread calling close() and also no real effect on any
thread that reads from other MemorySegments.</li>
<li>But under certain circumstances ALL reading threads
accessing any MemorySegment slow down dramatically! So once
you close one of our arenas, accesses to all other MemorySegments
using a different shared arena suddenly get deoptimized (we have
HotSpot logs showing this). Of course, the MemorySegment
belonging to the closed arena is no longer used, and it
should in reality be the only one affected (throwing an
IllegalStateException).<br>
</li>
<li>The problem seems to occur mainly when multiple Arenas are
closed in highly concurrent environments. This is why we did
not see the issue before.</li>
<li>If we put a global lock around all calls to Arena#close(),
the issues seem to go away (see the sketch after this list).<br>
</li>
</ul>
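<p>Roughly, the workaround is nothing more than routing every close through
one global lock, as sketched below (the wrapper class and its name are
only for illustration; the only real API call is Arena#close()):</p>
<pre>// Sketch of the workaround: serialize all Arena#close() calls through a
// single global lock, so at most one close handshake is in flight at a time.
import java.lang.foreign.Arena;

final class ArenaCloser {

    private static final Object GLOBAL_CLOSE_LOCK = new Object();

    // Used everywhere instead of calling arena.close() directly.
    static void close(Arena arena) {
        synchronized (GLOBAL_CLOSE_LOCK) {
            arena.close();
        }
    }
}</pre>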
<p>I plan to write a benchmark showing this issue. Do you have
an idea what could go wrong? To me it looks like a race in the
thread-local handshakes which may cause some crazy HotSpot
behaviour, causing deoptimization of all threads concurrently
accessing MemorySegments once Arena#close() is called in highly
concurrent environments.<br>
</p>
<p>This is the main issue where the observation is tracked: <a class="moz-txt-link-freetext" href="https://github.com/apache/lucene/issues/13325" moz-do-not-send="true">https://github.com/apache/lucene/issues/13325</a></p>
<p>These are issues opened:</p>
<ul>
<li><a class="moz-txt-link-freetext" href="https://github.com/dacapobench/dacapobench/issues/264" moz-do-not-send="true">https://github.com/dacapobench/dacapobench/issues/264</a>
(the issue in this benchmark was of course that they were
opening/closing too often, but it actually just exposed the
problem, so it was very helpful). Funny detail: Alexey
Shipilev opened the issue!</li>
<li>This was the comment showing the issue at huge installations
of Apache Solr 9.7: <a class="moz-txt-link-freetext" href="https://github.com/apache/lucene/pull/13146#pullrequestreview-2089347714" moz-do-not-send="true">https://github.com/apache/lucene/pull/13146#pullrequestreview-2089347714</a>
(David Smiley also talked to me at berlinbuzzwords). They had
to disable MemorySegment usage in Apache Solr/Lucene in their
environment. They have machines with thousands of indexes (and
therefore tens of thousands of arenas) open at the same time, and the
close rate is very, very high!</li>
</ul>
<p>Uwe<br>
</p>
<pre class="moz-signature" cols="72">--
Uwe Schindler
<a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:uschindler@apache.org" moz-do-not-send="true">uschindler@apache.org</a>
ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
Bremen, Germany
<a class="moz-txt-link-freetext" href="https://lucene.apache.org/" moz-do-not-send="true">https://lucene.apache.org/</a>
<a class="moz-txt-link-freetext" href="https://solr.apache.org/" moz-do-not-send="true">https://solr.apache.org/</a></pre>
</blockquote>
</body>
</html>