Strange problem with deoptimization on highly concurrent thread-local handshake reported to Apache Lucene

Uwe Schindler uschindler at apache.org
Mon Jul 1 11:52:53 UTC 2024


Hi Panama people, hello Maurizio,

sending this again to the mailing list, as so far we only had a private 
discussion with Maurizio. Maybe someone else has an idea or can figure 
out what the problem is. We are not yet ready to open an issue against 
Java 19 through 22.

Several issues were reported by users of Apache Lucene (a badly written 
benchmark and also Solr users) about bad performance in highly 
concurrent environments. What we found out is that when many threads 
close shared arenas, under some circumstances all "reader threads" 
(those accessing a MemorySegment, no matter which arena it belongs to) 
suddenly deoptimize. This causes immense slowdowns during Lucene 
searches.

Luckily one of our committers found a workaround, and we are working on 
a benchmark showing the issue. But first let me explain what happens:

  * Lucene opens MemorySegments with a shared Arena (one per file) and
    accesses them from multiple threads. Basically, for each index file
    we have a shared arena which is closed when the file is closed.
  * There are many shared arenas (one per index file)!!!
  * If you close a shared arena, normally you see no large delay on the
    thread calling close() and also no real effect on any thread that
    reads from other MemorySegments.
  * But under certain circumstances ALL reading threads accessing any
    MemorySegment slow down dramatically! Once you close one of our
    arenas, accesses to all other MemorySegments backed by different
    shared arenas suddenly get deoptimized (we have HotSpot logs showing
    this). Of course, the MemorySegment belonging to the closed arena is
    no longer used; in reality it should be the only affected one
    (throwing an IllegalStateException).
  * The problem seems to occur mainly when multiple Arenas are closed in
    highly concurrent environments. This is why we did not see the issue
    before.
  * If we put a global lock around all calls to Arena#close(), the issue
    seems to go away (see the sketch after this list).
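
To make the pattern and the workaround concrete, here is a minimal 
sketch (using the final Java 22 API; class and method names are made up 
for illustration, this is not Lucene's actual code) of one index file 
mapped into its own shared arena, with a global lock serializing all 
close() calls:

    import java.io.IOException;
    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Sketch only: one shared arena per index file, plus the global-lock
    // workaround around Arena#close().
    final class IndexFileMapping implements AutoCloseable {

        // global lock serializing all shared-arena closes (the workaround)
        private static final Object CLOSE_LOCK = new Object();

        private final Arena arena = Arena.ofShared(); // one arena per file
        private final MemorySegment segment;

        IndexFileMapping(Path file) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // map the whole file; the mapping lives as long as the arena
                segment = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
            }
        }

        MemorySegment segment() {
            return segment;
        }

        @Override
        public void close() {
            // never run two shared-arena closes concurrently
            synchronized (CLOSE_LOCK) {
                arena.close();
            }
        }
    }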

I plan to write a benchmark showing this issue. Do you have an idea 
what could go wrong? To me it looks like a race in the thread-local 
handshakes which may cause some crazy HotSpot behaviour, deoptimizing 
all threads concurrently accessing MemorySegments once Arena#close() is 
called in highly concurrent environments.
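
For reference, the kind of stand-alone reproducer I have in mind looks 
roughly like the following sketch (purely illustrative; thread counts, 
sizes and timings are arbitrary): a few reader threads loop over a 
segment from a long-lived shared arena while other threads keep opening 
and closing unrelated shared arenas.

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class ArenaCloseDeoptRepro {

        public static void main(String[] args) throws InterruptedException {
            AtomicBoolean stop = new AtomicBoolean();

            // long-lived segment used by the readers; its arena is never closed
            Arena readerArena = Arena.ofShared();
            MemorySegment data = readerArena.allocate(ValueLayout.JAVA_LONG, 1 << 17);

            // reader threads: these should stay fast the whole time
            Thread[] readers = new Thread[8];
            for (int i = 0; i < readers.length; i++) {
                readers[i] = new Thread(() -> {
                    long sum = 0;
                    while (!stop.get()) {
                        for (long off = 0; off < data.byteSize(); off += Long.BYTES) {
                            sum += data.get(ValueLayout.JAVA_LONG, off);
                        }
                    }
                    System.out.println("reader sum=" + sum);
                });
                readers[i].start();
            }

            // closer threads: open and close unrelated shared arenas in a loop
            Thread[] closers = new Thread[4];
            for (int i = 0; i < closers.length; i++) {
                closers[i] = new Thread(() -> {
                    while (!stop.get()) {
                        Arena a = Arena.ofShared();
                        a.allocate(128);
                        a.close(); // each close handshakes the other threads
                    }
                });
                closers[i].start();
            }

            // watch reader throughput (and e.g. -XX:+PrintCompilation) meanwhile
            Thread.sleep(30_000);
            stop.set(true);
            for (Thread t : readers) t.join();
            for (Thread t : closers) t.join();
            readerArena.close();
        }
    }

If the observation holds, reader throughput should drop sharply while 
the closer threads are running, even though the readers never touch any 
of the arenas being closed.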

This is the main issue where the observation is tracked: 
https://github.com/apache/lucene/issues/13325

These are the issues that were opened:

  * https://github.com/dacapobench/dacapobench/issues/264 (the issue
    with this benchmark was of course that they were opening/closing
    arenas too often, but it actually exposed the problem, so it was
    very helpful). Funny detail: Aleksey Shipilëv opened the issue!
  * This is the comment showing the issue at huge installations of
    Apache Solr 9.7:
    https://github.com/apache/lucene/pull/13146#pullrequestreview-2089347714
    (David Smiley also talked to me at Berlin Buzzwords). They had to
    disable MemorySegment usage in Apache Solr/Lucene in their
    environment. They have machines with thousands of indexes (and
    therefore tens of thousands of Arenas) open at the same time, and
    the close rate is very, very high!

Uwe

-- 
Uwe Schindler
uschindler at apache.org  
ASF Member, Member of PMC and Committer of Apache Lucene and Apache Solr
Bremen, Germany
https://lucene.apache.org/
https://solr.apache.org/

