Improve scaling of downcalls using MemorySegments allocated with shared arenas

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Dec 8 17:31:17 UTC 2025


>
>> This problem is quite similar to a read/write lock scenario (as you 
>> also mention):
>>
>> * the threads doing the acquires/release are effectively expressing a 
>> desire to "read" a segment in a given piece of code. So, multiple 
>> readers can co-exist.
>> * the thread doing the close is effectively expressing a desire to 
>> "write" the segment -- so it should only be allowed to do so when 
>> there's no readers.
>>
>> In principle, something like this
>>
>> https://docs.oracle.com/en/java/javase/25/docs/api//java.base/java/util/concurrent/locks/StampedLock.html 
>>
> I experimented with StampedLock, but found that it scaled more or less 
> the same as the existing implementation, with acquire0() calling 
> tryReadLock(), release0() calling tryUnlockRead() and justClose() 
> calling tryWriteLock(). It appears the compare-and-swap operation is a 
> bottleneck.
Interesting -- thanks for sharing!
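
For anyone following along, the mapping described above could look
roughly like the sketch below. The class name and method bodies are
assumptions made for illustration (only the method names and the
StampedLock calls come from the description); this is not the actual
SharedSession code.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in for the session state: the method names echo the
// thread, the bodies are assumed for illustration only.
final class StampedLockSession {
    private final StampedLock lock = new StampedLock();

    // "Reader": succeeds unless the write lock (i.e. a close) is held.
    boolean acquire0() {
        return lock.tryReadLock() != 0L;
    }

    // Drops one read hold without needing the stamp from acquire0().
    void release0() {
        if (!lock.tryUnlockRead()) {
            throw new IllegalStateException("not acquired");
        }
    }

    // "Writer": succeeds only when no read holds are outstanding. The write
    // lock is never released, which is what makes the session stay closed.
    boolean justClose() {
        return lock.tryWriteLock() != 0L;
    }

    public static void main(String[] args) {
        StampedLockSession s = new StampedLockSession();
        System.out.println("acquire: " + s.acquire0());
        System.out.println("close while acquired: " + s.justClose());
        s.release0();
        System.out.println("close after release: " + s.justClose());
        System.out.println("acquire after close: " + s.acquire0());
    }
}
```

Every acquire and release still updates the lock's single state word,
which is consistent with the compare-and-swap bottleneck reported
above.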
>
>> Should work quite well for this use case. Or, even using a LongAdder 
>> as an acquire/release counter:
>>
>> https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/util/concurrent/atomic/LongAdder.html 
>>
> I found LongAdder/Striped64 interesting as, if necessary, we could 
> look at how it is handling memory. The SharedSession code I wrote 
> allocates memory upfront, such that a scenario with one thread would 
> use as much memory as when there are 128 threads on 128 cores. But 
> besides that, getting the sum is not atomic, and neither is acting 
> upon it. I experimented with AtomicLong, with a close operation 
> subtracting a very large value to force the counter negative, but that 
> wasn't too different from before, as it still depends on atomic 
> reads/writes to a single memory location.

Yeah -- after I sent the message I realized that sum() is good enough 
for ensuring e.g. that when closing there's no pending acquire -- but 
not for the opposite: e.g. ensuring that acquire can't happen during a 
close... so it's weaker than a RW lock (but the internals use some 
redundancy to reduce contention, which is kind of what you also do here).
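
To make the gap concrete, here is a minimal sketch of a plain LongAdder
used as the acquire/release counter. All names are hypothetical, and
the acquire is split into two steps only so that the problematic
interleaving can be replayed deterministically on a single thread.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical class, not JDK code: a LongAdder as the acquire/release counter.
final class AdderCountedSession {
    private final LongAdder holders = new LongAdder();
    private volatile boolean closed;

    boolean observeOpen() { return !closed; }        // step A of an acquire
    void register()       { holders.increment(); }   // step B of an acquire

    boolean tryAcquire() {
        if (!observeOpen()) return false;
        // A concurrent tryClose() can run to completion right here.
        register();
        return true;
    }

    void release() { holders.decrement(); }

    boolean tryClose() {
        if (holders.sum() != 0) return false;  // refuses while acquires are pending
        closed = true;                         // check-then-act: not atomic with sum()
        return true;
    }

    boolean isClosed() { return closed; }
    long holders()     { return holders.sum(); }

    public static void main(String[] args) {
        AdderCountedSession s = new AdderCountedSession();
        boolean sawOpen = s.observeOpen();  // acquirer: step A
        boolean didClose = s.tryClose();    // closer: sees sum() == 0, closes
        s.register();                       // acquirer: step B, too late
        System.out.println("sawOpen=" + sawOpen + ", didClose=" + didClose
                + ", closed=" + s.isClosed() + ", holders=" + s.holders());
    }
}
```

The replay ends with a closed session that still has one holder, which
is the case a read/write lock would rule out.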

For the purpose of implementation clarity -- would it be useful to wrap 
the various counters plus logic to acquire/release (and "closing" state) 
into a separate abstraction, which is then used by SharedMemorySession? 
A sort of "atomic" LongAdder, if you will :-)

That might make it easier to verify the correctness of the 
implementation, by validating each aspect (the atomic long adder, and 
its use from SharedMemorySession) separately.
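
As a rough illustration of the shape I have in mind (not a proposal for
the actual code), such an abstraction might look like the sketch below.
Everything here is an assumption: the class name, the ticket-based
release, the SEALED marker and the serialized close are made up for the
example, and a real version would presumably pad the stripes to avoid
false sharing and size them lazily.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical abstraction; names are illustrative and not taken from the
// JDK or from the patch under discussion.
final class ClosableStripedCounter {
    private static final long SEALED = Long.MIN_VALUE; // stripe blocked by a close attempt
    private final AtomicLongArray stripes;
    private volatile boolean closed;

    ClosableStripedCounter(int stripeCount) {
        stripes = new AtomicLongArray(stripeCount);
    }

    /** Returns a ticket (the stripe index) to pass to release(), or -1 if closed. */
    int tryAcquire() {
        int i = (Thread.currentThread().hashCode() & 0x7fffffff) % stripes.length();
        for (;;) {
            long v = stripes.get(i);
            if (v == SEALED) {
                if (closed) return -1;  // the close was committed
                Thread.onSpinWait();    // close in flight: wait for commit or rollback
            } else if (stripes.compareAndSet(i, v, v + 1)) {
                return i;
            }
        }
    }

    /** Releases a previous successful acquire; the stripe count is > 0 here. */
    void release(int ticket) {
        stripes.decrementAndGet(ticket);
    }

    /** Returns true once closed; false if some acquire is still pending. */
    synchronized boolean tryClose() {
        if (closed) return true;
        for (int i = 0; i < stripes.length(); i++) {
            // Seal each idle stripe; a non-zero stripe means a pending acquire.
            if (!stripes.compareAndSet(i, 0, SEALED)) {
                for (int j = 0; j < i; j++) stripes.set(j, 0); // roll back the seals
                return false;
            }
        }
        closed = true; // every stripe is sealed: no holders, and none can enter
        return true;
    }

    boolean isClosed() { return closed; }

    public static void main(String[] args) {
        ClosableStripedCounter c = new ClosableStripedCounter(4);
        int ticket = c.tryAcquire();
        System.out.println("close while acquired: " + c.tryClose());
        c.release(ticket);
        System.out.println("close after release: " + c.tryClose());
        System.out.println("acquire after close: " + c.tryAcquire());
    }
}
```

SharedMemorySession would then only call tryAcquire/release/tryClose
and translate the results into its own exceptions, so the counter can
be stress-tested on its own.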

Cheers
Maurizio


