Improve scaling of downcalls using MemorySegments allocated with shared arenas

Stuart Monteith stuart.monteith at arm.com
Wed Dec 10 14:48:09 UTC 2025


Thanks Chris,
	I've taken a look and implemented SharedSession with something similar to your RefCnt. One of the differences with 
SharedSession is that we have a separate close method. I can implement acquire0 with getAndAdd(2), release0 with 
getAndAdd(-2) and close with compareAndSwap(0, 1). With the additional tests against 0x80000001 for acquire0 and 
release0, I have something that passes the unit tests for java/foreign.
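
A minimal sketch of the counting scheme described above, for illustration only and not the actual SharedSession change. The class name RefCountSketch, the exception messages, and the exact way the 0x80000001 mask is applied (to the post-update value, undoing the update on failure) are assumptions. The low bit marks the closed state, the count moves in steps of 2, and the sign bit catches overflow or an unmatched release.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; names and error handling are assumptions.
final class RefCountSketch {
    // Low bit set: closed. Sign bit set: overflow or unmatched release.
    private static final int CLOSED_OR_INVALID = 0x80000001;

    // Even values: open, value / 2 outstanding acquires. Odd values: closed.
    private final AtomicInteger state = new AtomicInteger(0);

    void acquire() {
        // Unconditional add, no CAS loop on the fast path.
        int next = state.getAndAdd(2) + 2;
        if ((next & CLOSED_OR_INVALID) != 0) {
            state.getAndAdd(-2); // undo the speculative increment
            throw new IllegalStateException("closed or too many acquires");
        }
    }

    void release() {
        int next = state.getAndAdd(-2) - 2;
        if ((next & CLOSED_OR_INVALID) != 0) {
            state.getAndAdd(2); // undo the unmatched decrement
            throw new IllegalStateException("release without matching acquire");
        }
    }

    void close() {
        // Succeeds only when the session is open with no outstanding acquires.
        if (!state.compareAndSet(0, 1)) {
            boolean closed = (state.get() & 1) != 0;
            throw new IllegalStateException(closed ? "already closed" : "still acquired");
        }
    }

    boolean isClosed() {
        return (state.get() & 1) != 0;
    }

    public static void main(String[] args) {
        RefCountSketch rc = new RefCountSketch();
        rc.acquire();
        rc.release();
        rc.close();
        System.out.println("closed: " + rc.isClosed());
    }
}
```

In this sketch a failed acquire or release briefly perturbs the counter before undoing it, so a concurrent close can fail spuriously in that window; a real implementation would need to decide whether that is acceptable.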

The benchmarking is quite promising, but I'll need to look more closely at it - it doesn't scale better on all platforms.

Thanks,
	Stuart




On 08/12/2025 19:45, Chris Vest wrote:
> For what it's worth, in Netty we implement our reference counting with incrementing by 2 instead of 1, and use the low 
> odd bit to indicate the released state.
> This allows us to acquire using getAndAdd, which scales much better than a CAS loop.
> Unfortunately we still need to use a CAS loop when implementing release, so that still has contention problems.
> 
> For reference: https://github.com/netty/netty/blob/2b29b5e87656203fecd1732ffb472a366a1918cc/common/src/main/java/io/netty/util/internal/RefCnt.java#L258-L295
> 
> On Mon, Dec 8, 2025 at 10:42 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> 
>      > sum() is really just a snapshot, it adds up the counters (Cells), so
>      > it wouldn't ensure the counter was at zero. Immediately after
>      > returning zero a thread could have already incremented it.
>     Yes. What I mean is: you can check if close() should throw because of
>     pending acquires. But, as I said, we can't use that in any way to "block"
>     other acquires from happening in case we _do_ want to close. Which
>     leaves us exposed.
>      >
>      >
>      >> For the purpose of implementation clarity -- would it be useful to
>      >> wrap the various counters plus logic to acquire/ release (and
>      >> "closing" state) into a separate abstraction, which is then used by
>      >> SharedMemorySession? A sort of "atomic" LongAdder, if you will :-)
>      >>
>      >> That might make it easier to verify the correctness of the
>      >> implementation, by validating each aspect (the atomic long adder, and
>      >> its use from SharedMemorySession) separately.
>      >
>      > Sure, that would be a bit cleaner, thanks.
> 
>     Thanks.
> 
> 
>     Maurizio
> 


