Improve scaling of downcalls using MemorySegments allocated with shared arenas
Stuart Monteith
stuart.monteith at arm.com
Wed Dec 10 14:48:09 UTC 2025
Thanks Chris,
I've taken a look and implemented SharedSession with something similar to your RefCnt. One of the differences with
SharedSession is that we have a separate close method. I can implement acquire0 with getAndAdd(2), release0 with
getAndAdd(-2) and close with compareAndSet(0, 1). With the additional tests against 0x80000001 for acquire0 and
release0, I have something that passes the unit tests for java/foreign.
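To make the scheme concrete, here is a minimal sketch of the counter, not the actual patch. The class name RefCountSketch, the method names and the exception messages are made up for illustration. I am also assuming that the 0x80000001 test is applied to the updated value, with bit 0 as the "closed" flag and the sign bit catching overflow or underflow.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the count is kept in steps of 2, bit 0 marks "closed".
final class RefCountSketch {
    // Assumption: bit 0 = closed, sign bit = overflow/underflow.
    private static final int CLOSED_OR_OUT_OF_RANGE = 0x80000001;

    private final AtomicInteger state = new AtomicInteger(0);

    // Acquire with a single atomic add instead of a CAS loop.
    void acquire() {
        int next = state.getAndAdd(2) + 2;
        if ((next & CLOSED_OR_OUT_OF_RANGE) != 0) {
            state.getAndAdd(-2); // undo our increment before failing
            throw new IllegalStateException("already closed, or too many acquires");
        }
    }

    // Release with a single atomic add as well.
    void release() {
        int next = state.getAndAdd(-2) - 2;
        if ((next & CLOSED_OR_OUT_OF_RANGE) != 0) {
            state.getAndAdd(2); // undo our decrement before failing
            throw new IllegalStateException("release without matching acquire");
        }
    }

    // Close only succeeds when there are no pending acquires (state == 0).
    boolean tryClose() {
        return state.compareAndSet(0, 1);
    }

    public static void main(String[] args) {
        RefCountSketch rc = new RefCountSketch();
        rc.acquire();
        System.out.println("close while acquired: " + rc.tryClose());
        rc.release();
        System.out.println("close after release: " + rc.tryClose());
        try {
            rc.acquire();
        } catch (IllegalStateException e) {
            System.out.println("acquire after close failed: " + e.getMessage());
        }
    }
}
```

Because close is a separate operation that only flips 0 to 1, release never has to decide whether it was the last reference, which is why it can also be a plain getAndAdd here rather than a CAS loop.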
The benchmarking is quite promising, but I'll need to look more closely at it - it doesn't scale better on all platforms.
Thanks,
Stuart
On 08/12/2025 19:45, Chris Vest wrote:
> For what it's worth, in Netty we implement our reference counting with incrementing by 2 instead of 1, and use the low
> odd bit to indicate the released state.
> This allows us to acquire using getAndAdd, which scales much better than a CAS loop.
> Unfortunately we still need to use a CAS loop when implementing release, so that still has contention problems.
>
> For reference: https://github.com/netty/netty/blob/2b29b5e87656203fecd1732ffb472a366a1918cc/common/src/main/java/io/netty/util/internal/RefCnt.java#L258-L295
>
> On Mon, Dec 8, 2025 at 10:42 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
>
> > sum() is really just a snapshot, it adds up the counters (Cells), so
> > it wouldn't ensure the counter was at zero. Immediately after
> > returning zero a thread could have already incremented it.
> Yes. What I mean is: you can check if close() should throw because of
> pending acquires. But, as I said, we can't use that in any way to "block"
> other acquires from happening in case we _do_ want to close. Which
> leaves us exposed.
> >
> >
> >> For the purpose of implementation clarity -- would it be useful to
> >> wrap the various counters plus logic to acquire/ release (and
> >> "closing" state) into a separate abstraction, which is then used by
> >> SharedMemorySession? A sort of "atomic" LongAdder, if you will :-)
> >>
> >> That might make it easier to verify the correctness of the
> >> implementation, by validating each aspect (the atomic long adder, and
> >> its use from SharedMemorySession) separately.
> >
> > Sure, that would be a bit cleaner, thanks.
>
> Thanks.
>
>
> Maurizio
>