Improve scaling of downcalls using MemorySegments allocated with shared arenas

Stuart Monteith stuart.monteith at arm.com
Sun Dec 14 23:07:08 UTC 2025


What I'm finding with the getAndAdd version is that there is often an improvement, but the split counting (the
"multicounting", as I called it) performs much better (I'll share the results in a week). I've tried to avoid weird
issues with the split counting by keeping the code as simple as I could make it. Keeping the states consistent is
important: if the code is in the middle of closing, reading the state of the counter must pause until that is decided.

BR,
	Stuart
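
For readers following the thread, here is a minimal sketch of the getAndAdd scheme described in the quoted
messages below: acquire adds 2, release subtracts 2, and close is a compare-and-set from 0 to 1. The class name
and method names are hypothetical. The reading of the 0x80000001 test (low bit means closed, sign bit means the
count went out of range) is an assumption, and this is not the actual SharedSession or Netty RefCnt code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; not the JDK's SharedSession or Netty's RefCnt.
final class EvenOddRefCount {
    // Assumed meaning of the mask: low bit = closed, sign bit = overflow or underflow.
    private static final int CLOSED_OR_OUT_OF_RANGE = 0x80000001;

    // Even value: open, with (value / 2) acquires outstanding. Odd value: closed.
    private final AtomicInteger state = new AtomicInteger();

    void acquire() {
        int next = state.getAndAdd(2) + 2;
        if ((next & CLOSED_OR_OUT_OF_RANGE) != 0) {
            state.getAndAdd(-2); // undo the speculative increment
            throw new IllegalStateException("closed, or too many acquires");
        }
    }

    void release() {
        int next = state.getAndAdd(-2) - 2;
        if ((next & CLOSED_OR_OUT_OF_RANGE) != 0) {
            state.getAndAdd(2); // undo the speculative decrement
            throw new IllegalStateException("closed, or release without acquire");
        }
    }

    // Succeeds only when the session is open and nothing is acquired.
    boolean tryClose() {
        return state.compareAndSet(0, 1);
    }

    boolean isAlive() {
        return (state.get() & 1) == 0;
    }

    public static void main(String[] args) {
        EvenOddRefCount rc = new EvenOddRefCount();
        rc.acquire();
        System.out.println("close while acquired: " + rc.tryClose());
        rc.release();
        System.out.println("close at zero: " + rc.tryClose());
        System.out.println("alive after close: " + rc.isAlive());
    }
}
```

A failed acquire or release leaves the state off by two for a short window before the undo. The low bit is
unaffected, so the closed state stays visible, but whether that window is acceptable in the real implementation
is not something this sketch settles.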



On 12/12/2025 19:25, Chris Vest wrote:
> Yeah, we previously also tried split counting, but reverted it because we observed some weird rare issues, and got 
> suspicious of it.
> 
> On Wed, Dec 10, 2025 at 8:21 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
>     What I like (a lot) about this is that now we're back to using the same
>     "bit" of information for both liveness and acquire count (IIUC). If
>     that's the case, it would be much simpler to convince ourselves this is
>     correct.
> 
>     Thanks
>     Maurizio
> 
>     On 10/12/2025 14:48, Stuart Monteith wrote:
>      > Thanks Chris,
>      >     I've taken a look and implemented SharedSession with something
>      > similar to your RefCnt. One of the differences with SharedSession is
>      > that we have a separate close method. I can implement acquire0 with
>      > getAndAdd(2), release0 with getAndAdd(-2) and close with
>      > compareAndSwap(0, 1). With the additional tests against 0x80000001 for
>      > acquire0 and release0, I have something that passes the unit tests for
>      > java/foreign.
>      >
>      > The benchmarking is quite promising, but I'll need to look more
>      > closely at it - it doesn't scale better on all platforms.
>      >
>      > Thanks,
>      >     Stuart
>      >
>      >
>      >
>      >
>      > On 08/12/2025 19:45, Chris Vest wrote:
>      >> For what it's worth, in Netty we implement our reference counting
>      >> with incrementing by 2 instead of 1, and use the low odd bit to
>      >> indicate the released state.
>      >> This allows us to acquire using getAndAdd, which scales much better
>      >> than a CAS loop.
>      >> Unfortunately we still need to use a CAS loop when implementing
>      >> release, so that still has contention problems.
>      >>
>      >> For reference:
>      >> https://github.com/netty/netty/blob/2b29b5e87656203fecd1732ffb472a366a1918cc/common/src/main/java/io/netty/util/internal/RefCnt.java#L258-L295
>      >>
>      >> On Mon, Dec 8, 2025 at 10:42 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>      >>
>      >>
>      >>      > sum() is really just a snapshot, it adds up the counters (Cells), so
>      >>      > it wouldn't ensure the counter was at zero. Immediately after
>      >>      > returning zero a thread could have already incremented it.
>      >>     Yes. What I mean is: you can check if close() should throw because of
>      >>     pending acquires. But, as I said, we can't use that in any way to "block"
>      >>     other acquires from happening in case we _do_ want to close. Which
>      >>     leaves us exposed.
>      >>      >
>      >>      >
>      >>      >> For the purpose of implementation clarity -- would it be useful to
>      >>      >> wrap the various counters plus logic to acquire/release (and
>      >>      >> "closing" state) into a separate abstraction, which is then used by
>      >>      >> SharedMemorySession? A sort of "atomic" LongAdder, if you will :-)
>      >>      >>
>      >>      >> That might make it easier to verify the correctness of the
>      >>      >> implementation, by validating each aspect (the atomic long adder, and
>      >>      >> its use from SharedMemorySession) separately.
>      >>      >
>      >>      > Sure, that would be a bit cleaner, thanks.
>      >>
>      >>     Thanks.
>      >>
>      >>
>      >>     Maurizio
>      >>
>      >
> 
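
The point made earlier in the thread about LongAdder.sum() being only a snapshot can be illustrated with a
small sketch. It replays, on a single thread, one interleaving that two threads could produce: the closer reads
sum() as zero, an acquire lands, and the closer then acts on the stale value. The class and method names are
hypothetical, and the sketch shows only the check-then-act gap, not any actual session code.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class; replays a two-thread interleaving deterministically on one thread.
final class SumSnapshotDemo {

    // Returns the number of outstanding acquires observed after the "close" decision was taken.
    static long replayRace() {
        LongAdder acquires = new LongAdder();
        boolean closed = false;

        // Closer: the snapshot says nothing is acquired.
        boolean sawZero = acquires.sum() == 0;

        // Acquirer: runs between the closer's check and the closer's decision.
        acquires.increment();

        // Closer: acts on the stale snapshot; nothing prevented the acquire above.
        if (sawZero) {
            closed = true;
        }

        return closed ? acquires.sum() : -1;
    }

    public static void main(String[] args) {
        System.out.println("acquires outstanding after close: " + replayRace());
    }
}
```

The sum() check can tell close() that it should throw because of pending acquires, but on its own it cannot stop
a new acquire from arriving after the check, which is why the liveness decision needs to be tied to the counter
state itself.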


