Improve scaling of downcalls using MemorySegments allocated with shared arenas
Stuart Monteith
stuart.monteith at arm.com
Sun Dec 14 23:07:08 UTC 2025
What I'm finding with the getAndAdd version is that there is often an improvement, but split counting (the
"multicounting", as I called it) performs much better (I'll share the numbers in a week). I've tried to avoid weird
issues with the split counting by keeping the code as simple as I could make it. Keeping the states consistent is
important: if the code is in the middle of closing, reading the state of the counter must pause until that is decided.
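
For readers following along, here is a minimal sketch of the getAndAdd scheme as it is described further down the
thread: acquire adds 2, release subtracts 2, and close is a single compare-and-set from 0 to 1, so the low bit carries
liveness and the remaining bits carry the acquire count. The class and method names are hypothetical, this is not the
actual SharedSession code, and the overflow checks mentioned in the thread are left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the JDK's SharedSession implementation.
final class EvenOddSession {
    private static final int CLOSED = 1; // low bit set: session is closed
    private static final int STEP = 2;   // each acquire adds 2, leaving the low bit untouched

    private final AtomicInteger state = new AtomicInteger(0);

    void acquire() {
        // A single atomic add, no CAS loop on the fast path.
        int prev = state.getAndAdd(STEP);
        if ((prev & CLOSED) != 0) {
            // The session was already closed: undo the increment and fail.
            state.getAndAdd(-STEP);
            throw new IllegalStateException("Already closed");
        }
    }

    void release() {
        state.getAndAdd(-STEP);
    }

    void close() {
        // Succeeds only if the session is alive (low bit clear) and has no acquires.
        if (!state.compareAndSet(0, CLOSED)) {
            int seen = state.get();
            throw new IllegalStateException((seen & CLOSED) != 0
                    ? "Already closed"
                    : "Session is still acquired");
        }
    }

    boolean isAlive() {
        return (state.get() & CLOSED) == 0;
    }

    int acquireCount() {
        // Dropping the low bit leaves the number of outstanding acquires.
        return state.get() >> 1;
    }
}
```

Because liveness and count live in the same word, close either sees exactly zero acquires and wins, or fails; an
acquire that races with a successful close observes the low bit and backs out.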
BR,
Stuart
On 12/12/2025 19:25, Chris Vest wrote:
> Yeah, we previously also tried split counting, but reverted it because we observed some weird rare issues, and got
> suspicious of it.
>
> On Wed, Dec 10, 2025 at 8:21 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
> What I like (a lot) about this is that now we're back to using the same
> "bit" of information for both liveness and acquire count (IIUC). If
> that's the case, it would be much simpler to convince ourselves this is
> correct.
>
> Thanks
> Maurizio
>
> On 10/12/2025 14:48, Stuart Monteith wrote:
> > Thanks Chris,
> > I've taken a look and implemented SharedSession with something
> > similar to your RefCnt. One of the differences with SharedSession is
> > that we have a separate close method. I can implement acquire0 with
> > getAndAdd(2), release0 with getAndAdd(-2) and close with
> > compareAndSwap(0, 1). With the additional tests against 0x80000001 for
> > acquire0 and release0, I have something that passes the unit tests for
> > java/foreign.
> >
> > The benchmarking is quite promising, but I'll need to look more
> > closely at it - it doesn't scale better on all platforms.
> >
> > Thanks,
> > Stuart
> >
> >
> >
> >
> > On 08/12/2025 19:45, Chris Vest wrote:
> >> For what it's worth, in Netty we implement our reference counting
> >> with incrementing by 2 instead of 1, and use the low odd bit to
> >> indicate the released state.
> >> This allows us to acquire using getAndAdd, which scales much better
> >> than a CAS loop.
> >> Unfortunately we still need to use a CAS loop when implementing
> >> release, so that still has contention problems.
> >>
> >> For reference:
> >> https://github.com/netty/netty/blob/2b29b5e87656203fecd1732ffb472a366a1918cc/common/src/main/java/io/netty/util/internal/RefCnt.java#L258-L295
> >>
> >> On Mon, Dec 8, 2025 at 10:42 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> >>
> >>
> >> > sum() is really just a snapshot, it adds up the counters (Cells), so
> >> > it wouldn't ensure the counter was at zero. Immediately after
> >> > returning zero a thread could have already incremented it.
> >> Yes. What I mean is: you can check if close() should throw because of
> >> pending acquires. But, as I said, we can't use that in any way to "block"
> >> other acquires from happening in case we _do_ want to close. Which
> >> leaves us exposed.
> >> >
> >> >
> >> >> For the purpose of implementation clarity -- would it be useful to
> >> >> wrap the various counters plus logic to acquire/release (and
> >> >> "closing" state) into a separate abstraction, which is then used by
> >> >> SharedMemorySession? A sort of "atomic" LongAdder, if you will :-)
> >> >>
> >> >> That might make it easier to verify the correctness of the
> >> >> implementation, by validating each aspect (the atomic long adder, and
> >> >> its use from SharedMemorySession) separately.
> >> >
> >> > Sure, that would be a bit cleaner, thanks.
> >>
> >> Thanks.
> >>
> >>
> >> Maurizio
> >>
> >
>
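
The point quoted above about sum() being only a snapshot can be illustrated with a small sketch. It assumes a naive
close that first checks LongAdder.sum() and then sets a closed flag; the class and its method names are hypothetical
and exist only to replay the problematic interleaving step by step on a single thread.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration of the check-then-act gap; not code from the JDK.
final class SnapshotCloseDemo {
    private final LongAdder acquires = new LongAdder();
    private volatile boolean closed;

    void acquire() {
        if (closed) {
            throw new IllegalStateException("Already closed");
        }
        acquires.increment();
    }

    void release() {
        acquires.decrement();
    }

    // Step 1 of a naive close: sum() adds up the cells at one moment in time.
    boolean looksIdle() {
        return acquires.sum() == 0;
    }

    // Step 2 of a naive close: nothing prevents an acquire between step 1 and this.
    void markClosed() {
        closed = true;
    }

    long pending() {
        return acquires.sum();
    }

    boolean isClosed() {
        return closed;
    }
}
```

Calling looksIdle(), then acquire(), then markClosed() leaves the session closed with one acquire still outstanding,
which is the exposure described in the quoted exchange: the zero check alone cannot block later acquires.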
More information about the panama-dev
mailing list