Improve scaling of downcalls using MemorySegments allocated with shared arenas
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Dec 15 12:29:49 UTC 2025
Another possibility would be to split the 64-bit state into, say, 8-bit
words.
Each word would act as a counter (up to 255 acquires, theoretically).
This would allow acquire/release to work on different parts of the
state field (e.g. by issuing a byte-level CAS at the correct offset),
while still allowing the close operation to atomically CAS the entire
state.
But I'm not sure this would actually reduce contention, as I'd assume
that a byte-level CAS will probably translate to a 64-bit CAS with some
extra bit-masking logic on top...
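
To make the layout concrete, here is a minimal sketch of such a lane-packed
state word. It is only an illustration under stated assumptions: the class
name LanePackedState, the CLOSED marker (all bits set) and the per-lane cap
of 254 are hypothetical, and the per-lane update is emulated with a
full-width 64-bit CAS rather than a true byte-level CAS.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: eight 8-bit acquire counters packed into one long.
final class LanePackedState {
    private static final long CLOSED = -1L;     // assumed marker: every lane 0xFF
    private static final long LANE_MAX = 0xFE;  // cap keeps counts distinct from CLOSED

    private final AtomicLong state = new AtomicLong();

    /** Bumps the counter in the given lane (0..7); returns false if closed. */
    boolean acquire(int lane) {
        int shift = lane * 8;
        while (true) {
            long s = state.get();
            if (s == CLOSED) {
                return false;
            }
            if (((s >>> shift) & 0xFF) >= LANE_MAX) {
                throw new IllegalStateException("lane " + lane + " is full");
            }
            // Emulated byte-level CAS: a full-width CAS that changes one lane only.
            if (state.compareAndSet(s, s + (1L << shift))) {
                return true;
            }
        }
    }

    /** Decrements the counter in the given lane. */
    void release(int lane) {
        int shift = lane * 8;
        while (true) {
            long s = state.get();
            if (s == CLOSED || ((s >>> shift) & 0xFF) == 0) {
                throw new IllegalStateException("nothing to release in lane " + lane);
            }
            if (state.compareAndSet(s, s - (1L << shift))) {
                return;
            }
        }
    }

    /** Succeeds only if every lane is zero, checked with one 64-bit CAS. */
    boolean close() {
        return state.compareAndSet(0L, CLOSED);
    }
}
```

Because every update here goes through the same 64-bit word, this sketch
shows the state layout but not any contention benefit.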
Maurizio
On 14/12/2025 23:07, Stuart Monteith wrote:
> What I'm finding with the getAndAdd version is that there is often an
> improvement, but the split counting (the multicounting, as I called it)
> is much better in terms of performance (I'll share the results in a week).
> I've tried to avoid weird issues with the split counting by keeping the
> code as simple as I could make it. Keeping the states consistent is
> important - if the code is in the middle of closing, it is important
> that reading the state of the counter pauses while that is decided.
>
> BR,
> Stuart
>
>
>
> On 12/12/2025 19:25, Chris Vest wrote:
>> Yeah, we previously also tried split counting, but reverted it
>> because we observed some weird rare issues, and got suspicious of it.
>>
>> On Wed, Dec 10, 2025 at 8:21 AM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> What I like (a lot) about this is that now we're back to using the
>> same "bit" of information for both liveness and acquire count (IIUC).
>> If that's the case, it would be much simpler to convince ourselves
>> this is correct.
>>
>> Thanks
>> Maurizio
>>
>> On 10/12/2025 14:48, Stuart Monteith wrote:
>> > Thanks Chris,
>> > I've taken a look and implemented SharedSession with something
>> > similar to your RefCnt. One of the differences with SharedSession is
>> > that we have a separate close method. I can implement acquire0 with
>> > getAndAdd(2), release0 with getAndAdd(-2) and close with
>> > compareAndSwap(0, 1). With the additional tests against 0x80000001 for
>> > acquire0 and release0, I have something that passes the unit tests for
>> > java/foreign.
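
The scheme described above might look roughly like the following sketch.
It is not the actual SharedSession code: the class name EvenOddSession is
hypothetical, and reading 0x80000001 as "bit 0 means closed, bit 31 means
the counter over- or underflowed" is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: state = 2 * (acquire count), bit 0 set once closed.
final class EvenOddSession {
    // Assumed meaning: bit 0 = closed, bit 31 = overflow or underflow.
    private static final int BAD_BITS = 0x80000001;

    private final AtomicInteger state = new AtomicInteger();

    void acquire() {
        int old = state.getAndAdd(2);
        if ((old & BAD_BITS) != 0) {
            state.getAndAdd(-2); // undo the speculative increment
            throw new IllegalStateException("closed or too many acquires");
        }
    }

    void release() {
        int old = state.getAndAdd(-2);
        // Checking old - 2 also catches a release without a matching acquire.
        if (((old - 2) & BAD_BITS) != 0) {
            state.getAndAdd(2); // undo the speculative decrement
            throw new IllegalStateException("closed or unbalanced release");
        }
    }

    /** Closes only if there are no pending acquires; one CAS from 0 to 1. */
    boolean close() {
        return state.compareAndSet(0, 1);
    }

    boolean isClosed() {
        return (state.get() & 1) != 0;
    }
}
```

Since the liveness bit and the count live in the same word, a close that
wins the CAS from 0 to 1 is seen by every later getAndAdd(2) as an odd
previous value.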
>> >
>> > The benchmarking is quite promising, but I'll need to look more
>> > closely at it - it doesn't scale better on all platforms.
>> >
>> > Thanks,
>> > Stuart
>> >
>> >
>> >
>> >
>> > On 08/12/2025 19:45, Chris Vest wrote:
>> >> For what it's worth, in Netty we implement our reference counting
>> >> by incrementing by 2 instead of 1, and use the low odd bit to
>> >> indicate the released state.
>> >> This allows us to acquire using getAndAdd, which scales much better
>> >> than a CAS loop.
>> >> Unfortunately we still need to use a CAS loop when implementing
>> >> release, so that still has contention problems.
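
As a rough illustration of why release still needs a CAS loop in this kind
of scheme, consider the sketch below. It is loosely modeled on the
description above and is not Netty's RefCnt: the class name EvenOddRefCount
and its methods are hypothetical, and overflow checks are omitted.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: raw = 2 * (reference count); an odd value means released.
final class EvenOddRefCount {
    private final AtomicInteger raw = new AtomicInteger(2); // starts at count 1

    /** Acquire with a single getAndAdd; roll back if already released. */
    void retain() {
        int old = raw.getAndAdd(2);
        if ((old & 1) != 0) {
            raw.getAndAdd(-2);
            throw new IllegalStateException("already released");
        }
    }

    /** Returns true if this call dropped the last reference. */
    boolean release() {
        while (true) {
            int cur = raw.get();
            if ((cur & 1) != 0) {
                throw new IllegalStateException("already released");
            }
            // Dropping the last reference must flip to the odd state atomically,
            // which a plain getAndAdd(-2) cannot do.
            int next = (cur == 2) ? 1 : cur - 2;
            if (raw.compareAndSet(cur, next)) {
                return next == 1;
            }
        }
    }

    int refCnt() {
        int v = raw.get();
        return (v & 1) != 0 ? 0 : v >>> 1;
    }
}
```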
>> >>
>> >> For reference:
>> >>
>> >> https://github.com/netty/netty/blob/2b29b5e87656203fecd1732ffb472a366a1918cc/common/src/main/java/io/netty/util/internal/RefCnt.java#L258-L295
>> >>
>> >> On Mon, Dec 8, 2025 at 10:42 AM Maurizio Cimadamore
>> >> <maurizio.cimadamore at oracle.com> wrote:
>> >>
>> >>
>> >> > sum() is really just a snapshot, it adds up the counters
>> >> > (Cells), so it wouldn't ensure the counter was at zero.
>> >> > Immediately after returning zero a thread could have already
>> >> > incremented it.
>> >> Yes. What I mean is: you can check if close() should throw
>> >> because of pending acquires. But, as I said, we can't use that in
>> >> any way to "block" other acquires from happening in case we _do_
>> >> want to close. Which leaves us exposed.
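
The exposure discussed above can be reproduced deterministically with a
small sketch. Everything here is hypothetical: the class SnapshotCloser
and the betweenCheckAndClose hook exist only to stand in for another
thread that runs in the window between the sum() check and the close.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a close() that trusts LongAdder.sum() as a barrier.
final class SnapshotCloser {
    private final LongAdder pending = new LongAdder();
    private volatile boolean closed;

    void acquire() {
        if (closed) {
            throw new IllegalStateException("closed");
        }
        pending.increment();
    }

    void release() {
        pending.decrement();
    }

    long pending() {
        return pending.sum();
    }

    /**
     * Check-then-act close. The hook simulates a concurrent thread running
     * after sum() returned zero but before the closed flag is set.
     */
    boolean closeIfIdle(Runnable betweenCheckAndClose) {
        if (pending.sum() != 0) {
            return false; // pending acquires: close() should throw here
        }
        betweenCheckAndClose.run(); // nothing stops an acquire in this window
        closed = true;
        return true;
    }
}
```

Calling closeIfIdle(closer::acquire) on a fresh instance reports a
successful close while one acquire is still pending, which is the gap a
snapshot-based check cannot cover on its own.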
>> >> >
>> >> >
>> >> >> For the purpose of implementation clarity -- would it be useful to
>> >> >> wrap the various counters plus logic to acquire/release (and
>> >> >> "closing" state) into a separate abstraction, which is then used by
>> >> >> SharedMemorySession? A sort of "atomic" LongAdder, if you will :-)
>> >> >>
>> >> >> That might make it easier to verify the correctness of the
>> >> >> implementation, by validating each aspect (the atomic long adder, and
>> >> >> its use from SharedMemorySession) separately.
>> >> >
>> >> > Sure, that would be a bit cleaner, thanks.
>> >>
>> >> Thanks.
>> >>
>> >>
>> >> Maurizio
>> >>
>> >
>>
>