Improve scaling of downcalls using MemorySegments allocated with shared arenas
Stuart Monteith
stuart.monteith at arm.com
Mon Dec 22 18:11:17 UTC 2025
Hello Maurizio,
My expectation is that, while a compare-and-swap would operate correctly on an individual byte, contention would
still occur at cache-line granularity; at the very least, CASes on different bytes of the same 64-bit word would be
serialized in some way.
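The following is a minimal sketch that illustrates this point, along with Maurizio's remark below that a byte-level CAS probably becomes a 64-bit CAS with masking. It is not code from either implementation discussed in this thread. It assumes a hypothetical class `ByteSlicedState` that holds the state in an `AtomicLong` as eight 8-bit counters. A per-slot acquire is emulated with a full-word CAS, so an update to any byte causes a concurrent CAS on a different byte to fail and retry. The all-ones value is reserved as a hypothetical CLOSED sentinel, which is why each slot is capped at 254 here.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a 64-bit state split into eight 8-bit acquire counters. */
class ByteSlicedState {
    static final int SLOTS = 8;
    static final long CLOSED = -1L;    // all bytes 0xFF: reserved closed sentinel
    static final int MAX_COUNT = 0xFE; // cap below 0xFF so no live state equals CLOSED

    private final AtomicLong state = new AtomicLong(0L);

    private static int shift(int slot) {
        if (slot < 0 || slot >= SLOTS) throw new IllegalArgumentException("slot " + slot);
        return slot * 8;
    }

    /** Increments the counter in one byte slot, emulated with a full 64-bit CAS. */
    void acquire(int slot) {
        int sh = shift(slot);
        while (true) {
            long s = state.get();
            if (s == CLOSED) throw new IllegalStateException("already closed");
            long count = (s >>> sh) & 0xFF;
            if (count >= MAX_COUNT) throw new IllegalStateException("slot " + slot + " saturated");
            if (state.compareAndSet(s, s + (1L << sh))) return;
            // CAS failed: another thread changed some byte of the word
            // (possibly a different slot), so retry.
        }
    }

    /** Decrements the counter in one byte slot, again via a full-word CAS. */
    void release(int slot) {
        int sh = shift(slot);
        while (true) {
            long s = state.get();
            long count = (s >>> sh) & 0xFF;
            if (s == CLOSED || count == 0) throw new IllegalStateException("release without acquire");
            if (state.compareAndSet(s, s - (1L << sh))) return;
        }
    }

    /** Close succeeds only if every slot is zero: one CAS over the whole word. */
    boolean tryClose() {
        return state.compareAndSet(0L, CLOSED);
    }

    long raw() { return state.get(); }

    public static void main(String[] args) {
        ByteSlicedState st = new ByteSlicedState();
        st.acquire(0);
        st.acquire(3);
        System.out.println(Long.toHexString(st.raw())); // 1000001
        System.out.println(st.tryClose());              // false
        st.release(0);
        st.release(3);
        System.out.println(st.tryClose());              // true
    }
}
```

Because every slot lives in the same word, and therefore in the same cache line, this layout lets close check all counters with a single CAS. It does not, however, remove the line-level contention between threads acquiring on different slots.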
BR,
Stuart
On 15/12/2025 12:29, Maurizio Cimadamore wrote:
> Another possibility would be to split the 64-bit state into, say, 8-bit words.
>
> Each word would act as a counter (up to 256 acquires, theoretically).
>
> This would allow acquire/release/close to work on different parts of the state field (e.g. by issuing a byte-level CAS
> at the correct offset), while still allowing the close operation to atomically CAS the entire counter.
>
> But I'm not sure this would reduce contention, as I'd assume that a byte-level CAS will probably translate to a
> 64-bit CAS with some extra bit-masking logic on top...
>
> Maurizio
>
> On 14/12/2025 23:07, Stuart Monteith wrote:
>> What I'm finding with the getAndAdd version is that there is often an improvement, but the split counting (the
>> multicounting, as I called it) performs much better; I'll share the results in a week. I've tried to avoid
>> subtle issues with the split counting by keeping the code as simple as I could. Keeping the states consistent is
>> important: if a close is in progress, any attempt to read the counter's state must wait until the outcome of the
>> close has been decided.
>>
>> BR,
>> Stuart
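The split counting ("multicounting") described in the quoted message, where reads of the state must wait while a close is being decided, could be sketched roughly as follows. This is an illustrative assumption, not the code that was benchmarked, which is not shown in this thread. The class `StripedArenaState`, the eight stripes, the 64-byte padding, and the OPEN/CLOSING/CLOSED phases are all hypothetical. Each acquire increments its own padded stripe and then re-checks the phase. Close moves the phase to CLOSING, sums the stripes, and then either commits to CLOSED or reverts to OPEN.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch of a striped ("multicounting") acquire/release/close protocol. */
class StripedArenaState {
    static final int OPEN = 0, CLOSING = 1, CLOSED = 2;
    static final int STRIPES = 8;
    static final int PAD = 8; // 8 longs = 64 bytes apart, assuming 64-byte cache lines

    private final AtomicInteger phase = new AtomicInteger(OPEN);
    private final AtomicLongArray counts = new AtomicLongArray(STRIPES * PAD);

    /** Acquires on the calling thread's stripe; returns the stripe to pass to release. */
    int acquire() {
        int stripe = Thread.currentThread().hashCode() & (STRIPES - 1);
        int idx = stripe * PAD;
        while (true) {
            int p = phase.get();
            if (p == CLOSED) throw new IllegalStateException("already closed");
            if (p == CLOSING) { Thread.onSpinWait(); continue; } // wait while close decides
            counts.getAndIncrement(idx);
            if (phase.get() == OPEN) return stripe; // a later close will see this increment
            counts.getAndDecrement(idx);            // a close started: back out and retry
        }
    }

    /** Releases a prior acquire; must use the stripe that acquire returned. */
    void release(int stripe) {
        counts.getAndDecrement(stripe * PAD);
    }

    /** Returns true if closed, false if acquires were outstanding or in flight. */
    boolean tryClose() {
        if (!phase.compareAndSet(OPEN, CLOSING)) return false;
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) sum += counts.get(i * PAD);
        phase.set(sum == 0 ? CLOSED : OPEN);
        return sum == 0;
    }

    public static void main(String[] args) {
        StripedArenaState st = new StripedArenaState();
        int s = st.acquire();
        System.out.println(st.tryClose()); // false: one acquire outstanding
        st.release(s);
        System.out.println(st.tryClose()); // true
    }
}
```

Pairing increment-then-recheck in acquire with phase-then-sum in close keeps a close from missing an acquire that has already returned, because all of these atomic operations are volatile accesses and are therefore totally ordered. The costs are that a close racing with an acquire may fail spuriously, and that acquirers spin while a close is pending.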
More information about the panama-dev mailing list