Atomic operations: your thoughts are welocme
Andrew Haley
aph at redhat.com
Thu Feb 11 13:33:30 UTC 2021
On 11/02/2021 03:59, Kim Barrett wrote:
>> On Feb 8, 2021, at 1:14 PM, Andrew Haley <aph at redhat.com> wrote:
>>
>> I've been looking at the hottest Atomic operations in HotSpot, with a view to
>> finding out if the default memory_order_conservative (which is very expensive
>> on some architectures) can be weakened to something less. It's impossible to
>> fix all of them, but perhaps we can fix some of the most frequent.
>
> Is there any information about the possible performance improvement from
> such changes? 1.5-3M occurrences doesn't mean much without context.
>
> We don't presently have support for sequentially consistent semantics, only
> "conservative". My recollection is that this is in part because there might
> be code that is assuming the possibly stronger "conservative" semantics, and
> in part because there are different and incompatible approaches to
> implementing sequentially consistent semantics on some hardware platforms
> and we didn't want to make assumptions there.
>
> We also don't presently have any cmpxchg implementation that really supports
> anything between conservative and relaxed, nor do we support different order
> constraints for the success vs failure cases. Things can be complicated
> enough as is; while we *could* fill some of that in, I'm not sure we should.
OK. However, even though we don't implement any of them, we do have an
API that includes acq, rel, and seq_cst. The fact that we don't have
anything behind them is, I thought, To Be Done rather than Won't Do.
>> <OopOopIterateDispatch<G1CMOopClosure>::Table::oop_oop_iterate<InstanceKlass, narrowOop>(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178
>>
>> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this
>> need to be memory_order_conservative, or would something weaker do? Even
>> acq_rel or seq_cst would be better.
>
> I think for setting bits in a bitmap the thing to do would be to identify
> places that are safe and useful (impacts performance) to do so first. Then
> add a weaker variant for use in those places, assuming any are found.
I see. I'm assuming that frequency of use is a useful proxy for impact.
Aleksey has already, very helpfully, measured how significant these are
for Shenandoah, and I suspect all concurrent GCs would benefit in a
similar fashion.
>> <Symbol::decrement_refcount()+44>: :: = 2376632
>> <Symbol::increment_refcount()+44>: :: = 2003895
>>
>> I can't imagine that either of these actually need memory_order_conservative,
>> they're just reference counts.
>
> The "usual" refcount implementation involves relaxed increment and stronger
> ordering for decrement. (If I'm remembering correctly, dec-acquire and a
> release fence on the zero value path before deleting. But I've not thought
> about what one might want for this CAS-based variant that handles boundary
> cases specially.) And as you say, whether any of these could be weakened
> depends on whether there is any code surrounding a use that depends on the
> stronger ordering semantics. At a guess I suspect increment could be changed
> to relaxed, but I've not looked. This one is probably a question for runtime
> folks.
OK, this makes sense. I'm thinking of the long road to getting this stuff
documented so that we can see what side effects of atomic operations are
actually required.
>> <OtherRegionsTable::add_reference(void*, unsigned int)+248>: :: = 1719614
>>
>> BitMap::par_set_bit again.
>>
>> <G1ParScanThreadState::steal_and_trim_queue(GenericTaskQueueSet<OverflowTaskQueue<ScannerTask, (MEMFLAGS)5, 131072u>, (MEMFLAGS)5>*)+432>: :: = 1617659
>>
>> This one is GenericTaskQueue::pop_global calling cmpxchg_age().
>> Again, do we need conservative here?
>
> This needs at least sequentially consistent semantics on the success path.
Yep. That's easy, it's the full barrier in the failure path that
I'd love to eliminate.
> See the referenced paper by Le, et al.
>
> There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't
> deal with that path. But it's also not in your list, which is good since
> this is supposed to be infrequently taken.
Right. I'm trying to concentrate on the low-hanging fruit.
Thank you for the very detailed and informative reply.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
More information about the hotspot-gc-dev
mailing list