RFR: 8186838: Generalize Atomic::inc/dec with templates

Mon Sep 4 09:50:14 UTC 2017

Hi Robbin,

I agree that on x86 there is not much the compiler could do with the
intrinsics other than what we want it to do, due to the relatively
strong memory model of the machine. So this might be a possible
simplification for x86 gcc/clang targets (but still not for all x86
targets).

On PPC and ARMv7, though, that no longer holds. Our conservative
memory model is stronger than seq_cst semantics: it also always
guarantees "leading sync" semantics, which our code base exploits and
which would break if conservative were translated simply as seq_cst.
Also, since the fencing emitted by the C++ compiler must be compatible
with what our code generation does, the two could end up incompatible
if they choose different fencing conventions. Intrinsic-provided
operations may or may not have leading sync semantics. We can hope for
it, but we should never rely on it.
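
To illustrate the distinction, here is a minimal sketch in portable
C++11. It is not our actual implementation; conservative_add and
seq_cst_add are hypothetical names, and the comments about what the
fences become on PPC describe the usual compiler mapping, which is an
assumption here rather than a guarantee.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, NOT HotSpot's Atomic::add. It spells out a
// "conservative" add by bracketing a relaxed read-modify-write with
// explicit full fences, so the leading fence does not depend on the
// compiler's choice of fencing convention.
inline int32_t conservative_add(std::atomic<int32_t>& dest, int32_t value) {
  // Leading full fence (typically a "sync" on PPC; assumption).
  std::atomic_thread_fence(std::memory_order_seq_cst);
  int32_t old_value = dest.fetch_add(value, std::memory_order_relaxed);
  // Trailing full fence.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return old_value + value;  // return the updated value
}

// For contrast: a plain seq_cst add. Where the fences end up (leading
// or trailing) is left to the compiler's convention for the target.
inline int32_t seq_cst_add(std::atomic<int32_t>& dest, int32_t value) {
  return dest.fetch_add(value, std::memory_order_seq_cst) + value;
}
```

Both functions return the same values; the difference is only in which
fences are guaranteed to surround the update.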

Thanks,
/Erik

On 2017-09-04 11:34, Robbin Ehn wrote:
> Hi,
>
> On 09/02/2017 10:31 AM, Andrew Haley wrote:
>> On 01/09/17 15:15, Erik Österlund wrote:
>>> It is not the simplest solution I can think of. The simplest solution I
>>> can think of is to remove all specialized versions of Atomic::inc/dec
>>> and just have it call Atomic::add directly. That would remove the
>>> optimizations we have today, for whatever reason we have them. It would
>>> lead to slightly more conservative fencing on PPC/S390,
>>
>> I see.  Can you say what instructions would be different?
>>
>>> and would lead to slightly less optimal machine encoding on x86
>>> (without immediate values in the instructions). But it would be
>>> simpler for sure. I did not put any judgement into whether our
>>> existing optimizations are worthwhile or not. But if you want to
>>> prioritize simplicity, removing those optimizations is one possible
>>> solution. Would you prefer that?
>>
>> Is this really about optimization?  If we cared about getting this
>> stuff as optimized as possible we'd use intrinsics on GCC/x86 targets.
>> These have been supported for a long time.  But it seems we're
>> determined to preserve the legacy assembly-language implementations
>> and use them everywhere, even where they are not necessary.
>>
>
> Why not use the gcc/clang intrinsics on all platforms where we use
> gcc/clang? (not just gcc/x86)
> For "__atomic_fetch_add (&value, inc, __ATOMIC_RELAXED);"
> gcc seems to generate "lock addl" on x86 and ldxr/stxr on armv8, and
> with acq_rel ldaxr/stlxr, which is what I would expect.
>
> And thus we can remove a lot of code!
>
> (whether we should have the relaxed version in the API is another
> question)
>
> /Robbin
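
For reference, the intrinsic-based approach discussed in the quoted
messages could look roughly like the sketch below. It assumes a gcc or
clang toolchain (the __atomic builtins are compiler-specific), and the
namespace and function names are hypothetical, chosen only for
illustration. The inc/dec wrappers show the "just call add" variant;
they use seq_cst ordering, which as noted above is not the same as our
conservative semantics.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// Relaxed fetch-and-add: atomic, but with no ordering guarantees.
inline int32_t fetch_add_relaxed(int32_t* dest, int32_t inc) {
  return __atomic_fetch_add(dest, inc, __ATOMIC_RELAXED);
}

// Acquire-release fetch-and-add.
inline int32_t fetch_add_acq_rel(int32_t* dest, int32_t inc) {
  return __atomic_fetch_add(dest, inc, __ATOMIC_ACQ_REL);
}

// inc/dec expressed as thin wrappers over add, with no specialized
// per-platform code.
inline void inc(int32_t* dest) {
  __atomic_fetch_add(dest, 1, __ATOMIC_SEQ_CST);
}

inline void dec(int32_t* dest) {
  __atomic_fetch_add(dest, -1, __ATOMIC_SEQ_CST);
}

}  // namespace sketch
```

The fetch_add variants return the value held before the addition.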