RFR: 8186838: Generalize Atomic::inc/dec with templates

Tue Sep 5 08:04:52 UTC 2017

Hi Andrew,

On 2017-09-04 18:05, Andrew Haley wrote:
> Hi,
>
> On 04/09/17 14:31, Erik Österlund wrote:
>
>> On 2017-09-04 12:41, Andrew Haley wrote:
>>> On 04/09/17 11:26, Erik Österlund wrote:
>>>> 1) I want evidence for this claim. Can you get leading and trailing dmb
>>>> sy (rather than dmb ish) for atomic operations on ARMv7?
>>> I hope not.  There is no reason for us to want such a thing in HotSpot.
>>> But even if we did want such a thing, we could crop down to asm: the
>>> point is the usual cases, not weird corner cases.
>> So we can not emit any fencing we want with GCC intrinsics, let alone
>> the fencing we already have and rely on today on ARMv7.
> There are corner cases, for which asm can be used, yes.

Yes, and unfortunately the corner cases is precisely what we want.

>> The discussion about whether we should relax our ARMv7 fencing or
>> not is a different discussion, and is unrelated to the claim that we
>> can get any fencing we want with GCC intrinsics.
> I accept the point in principle, but I suggest it's a bad example: I
> do not believe that we want DMB SY.

I am glad we at least agree about the point in principle. :)
If you are willing to make a case for relaxing dmb sy, please feel free 
to do so in another RFE. Note however that the choice was all but 
accidental. It was a conscious choice. I would prefer not to derail more 
than we already have in this thread if that is okay.

>> The point is that we can not control the
>> fencing arbitrarily, let alone even get the fencing we have today.
> Arbitrarily, no.  But I guess you'd expect me to point out that
> argument can be flipped on its head: if we'd used intrinsics rather
> than asms the mistake of using DMB SY would have been averted.  You
> can look at this issue in (at least) two ways.  :-)

I accept the point that there is a possibility GCC will mess up but 
likewise also that we will mess up and GCC not. Please note however that 
this does not mean that using GCC intrinsics will allow us to blindly 
trust GCC. The fencing done in our runtime must be reflected in our JIT. 
So anything we have messed up in our fencing that is fixed in GCC but 
not in our code generation, would still end up biting us as long as we 
have a JIT that must be tightly bound to our runtime. Conversely, the 
problem might get worse if different versions of GCC have some fix and 
others do not. And clang that uses the same intrinsics might have subtly 
different behaviour. And our memory model itself is subtly different to 
their memory model, making it incompatible. For example, our atomics 
typically conservatively guarantees bidirectional full fencing, while 
theirs does not.

>
>>>> 2) Even if you could and the compiler happens to generate that - we can
>>>> not rely on it because there is no contract to the compiler what fence
>>>> instructions it elects to use. The only contract the compiler needs to
>>>> abide to is how atomic C++ operations interact with other C++
>>>> operations. And we do not want the underlying fencing to silently change
>>>> when performing compiler upgrades.
>>> There is no way that GCC writers would break ABI compatibility in such a
>>> fundamental way.  There would be a firestorm.  I know this because even
>>> if no-one else started the fire, I would.  I am a GCC author.
>> Thank you for your reassurance. I appreciate that you take ABI
>> compatibility seriously. Yet over the years, the bindings have changed
>> over time as our understanding of implications of the memory model has
>> evolved - especially when mixing stronger and weaker accesses on the
>> same fields.
> Absolutely so, yes, and IMVHO such code should be taken from HotSpot
> and quietly put out of its misery.  Even if the program is correct
> it'll require a lot of analysis, and pity the poor programmer who
> comes across it in a few years' time.

Analysis that still has to be done.

>> Even 2017, there are still papers published about how
>> seq_cst mixed with weaker memory ordering needs fixing in the bindings
>> (cf. "Repairing sequential consistency in C/C++11", PLDI'17), resulting
>> in new bindings with both leading sync and trailing sync conventions
>> being proposed (the choice of convention is up to compiler writers).
> Sure, but there's no way that GCC (or any other serious compiler) is
> going to make changes in a way that isn't at least compatible with
> existing binaries.  Power PC has its problems, mostly due to being
> rather old, and I'm not at all surprised to hear that mistakes have
> been made, given that the language used in the processor definition
> and the language used in the C++ language standard don't map onto each
> other in an obvious way.

Perhaps. But it is nevertheless an abstraction crime to rely on that. A 
reliance I would personally prefer to avoid if possible (and it is).

> But none of this extends to Linux/x86, which has a straightforward
> implementation of all of this stuff.

I must agree here. The x86 ISA does not leave a whole lot to the 
imagination of the implementors of the intrinsics. It's gonna be a 
locked atomic instruction of some sort. There is almost zero risk of 
incompatible ABIs. So that is an abstraction crime I would not object to 
if it is desired down the road.

>> I do not feel confident we can rely on these bindings never
>> changing. As there is no contract or explicit ABI, compiler writers
>> are free to do whatever that is consistent within the boundaries of
>> C++ code and the C++ memory model. The actual ABI is hidden from
>> that contract. And I would not happily embed reliance on
>> intentionally undocumented, implicit, unofficial ABIs that are known
>> to have different fencing conventions that may or may not be
>> compatible with what our generated code requires. Generating the
>> code, disassembling, and then assuming whatever binding was observed
>> in the disassembly is a binding contract, is not a reliable
>> approach.
> I suppose I can understand this difference in opinion because my view
> of GCC is very different from yours: to me it's a white box, not a
> black box, and I certainly wouldn't take the approach of just looking
> at the generated code.

I understand where you are coming from. The reason it is a black box to 
me though is primarily because the lack of explicit contracts with the 
compiler forces me to think of it as a black box, unless I want to 
perform an abstraction crime and start relying on unofficial 
implementation details, knowing there are incompatible ABI proposals 
floating around in papers to this date, for compiler writers to choose 
between.

>
>> If we require a specific fence, then I do not see why we would not
>> simply emit this specific fence that we require explicitly, rather than
>> insisting on using some intrinsic and hoping it will emit that exact
>> fence that we rely on through some implicit, undocumented, unofficial
>> ABI, that may silently change over time. I fail to see the attraction.
> That one is easy: if you tell the compiler what you're doing rather
> than hiding it inside an asm, the compiler can generate better code.
> The resulting program is also much simpler.

Okay. For mer personally, safety and correctness is more important than 
optimal code generation. But I see your point. Your intrinsic will (on 
x86) generate lock addl with immediate operands on invocations to 
Atomic::add with constant addends where we will emit lock xaddl, wasting 
a register with current bindings. For me that seems like an unimportant 
premature optimization, but I hear it has some attraction for some 
people (for reasons I do not understand).

> Also, you avoid the risks inherent in writing inline asms: only
> recently have the x86/Linux asms been corrected to add a memory
> clobber.  This is an extremely serious flaw, and it's been around for
> a very long while.  We're talking about risk, yet your risk is of a
> rather theoretical nature, rather than that one which has already
> happened.  We can, of course, argue that we are where we are, and that
> bug is fixed, so it no longer matters, but it does IMO point to where
> the real risk in using inline asm lies.

Point taken - the risk goes two ways.

> However, having said all of that, let me be clear: while I do not
> believe that the inline asms for each platform are the best way of
> doing this, to change them at this point would be unduly disruptive.
> I am not suggesting that they should be changed now.  I am very
> strongly suggesting that they should be changed in the future, and
> that we should move to using intrinsics.

Perhaps we can evaluate that on a case by case basis. As for x86, I am 
not as opposed to moving to GCC intrinsics. I think it seems like a 
premature optimization and abstraction crime, but I will not oppose it, 
because it can't do much harm in practice (as opposed to theory) I guess.

As for this current RFE, the only voice I have heard against removing 
all Atomic::inc specializations is Kim. He thought it would be nice to 
keep the micro optimization that turned Atomic::inc bindings to lock add 
instead of lock xadd on x86. Perhaps instead we nuke Atomic::inc/dec 
specializations and replace Atomic::add with GCC intrinsics where it is 
available. As mentioned previously, it automatically detects it can do 
lock add instead for Atomic::inc, as well as a few more cases where 
Atomic::add is used with constants that can be embedded into immediate 
values and the result is not used. If we do that, is anyone unhappy with 
the idea of nuking Atomic::inc/dec specializations all together?

Thanks,
/Erik