RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Erik Österlund
erik.osterlund at oracle.com
Mon May 29 12:20:26 UTC 2017
Hi Andrew,
I just thought I'd put my opinions in here as I see I have been
mentioned a few times already.
First of all, I find using the volatile keyword on things that are
involved in lock-free protocols meaningful from a readability point of
view. It allows the reader of the code to see that care is needed here.
About the compiler barriers - you are right: volatile should indeed not
be necessary if the compiler barriers do everything right. A correct
barrier must keep the compiler from reordering memory accesses across
it, and from caching values in registers across it, i.e. it must force
reloads.
On windows we rely on the deprecated _ReadWriteBarrier(). According to
MSDN, it guarantees:
"The _ReadWriteBarrier intrinsic limits the compiler optimizations that
can remove or reorder memory accesses across the point of the call."
This should cut it.
The GCC memory clobber is defined as:
"The "memory" clobber tells the compiler that the assembly code performs
memory reads or writes to items other than those listed in the input and
output operands (for example, accessing the memory pointed to by one of
the input parameters). To ensure memory contains correct values, GCC may
need to flush specific register values to memory before executing the
asm. Further, the compiler does not assume that any values read from
memory before an asm remain unchanged after that asm; it reloads them as
needed. Using the "memory" clobber effectively forms a read/write memory
barrier for the compiler."
This seems to guarantee only that accesses will not be re-ordered. But
the Extended Asm documentation also states:
"You will also want to add the volatile keyword if the memory affected
is not listed in the inputs or outputs of the asm, as the `memory'
clobber does not count as a side-effect of the asm."
and
"The volatile keyword indicates that the instruction has important
side-effects. GCC will not delete a volatile asm if it is reachable.
(The instruction can still be deleted if GCC can prove that control-flow
will never reach the location of the instruction.) Note that even a
volatile asm instruction can be moved relative to other code, including
across jump instructions."
This is a bit vague, but seems to suggest that by making the asm
statement volatile and giving it a memory clobber, the compiler will
definitely not cache variables across it - they will be reloaded. As
for not re-ordering non-volatile accesses across it, it shouldn't, but
that is not clearly stated. I have never observed such a re-ordering
across a volatile asm with a memory clobber, but the semantics seem a
bit vague.
As for clang, the closest to a definition of what it does I have seen is:
"A clobber constraint is indicated by a “~” prefix. A clobber does not
consume an input operand, nor generate an output. Clobbers cannot use
any of the general constraint code letters – they may use only explicit
register constraints, e.g. “~{eax}”. The one exception is that a clobber
string of “~{memory}” indicates that the assembly writes to arbitrary
undeclared memory locations – not only the memory pointed to by a
declared indirect output."
Apart from sweeping statements saying clang inline assembly is largely
compatible with GCC's and works similarly, I have not seen clear
guarantees.
And then there are more compilers.
As a conclusion: by using volatile in addition to OrderAccess, you rely
on standardized compiler semantics (at least for volatile-to-volatile
re-orderings and re-loading; volatile-to-non-volatile is another can of
worms). If instead you rely on the OrderAccess memory model doing what
it says it will do, then it should indeed work without volatile, but to
make that work, OrderAccess relies on non-standardized,
compiler-specific barriers. In practice it should work well on all our
supported compilers without volatile. And if it didn't, that would
indeed be a bug in OrderAccess that needs to be fixed in OrderAccess.
Personally though, I am a helmet-on-synchronization kind of person, so I
would take precaution anyway and use volatile whenever possible, because
1) it makes the code more readable, and 2) it provides one extra layer
of safety that is more standardized. It seems that over the years it has
happened multiple times that we assumed OrderAccess is bullet proof, and
then realized that it wasn't and observed a crash that would never have
happened if the code was written in a helmet-on-synchronization way. At
least that's how I feel about it.
Now one might argue that by using C++11 atomics, which are
standardized, all these problems would go away: we would rely on
standardized primitives and simply trust the compiler. But problems
could then arise when the C++ compiler decides to be less conservative
than we want, e.g. by not issuing a fence for sequentially consistent
loads, optimizing for non-multiple-copy-atomic CPUs on the grounds that
IRIW violations of sequential consistency are non-issues in practice.
those loads "almost" sequentially consistent, which might be good
enough. But it feels good to have a choice here to be more conservative.
To have the synchronization helmet on.
Meta summary:
1) Current OrderAccess without volatile:
- should work, but relies on compiler-specific, non-standardized, and
sometimes poorly documented compiler barriers.
2) Current OrderAccess with volatile:
- relies on standardized volatile semantics to guarantee compiler
reordering and reloading issues do not occur.
3) C++11 atomic backend for OrderAccess:
- relies on standardized semantics to guarantee that compiler and
hardware reordering issues do not occur
- nevertheless isn't always flawless, and when it isn't, it gets painful
Hope this sheds some light on the trade-offs.
Thanks,
/Erik
On 2017-05-28 10:45, Andrew Haley wrote:
> On 27/05/17 10:10, Volker Simonis wrote:
>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley <aph at redhat.com> wrote:
>>> On 26/05/17 17:03, Volker Simonis wrote:
>>>
>>>> Volatile not only prevents reordering by the compiler. It also
>>>> prevents other, otherwise legal transformations/optimizations (like
>>>> for example reloading a variable [1]) which have to be prevented in
>>>> order to write correct, lock free programs.
>>> Yes, but so do compiler barriers.
>> Please correct me if I'm wrong, but I thought "compiler barriers" are
>> to prevent reordering by the compiler. However, this is a question of
>> optimization. If you have two subsequent loads from the same address,
>> the compiler is free to do only the first load and keep the value in a
>> register if the address is not pointing to a volatile value.
> No it isn't: that is precisely what a compiler barrier prevents. A
> compiler barrier (from the POV of the compiler) clobbers all of
> the memory state. Neither reads nor writes may move past a compiler
> barrier.
>
More information about the jdk10-dev
mailing list