RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Volker Simonis
volker.simonis at gmail.com
Mon May 29 17:02:25 UTC 2017
Hi Erik,
thanks for the nice summary. Just for the sake of completeness, here's
the corresponding documentation for the xlc compiler barrier [1]. It
kind of implements the gcc syntax, but the wording is slightly
different:
"Add memory to the list of clobbered registers if assembler
instructions can change a memory location in an unpredictable fashion.
The memory clobber ensures that the data used after the completion of
the assembly statement is valid and synchronized.
However, the memory clobber can result in many unnecessary reloads,
reducing the benefits of hardware prefetching. Thus, the memory
clobber can impose a performance penalty and should be used with
caution."
We haven't used it until now, so I cannot say whether it really does
what it is supposed to do. I'm also concerned about the performance
warning. It seems like the "unnecessary reloads" can really hurt on
architectures like ppc which have many more registers than x86.
Declaring a memory location 'volatile' seems much simpler and more
light-weight as a way to achieve the desired effect. So I tend to
agree with you and David that we should proceed to mark things with
'volatile'.
Sorry for constantly "spamming" this thread with another problem (i.e.
JDK-8129440 [2]) but I still think it is related and important.
In its current state, the way "load_heap_oop()" and its callers work
is broken. And this is not because of a problem in OrderAccess, but
because of missing compiler barriers:
static inline oop load_heap_oop(oop* p) { return *p; }
...
template <class T>
inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
  // 1. load 'heap_oop' from 'p'
  T heap_oop = oopDesc::load_heap_oop(p);
  if (!oopDesc::is_null(heap_oop)) {
    // 2. Compiler reloads 'heap_oop' from 'p' which may now be null!
    oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
    HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
    _cm->grayRoot(obj, hr);
  }
}
Notice that we don't need memory barriers here - all we need is to
prevent the compiler from loading the oop (i.e. 'heap_oop') a second
time. After Andrew's explanation (thanks for that!) and Martin's
examples from Google, I think we could fix this by rewriting
'load_heap_oop()' (and friends) as follows:
static inline oop load_heap_oop(oop* p) {
  oop o = *p;
  __asm__ volatile ("" : : : "memory");
  return o;
}
In order to make this consistent across all platforms, we would
probably have to introduce a new, public "compiler barrier" function
in OrderAccess (e.g. 'OrderAccess::compiler_barrier()'), because we
don't currently seem to have a cross-platform concept of a
"compiler-only barrier". But I'm still not convinced that it would
be better than simply writing (and that's the way we've actually
solved it internally):
static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; }
Declaring that single memory location to be 'volatile' seems to be a
much more local change compared to globally "clobbering" all the
memory. And it doesn't rely on the compiler providing a compiler
barrier. It does however rely on the compiler doing the "right thing"
for volatile - but after everything that has been said here so far,
that seems the more likely bet?
The problem may also depend on the specific compiler/cpu combination.
For ppc64, both gcc (on linux) and xlc (on aix) do the right thing
for volatile variables - they don't insert any memory barriers (i.e.
no extra instructions) but just access the corresponding variables as
if there were a compiler barrier. This is exactly what we currently
want in HotSpot, because fine-grained control of memory barriers is
exercised through the use of OrderAccess (and OrderAccess implies a
"compiler barrier", at least after the latest fixes).
Any thoughts? Should we introduce a cross-platform, "compiler-only
barrier" or should we stick to using "volatile" for such cases?
Regards,
Volker
[1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html
[2] https://bugs.openjdk.java.net/browse/JDK-8129440
On Mon, May 29, 2017 at 2:20 PM, Erik Österlund
<erik.osterlund at oracle.com> wrote:
> Hi Andrew,
>
> I just thought I'd put my opinions in here as I see I have been mentioned a
> few times already.
>
> First of all, I find using the volatile keyword on things that are involved
> in lock-free protocols meaningful from a readability point of view. It
> allows the reader of the code to see that care is needed here.
>
> About the compiler barriers - you are right. Volatile should indeed not be
> necessary if the compiler barriers do everything right. The compiler should
> neither reorder things nor reload them across the barrier.
>
> On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN,
> it guarantees:
>
> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can
> remove or reorder memory accesses across the point of the call."
>
> This should cut it.
>
> The GCC memory clobber is defined as:
>
> "The "memory" clobber tells the compiler that the assembly code performs
> memory reads or writes to items other than those listed in the input and
> output operands (for example, accessing the memory pointed to by one of the
> input parameters). To ensure memory contains correct values, GCC may need to
> flush specific register values to memory before executing the asm. Further,
> the compiler does not assume that any values read from memory before an asm
> remain unchanged after that asm; it reloads them as needed. Using the
> "memory" clobber effectively forms a read/write memory barrier for the
> compiler."
>
> This seems to only guarantee that values will not be re-ordered. But the
> documentation for Extended Asm also states:
>
> "You will also want to add the volatile keyword if the memory affected is
> not listed in the inputs or outputs of the asm, as the `memory' clobber does
> not count as a side-effect of the asm."
>
> and
>
> "The volatile keyword indicates that the instruction has important
> side-effects. GCC will not delete a volatile asm if it is reachable. (The
> instruction can still be deleted if GCC can prove that control-flow will
> never reach the location of the instruction.) Note that even a volatile asm
> instruction can be moved relative to other code, including across jump
> instructions."
>
> This is a bit vague, but seems to suggest that by making the asm statement
> volatile and having a memory clobber, the compiler definitely will not
> reload variables. As for not re-ordering non-volatile accesses, it
> shouldn't, but that is not quite clearly stated. I have never observed such
> a re-ordering across a volatile memory clobber, but the semantics seem a
> bit vague.
>
> As for clang, the closest to a definition of what it does I have seen is:
>
> "A clobber constraint is indicated by a “~” prefix. A clobber does not
> consume an input operand, nor generate an output. Clobbers cannot use any of
> the general constraint code letters – they may use only explicit register
> constraints, e.g. “~{eax}”. The one exception is that a clobber string of
> “~{memory}” indicates that the assembly writes to arbitrary undeclared
> memory locations – not only the memory pointed to by a declared indirect
> output."
>
> Apart from sweeping statements saying clang inline assembly is largely
> compatible with and works similarly to GCC's, I have not seen clear
> guarantees. And then there are more compilers.
>
> As a conclusion: by using volatile in addition to OrderAccess you rely on
> standardized compiler semantics (at least for volatile-to-volatile
> re-orderings and re-loading - volatile-to-nonvolatile is another can of
> worms). If instead you rely on the OrderAccess memory model doing what it
> says it will do, then it should indeed work without volatile - but to make
> that work, OrderAccess relies on non-standardized compiler-specific
> barriers. In practice it should work well on all our supported compilers
> without volatile. And if it didn't, it would indeed be a bug in OrderAccess
> that needs to be solved in OrderAccess.
>
> Personally though, I am a helmet-on-synchronization kind of person, so I
> would take precautions anyway and use volatile whenever possible, because 1)
> it makes the code more readable, and 2) it provides one extra layer of
> safety that is more standardized. It seems that over the years it has
> happened multiple times that we assumed OrderAccess was bulletproof, then
> realized that it wasn't and observed a crash that would never have happened
> if the code had been written in a helmet-on-synchronization way. At least
> that's how I feel about it.
>
> Now one might argue that by using C++11 atomics that are standardized, all
> these problems would go away, as we would rely on standardized primitives
> and then just trust the compiler. But then problems could arise when the
> C++ compiler decides to be less conservative than we want, e.g. by not
> emitting a fence for sequentially consistent loads in order to optimize for
> non-multiple-copy-atomic CPUs, arguing that IRIW issues that violate
> sequential consistency are non-issues in practice. That makes those loads
> "almost" sequentially consistent, which might be good enough. But it feels
> good to have a choice here to be more conservative. To have the
> synchronization helmet on.
>
> Meta summary:
> 1) Current OrderAccess without volatile:
> - should work, but relies on compiler-specific not standardized and
> sometimes poorly documented compiler barriers.
>
> 2) Current OrderAccess with volatile:
> - relies on standardized volatile semantics to guarantee compiler
> reordering and reloading issues do not occur.
>
> 3) C++11 Atomic backend for OrderAccess:
> - relies on standardized semantics to guarantee compiler and hardware
> reordering issues do not occur
> - nevertheless isn't always flawless, and when it isn't, it gets painful
>
> Hope this sheds some light on the trade-offs.
>
> Thanks,
> /Erik
>
>
> On 2017-05-28 10:45, Andrew Haley wrote:
>>
>> On 27/05/17 10:10, Volker Simonis wrote:
>>>
>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley <aph at redhat.com> wrote:
>>>>
>>>> On 26/05/17 17:03, Volker Simonis wrote:
>>>>
>>>>> Volatile not only prevents reordering by the compiler. It also
>>>>> prevents other, otherwise legal transformations/optimizations (like
>>>>> for example reloading a variable [1]) which have to be prevented in
>>>>> order to write correct, lock free programs.
>>>>
>>>> Yes, but so do compiler barriers.
>>>
>>> Please correct me if I'm wrong, but I thought "compiler barriers" are
>>> to prevent reordering by the compiler. However, this is a question of
>>> optimization. If you have two subsequent loads from the same address,
>>> the compiler is free to do only the first load and keep the value in a
>>> register if the address is not pointing to a volatile value.
>>
>> No it isn't: that is precisely what a compiler barrier prevents. A
>> compiler barrier (from the POV of the compiler) clobbers all of
>> the memory state. Neither reads nor writes may move past a compiler
>> barrier.
>>
>
More information about the jdk10-dev
mailing list