RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException

Volker Simonis volker.simonis at gmail.com
Tue May 30 15:37:18 UTC 2017


On Mon, May 29, 2017 at 7:56 PM, Erik Osterlund
<erik.osterlund at oracle.com> wrote:
> Hi Volker,
>
> Thank you for filling in more compiler info.
>
> If there is a choice between providing a new compiler barrier interface and defining its semantics vs using existing volatile semantics, then volatile semantics seems better to me.
>
> Also, my new Access API allows you to do BasicAccess<MO_VOLATILE>::load_oop(addr) to perform load_heap_oop and load_decode_heap_oop with volatile semantics. Sounds like that would help here.

Sorry for my ignorance, but what is the "new Access API" and
"BasicAccess<MO_VOLATILE>"? It actually sounds quite interesting :)

>
> Thanks,
> /Erik
>
>> On 29 May 2017, at 19:02, Volker Simonis <volker.simonis at gmail.com> wrote:
>>
>> Hi Erik,
>>
>> thanks for the nice summary. Just for the sake of completeness, here's
>> the corresponding documentation for the xlc compiler barrier [1]. It
>> kind of implements the gcc syntax, but the wording is slightly
>> different:
>>
>> "Add memory to the list of clobbered registers if assembler
>> instructions can change a memory location in an unpredictable fashion.
>> The memory clobber ensures that the data used after the completion of
>> the assembly statement is valid and synchronized.
>> However, the memory clobber can result in many unnecessary reloads,
>> reducing the benefits of hardware prefetching. Thus, the memory
>> clobber can impose a performance penalty and should be used with
>> caution."
>>
>> We haven't used it until now, so I cannot say if it really does what
>> it is supposed to do. I'm also concerned about the performance
>> warning. It seems like the "unnecessary reloads" could really hurt on
>> architectures like ppc which have many more registers than x86.
>> Declaring a memory location 'volatile' seems much simpler and more
>> light-weight as a way to achieve the desired effect. So I tend to
>> agree with you and David that we should proceed to mark things with
>> 'volatile'.
>>
>> Sorry for constantly "spamming" this thread with another problem (i.e.
>> JDK-8129440 [2]) but I still think that it is related and important.
>> In its current state, the way "load_heap_oop()" and its callers
>> work is broken. And this is not because of a problem in
>> OrderAccess, but because of missing compiler barriers:
>>
>> static inline oop       load_heap_oop(oop* p)       { return *p; }
>> ...
>> template <class T>
>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>>  // 1. load 'heap_oop' from 'p'
>>  T heap_oop = oopDesc::load_heap_oop(p);
>>  if (!oopDesc::is_null(heap_oop)) {
>>    // 2. The compiler may reload the value from 'p', which may now be null!
>>    oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
>>    HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>>    _cm->grayRoot(obj, hr);
>>  }
>> }
>>
>> Notice that we don't need memory barriers here - all we need is to
>> prevent the compiler from loading the oop (i.e. 'heap_oop') a second
>> time. After Andrew's explanation (thanks for that!) and Martin's
>> examples from Google, I think we could fix this by rewriting
>> 'load_heap_oop()' (and friends) as follows:
>>
>> static inline oop load_heap_oop(oop* p) {
>>  oop o = *p;
>>  __asm__ volatile ("" : : : "memory");
>>  return o;
>> }
>>
>> In order to make this consistent across all platforms, we would
>> probably have to introduce a new, public "compiler barrier" function
>> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()'), because we
>> don't currently seem to have a cross-platform concept for
>> "compiler-only barriers". But I'm still not convinced that it would
>> be better than simply writing (and that's how we've actually
>> solved it internally):
>>
>> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; }
>>
>> Declaring that single memory location to be 'volatile' seems to be a
>> much more local change compared to globally "clobbering" all the
>> memory. And it doesn't rely on the compiler providing a compiler
>> barrier. It does however rely on the compiler doing the "right thing"
>> for volatile - but after all that has been said here so far, that
>> seems more likely?
>>
>> The problem may also depend on the specific compiler/cpu combination.
>> For ppc64, both gcc (on linux) and xlc (on aix), do the right thing
>> for volatile variables - they don't insert any memory barriers (i.e.
>> no instructions) but just access the corresponding variables as if
>> there was a compiler barrier. This is exactly what we currently want
>> in HotSpot, because memory barriers are controlled in a fine-grained
>> way through OrderAccess (and OrderAccess implies a "compiler
>> barrier", at least after the latest fixes).
>>
>> Any thoughts? Should we introduce a cross-platform, "compiler-only
>> barrier" or should we stick to using "volatile" for such cases?
>>
>> Regards,
>> Volker
>>
>> [1] https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html
>> [2] https://bugs.openjdk.java.net/browse/JDK-8129440
>>
>> On Mon, May 29, 2017 at 2:20 PM, Erik Österlund
>> <erik.osterlund at oracle.com> wrote:
>>> Hi Andrew,
>>>
>>> I just thought I'd put my opinions in here as I see I have been mentioned a
>>> few times already.
>>>
>>> First of all, I find using the volatile keyword on things that are involved
>>> in lock-free protocols meaningful from a readability point of view. It
>>> allows the reader of the code to see that care is needed here.
>>>
>>> About the compiler barriers - you are right. Volatile should indeed not be
>>> necessary if the compiler barriers do everything right. The compiler should
>>> neither reorder accesses across such a barrier nor reload values that were
>>> read before it.
>>>
>>> On windows we rely on the deprecated _ReadWriteBarrier(). According to MSDN,
>>> it guarantees:
>>>
>>> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that can
>>> remove or reorder memory accesses across the point of the call."
>>>
>>> This should cut it.
>>>
>>> The GCC memory clobber is defined as:
>>>
>>> "The "memory" clobber tells the compiler that the assembly code performs
>>> memory reads or writes to items other than those listed in the input and
>>> output operands (for example, accessing the memory pointed to by one of the
>>> input parameters). To ensure memory contains correct values, GCC may need to
>>> flush specific register values to memory before executing the asm. Further,
>>> the compiler does not assume that any values read from memory before an asm
>>> remain unchanged after that asm; it reloads them as needed. Using the
>>> "memory" clobber effectively forms a read/write memory barrier for the
>>> compiler."
>>>
>>> This seems to guarantee only that values will not be re-ordered. But the
>>> documentation for GCC's Extended Asm also states:
>>>
>>> "You will also want to add the volatile keyword if the memory affected is
>>> not listed in the inputs or outputs of the asm, as the `memory' clobber does
>>> not count as a side-effect of the asm."
>>>
>>> and
>>>
>>> "The volatile keyword indicates that the instruction has important
>>> side-effects. GCC will not delete a volatile asm if it is reachable. (The
>>> instruction can still be deleted if GCC can prove that control-flow will
>>> never reach the location of the instruction.) Note that even a volatile asm
>>> instruction can be moved relative to other code, including across jump
>>> instructions."
>>>
>>> This is a bit vague, but seems to suggest that by making the asm statement
>>> volatile and having a memory clobber, it definitely will not reload
>>> variables. As for re-ordering non-volatile accesses, that shouldn't happen
>>> either, but it is not quite clearly stated. I have never observed such a
>>> re-ordering across a volatile memory clobber. But the semantics seem a bit
>>> vague.
>>>
>>> As for clang, the closest to a definition of what it does I have seen is:
>>>
>>> "A clobber constraint is indicated by a “~” prefix. A clobber does not
>>> consume an input operand, nor generate an output. Clobbers cannot use any of
>>> the general constraint code letters – they may use only explicit register
>>> constraints, e.g. “~{eax}”. The one exception is that a clobber string of
>>> “~{memory}” indicates that the assembly writes to arbitrary undeclared
>>> memory locations – not only the memory pointed to by a declared indirect
>>> output."
>>>
>>> Apart from sweeping statements saying clang inline assembly is largely
>>> compatible and working similar to GCC, I have not seen clear guarantees. And
>>> then there are more compilers.
>>>
>>> In conclusion: by using volatile in addition to OrderAccess you rely on
>>> standardized compiler semantics (at least for volatile-to-volatile
>>> re-orderings and re-loading; volatile-to-nonvolatile is another can of
>>> worms). If instead you rely on the OrderAccess memory model doing what it
>>> says it will do, then it should indeed work without volatile - but to make
>>> that work, OrderAccess relies on non-standardized, compiler-specific
>>> barriers. In practice it should work well on all our supported compilers
>>> without volatile. And if it didn't, that would indeed be a bug in
>>> OrderAccess that needs to be solved in OrderAccess.
>>>
>>> Personally though, I am a helmet-on-synchronization kind of person, so I
>>> would take precaution anyway and use volatile whenever possible, because 1)
>>> it makes the code more readable, and 2) it provides one extra layer of
>>> safety that is more standardized. It seems that over the years it has
>>> happened multiple times that we assumed OrderAccess is bullet proof, and
>>> then realized that it wasn't and observed a crash that would never have
>>> happened if the code was written in a helmet-on-synchronization way. At
>>> least that's how I feel about it.
>>>
>>> Now one might argue that by using C++11 atomics that are standardized, all
>>> these problems would go away, as we would rely on standardized primitives
>>> and then just trust the compiler. But then problems could arise when the
>>> C++ compiler decides to be less conservative than we want, e.g. by not
>>> emitting a fence for sequentially consistent loads to optimize for
>>> non-multiple-copy-atomic CPUs, arguing that IRIW issues that violate
>>> sequential consistency are non-issues in practice. That makes those loads
>>> "almost" sequentially
>>> consistent, which might be good enough. But it feels good to have a choice
>>> here to be more conservative. To have the synchronization helmet on.
>>>
>>> Meta summary:
>>> 1) Current OrderAccess without volatile:
>>>  - should work, but relies on compiler-specific, non-standardized, and
>>> sometimes poorly documented compiler barriers.
>>>
>>> 2) Current OrderAccess with volatile:
>>>  - relies on standardized volatile semantics to guarantee compiler
>>> reordering and reloading issues do not occur.
>>>
>>> 3) C++11 Atomic backend for OrderAccess
>>>  - relies on standardized semantics to guarantee compiler and hardware
>>> reordering issues do not occur
>>>  - nevertheless isn't always flawless, and when it isn't, it gets painful
>>>
>>> Hope this sheds some light on the trade-offs.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>
>>>> On 2017-05-28 10:45, Andrew Haley wrote:
>>>>
>>>>> On 27/05/17 10:10, Volker Simonis wrote:
>>>>>
>>>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley <aph at redhat.com> wrote:
>>>>>>
>>>>>>> On 26/05/17 17:03, Volker Simonis wrote:
>>>>>>>
>>>>>>> Volatile not only prevents reordering by the compiler. It also
>>>>>>> prevents other, otherwise legal transformations/optimizations (like
>>>>>>> for example reloading a variable [1]) which have to be prevented in
>>>>>>> order to write correct, lock free programs.
>>>>>>
>>>>>> Yes, but so do compiler barriers.
>>>>>
>>>>> Please correct me if I'm wrong, but I thought "compiler barriers" are
>>>>> to prevent reordering by the compiler. However, this is a question of
>>>>> optimization. If you have two subsequent loads from the same address,
>>>>> the compiler is free to do only the first load and keep the value in a
>>>>> register if the address is not pointing to a volatile value.
>>>>
>>>> No it isn't: that is precisely what a compiler barrier prevents.  A
>>>> compiler barrier (from the POV of the compiler) clobbers all of
>>>> the memory state.  Neither reads nor writes may move past a compiler
>>>> barrier.
>>>>
>>>
>


More information about the jdk10-dev mailing list