RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException
Volker Simonis
volker.simonis at gmail.com
Tue May 30 15:40:36 UTC 2017
On Mon, May 29, 2017 at 10:55 PM, David Holmes <david.holmes at oracle.com> wrote:
> <trimming>
>
> On 30/05/2017 3:02 AM, Volker Simonis wrote:
>>
>> Sorry for constantly "spamming" this thread with another problem (i.e.
>> JDK-8129440 [2]), but I still think it is related and important.
>> In its current state, the way "load_heap_oop()" and its callers
>> work is broken. And this is not because of a problem in
>> OrderAccess, but because of missing compiler barriers:
>>
>> static inline oop load_heap_oop(oop* p) { return *p; }
>> ...
>> template <class T>
>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>>   // 1. load 'heap_oop' from 'p'
>>   T heap_oop = oopDesc::load_heap_oop(p);
>>   if (!oopDesc::is_null(heap_oop)) {
>>     // 2. compiler reloads 'heap_oop' from 'p' - it may now be null!
>>     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
>
>
> Do you mean that the compiler has not stashed heap_oop somewhere and
> re-executes oopDesc::load_heap_oop(p) again? That would be quite nasty I
> think in general as it breaks any logic that wants to read a non-local
> variable once to get it into a local and reuse that knowing that it won't
> change even if the real variable does!
>
Yes, that's exactly what I mean and that's exactly what we've observed
on AIX with xlc. Notice that the compiler is free to do such
transformations if the load is not from a volatile field. That's why
we've opened the bug and fixed our internal version. But we still
think this fix needs to go into OpenJDK as well.
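To make the hazard and the two proposed fixes concrete, here is a minimal,
self-contained sketch (with a hypothetical stand-in for the oop type, not
the actual HotSpot declarations):

```cpp
#include <cassert>

typedef void* oop;  // hypothetical stand-in for HotSpot's oop type

// Variant 1: compiler-only barrier (GCC/Clang/xlc inline-asm syntax).
// The empty asm with a "memory" clobber forbids the compiler from
// discarding the local copy and re-reading *p later.
static inline oop load_heap_oop_with_barrier(oop* p) {
  oop o = *p;
  __asm__ volatile ("" : : : "memory");
  return o;
}

// Variant 2: volatile cast - forces exactly one load of *p, which the
// compiler may not re-execute.
static inline oop load_heap_oop_volatile(oop* p) {
  return *(volatile oop*)p;
}
```

With either variant, a caller's null-check on the returned local stays
valid even if a concurrent writer nulls the slot afterwards, because the
compiler can no longer replace the local with a second load.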
Regards,
Volker
> David
> -----
>
>
>>     HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>>     _cm->grayRoot(obj, hr);
>>   }
>> }
>>
>> Notice that we don't need memory barriers here - all we need is to
>> prevent the compiler from loading the oop (i.e. 'heap_oop') a second
>> time. After Andrew's explanation (thanks for that!) and Martin's
>> examples from Google, I think we could fix this by rewriting
>> 'load_heap_oop()' (and friends) as follows:
>>
>> static inline oop load_heap_oop(oop* p) {
>>   oop o = *p;
>>   __asm__ volatile ("" : : : "memory");
>>   return o;
>> }
>>
>> In order to make this consistent across all platforms, we would
>> probably have to introduce a new, public "compiler barrier" function
>> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()'), because we
>> don't currently seem to have a cross-platform concept of
>> "compiler-only barriers". But I'm still not convinced that it would
>> be better than simply writing (and that's the way we've actually
>> solved it internally):
>>
>> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; }
>>
>> Declaring that single memory location to be 'volatile' seems to be a
>> much more local change compared to globally "clobbering" all the
>> memory. And it doesn't rely on the compiler providing a compiler
>> barrier. It does however rely on the compiler doing the "right thing"
>> for volatile - but after all that has been said here so far, that
>> seems more likely?
>>
>> The problem may also depend on the specific compiler/cpu combination.
>> For ppc64, both gcc (on linux) and xlc (on aix) do the right thing
>> for volatile variables - they don't insert any memory barriers (i.e.
>> no instructions) but just access the corresponding variables as if
>> there was a compiler barrier. This is exactly what we currently want
>> in HotSpot, because fine-grained memory ordering is controlled by the
>> use of OrderAccess (and OrderAccess implies a "compiler barrier", at
>> least after the latest fixes).
>>
>> Any thoughts? Should we introduce a cross-platform, "compiler-only
>> barrier" or should we stick to using "volatile" for such cases?
>>
>> Regards,
>> Volker
>>
>> [1]
>> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/asm.html
>> [2] https://bugs.openjdk.java.net/browse/JDK-8129440
>>
>> On Mon, May 29, 2017 at 2:20 PM, Erik Österlund
>> <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi Andrew,
>>>
>>> I just thought I'd put my opinions in here as I see I have been mentioned
>>> a
>>> few times already.
>>>
>>> First of all, I find using the volatile keyword on things that are
>>> involved in lock-free protocols meaningful from a readability point
>>> of view. It allows the reader of the code to see that care is needed
>>> here.
>>>
>>> About the compiler barriers - you are right. Volatile should indeed
>>> not be necessary if the compiler barriers do everything right: the
>>> compiler should neither reorder accesses across them nor reload
>>> values across them.
>>>
>>> On windows we rely on the deprecated _ReadWriteBarrier(). According to
>>> MSDN,
>>> it guarantees:
>>>
>>> "The _ReadWriteBarrier intrinsic limits the compiler optimizations that
>>> can
>>> remove or reorder memory accesses across the point of the call."
>>>
>>> This should cut it.
>>>
>>> The GCC memory clobber is defined as:
>>>
>>> "The "memory" clobber tells the compiler that the assembly code performs
>>> memory reads or writes to items other than those listed in the input and
>>> output operands (for example, accessing the memory pointed to by one of
>>> the
>>> input parameters). To ensure memory contains correct values, GCC may need
>>> to
>>> flush specific register values to memory before executing the asm.
>>> Further,
>>> the compiler does not assume that any values read from memory before an
>>> asm
>>> remain unchanged after that asm; it reloads them as needed. Using the
>>> "memory" clobber effectively forms a read/write memory barrier for the
>>> compiler."
>>>
>>> This seems to only guarantee that values will not be re-ordered. But
>>> in the documentation for Extended Asm it also states:
>>>
>>> "You will also want to add the volatile keyword if the memory affected is
>>> not listed in the inputs or outputs of the asm, as the `memory' clobber
>>> does
>>> not count as a side-effect of the asm."
>>>
>>> and
>>>
>>> "The volatile keyword indicates that the instruction has important
>>> side-effects. GCC will not delete a volatile asm if it is reachable. (The
>>> instruction can still be deleted if GCC can prove that control-flow will
>>> never reach the location of the instruction.) Note that even a volatile
>>> asm
>>> instruction can be moved relative to other code, including across jump
>>> instructions."
>>>
>>> This is a bit vague, but it seems to suggest that by making the asm
>>> statement volatile and having a memory clobber, the compiler
>>> definitely will not reload variables. As for not re-ordering
>>> non-volatile accesses, it shouldn't, but that is not quite clearly
>>> stated. I have never observed such a re-ordering across a volatile
>>> memory clobber, but the semantics seem a bit vague.
>>>
>>> As for clang, the closest to a definition of what it does I have seen is:
>>>
>>> "A clobber constraint is indicated by a “~” prefix. A clobber does not
>>> consume an input operand, nor generate an output. Clobbers cannot use any
>>> of
>>> the general constraint code letters – they may use only explicit register
>>> constraints, e.g. “~{eax}”. The one exception is that a clobber string of
>>> “~{memory}” indicates that the assembly writes to arbitrary undeclared
>>> memory locations – not only the memory pointed to by a declared indirect
>>> output."
>>>
>>> Apart from sweeping statements saying clang inline assembly is largely
>>> compatible and working similar to GCC, I have not seen clear guarantees.
>>> And
>>> then there are more compilers.
>>>
>>> As a conclusion: by using volatile in addition to OrderAccess, you
>>> rely on standardized compiler semantics (at least for
>>> volatile-to-volatile re-orderings and re-loading, though not for
>>> volatile-to-nonvolatile, but that's another can of worms). If, on
>>> the other hand, you rely on the OrderAccess memory model doing what
>>> it says it will do, then it should indeed work without volatile -
>>> but to make that work, OrderAccess relies on non-standardized,
>>> compiler-specific barriers. In practice it should work well on all
>>> our supported compilers without volatile. And if it didn't, it would
>>> indeed be a bug in OrderAccess that needs to be solved in
>>> OrderAccess.
>>>
>>> Personally though, I am a helmet-on-synchronization kind of person, so I
>>> would take precaution anyway and use volatile whenever possible, because
>>> 1)
>>> it makes the code more readable, and 2) it provides one extra layer of
>>> safety that is more standardized. It seems that over the years it has
>>> happened multiple times that we assumed OrderAccess is bullet proof, and
>>> then realized that it wasn't and observed a crash that would never have
>>> happened if the code was written in a helmet-on-synchronization way. At
>>> least that's how I feel about it.
>>>
>>> Now one might argue that by using C++11 atomics, which are
>>> standardized, all these problems would go away, as we would rely on
>>> standardized primitives and then just trust the compiler. But
>>> problems could then arise when the C++ compiler decides to be less
>>> conservative than we want, e.g. by not emitting a fence for
>>> sequentially consistent loads in order to optimize for non-multiple
>>> copy atomic CPUs, arguing that IRIW issues that violate sequential
>>> consistency are non-issues in practice. That makes those loads
>>> "almost" sequentially consistent, which might be good enough. But it
>>> feels good to have the choice here to be more conservative - to have
>>> the synchronization helmet on.
>>>
>>> Meta summary:
>>> 1) Current OrderAccess without volatile:
>>> - should work, but relies on compiler-specific, non-standardized and
>>> sometimes poorly documented compiler barriers.
>>>
>>> 2) Current OrderAccess with volatile:
>>> - relies on standardized volatile semantics to guarantee that
>>> compiler reordering and reloading issues do not occur.
>>>
>>> 3) C++11 atomic backend for OrderAccess:
>>> - relies on standardized semantics to guarantee that compiler and
>>> hardware reordering issues do not occur
>>> - nevertheless isn't always flawless, and when it isn't, it gets
>>> painful
>>>
>>> Hope this sheds some light on the trade-offs.
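The per-compiler barriers discussed above (_ReadWriteBarrier() on Windows,
the asm memory clobber on GCC/Clang/xlc) could be wrapped in a single
cross-platform helper. A sketch of what such a public compiler-only
barrier might look like (the macro name is made up, not the actual
OrderAccess API):

```cpp
#include <cassert>

// Hypothetical cross-platform compiler-only barrier, dispatching on the
// compiler family.
#if defined(_MSC_VER)
  #include <intrin.h>
  #define COMPILER_BARRIER() _ReadWriteBarrier()
#else  // GCC, Clang and xlc all accept GCC-style extended asm
  #define COMPILER_BARRIER() __asm__ volatile ("" : : : "memory")
#endif

// Example use: read a shared slot exactly once. The barrier forbids the
// compiler from re-reading *p after the local copy has been taken.
static inline void* load_once(void** p) {
  void* v = *p;
  COMPILER_BARRIER();
  return v;
}
```

Note that this emits no instructions on any of these compilers; it only
constrains compiler optimizations, which is the "compiler-only barrier"
concept the thread is asking for.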
>>>
>>> Thanks,
>>> /Erik
>>>
>>>
>>> On 2017-05-28 10:45, Andrew Haley wrote:
>>>>
>>>>
>>>> On 27/05/17 10:10, Volker Simonis wrote:
>>>>>
>>>>>
>>>>> On Fri, May 26, 2017 at 6:09 PM, Andrew Haley <aph at redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 26/05/17 17:03, Volker Simonis wrote:
>>>>>>
>>>>>>> Volatile not only prevents reordering by the compiler. It also
>>>>>>> prevents other, otherwise legal transformations/optimizations (like
>>>>>>> for example reloading a variable [1]) which have to be prevented in
>>>>>>> order to write correct, lock free programs.
>>>>>>
>>>>>>
>>>>>> Yes, but so do compiler barriers.
>>>>>
>>>>>
>>>>> Please correct me if I'm wrong, but I thought "compiler barriers" are
>>>>> to prevent reordering by the compiler. However, this is a question of
>>>>> optimization. If you have two subsequent loads from the same address,
>>>>> the compiler is free to do only the first load and keep the value in a
>>>>> register if the address is not pointing to a volatile value.
>>>>
>>>>
>>>> No it isn't: that is precisely what a compiler barrier prevents. A
>>>> compiler barrier (from the POV of the compiler) clobbers all of
>>>> the memory state. Neither reads nor writes may move past a compiler
>>>> barrier.
>>>>
>>>
>