RFR: 8181085: Race condition in method resolution may produce spurious NullPointerException

Tue May 30 09:04:58 UTC 2017

On 29/05/17 18:02, Volker Simonis wrote:

> "Add memory to the list of clobbered registers if assembler
> instructions can change a memory location in an unpredictable fashion.
> The memory clobber ensures that the data used after the completion of
> the assembly statement is valid and synchronized.
> However, the memory clobber can result in many unnecessary reloads,
> reducing the benefits of hardware prefetching. Thus, the memory
> clobber can impose a performance penalty and should be used with
> caution."
> 
> We haven't used it until now, so I can not say if it really does what
> it is supposed to do. I'm also concerned about the performance
> warning. It seems like the "unnecessary reloads" can really hurt on
> architectures like ppc which have much more registers than x86.
> Declaring a memory location 'volatile' seems much more simple and
> light-weight in order to achieve the desired effect. So I tend to
> agree with you and David that we should proceed to mark things with
> 'volatile'.

If volatile is what is needed, yes.  The problem that we're discussing
is that on x86, OrderAccess was actually incorrect: it should work
with all accesses, not just volatile ones.  The addition of volatile
was potentially papering over a bug.

> Sorry for constantly "spamming" this thread with another problem (i.e.
> JDK-8129440 [2]) but I still think that it is related and important.
> In its current state, the way how "load_heap_oop()" and its
> application works is broken. And this is not because of a problem in
> OrderAccess, but because of missing compiler barriers:
> 
> static inline oop       load_heap_oop(oop* p)       { return *p; }
> ...
> template <class T>
> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>   // 1. load 'heap_oop' from 'p'
>   T heap_oop = oopDesc::load_heap_oop(p);
>   if (!oopDesc::is_null(heap_oop)) {
>     // 2. Compiler reloads 'heap_oop' from 'p' which may now be null!
>     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
>     HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>     _cm->grayRoot(obj, hr);
>   }
> }
> 
> Notice that we don't need memory barriers here - all we need is to
> prevent the compiler from loading the oop (i.e. 'heap_oop') a second
> time. After Andrews explanation (thanks for that!) and Martin's
> examples from Google, I think we could fix this by rewriting
> 'load_heap_oop()' (and friends) as follows:
> 
> static inline oop load_heap_oop(oop* p) {
>   oop o = *p;
>   __asm__ volatile ("" : : : "memory");
>   return o;
> }

I wouldn't do that: it's much too violent an action because it
clobbers all of memory.  You don't want to do it every time anyone
reads an oop from the heap.

> In order to make this consistent across all platforms, we would
> probably have to introduce a new, public "compiler barrier" function
> in OrderAccess (e.g. 'OrderAccess::compiler_barrier()' because we
> don't currently seem to have a cross-platform concept for
> "compiler-only barriers"). But I'm still not convinced that it would
> be better than simply writing (and that's the way how we've actually
> solved it internally):
> 
> static inline oop load_heap_oop(oop* p) { return * (volatile oop*) p; }

That looks better.  It's still UB post-C++11, but it should be OK.

> Declaring that single memory location to be 'volatile' seems to be a
> much more local change compared to globally "clobbering" all the
> memory. And it doesn't rely on a the compilers providing a compiler
> barrier. It does however rely on the compiler doing the "right thing"
> for volatile - but after all what has been said here so far, that
> seems more likely?

It does.  The problem here is that the compiler is not being told what
is going on, and as the saying goes 'If you lie to the compiler, it
will get its revenge.'

> Any thoughts? Should we introduce a cross-platform, "compiler-only
> barrier" or should we stick to using "volatile" for such cases?

Eventually it will have to be C++11 atomics, which give you exactly
the language you need to express this stuff.  The above would be a
relaxed atomic load.

Andrew.