RFR(XS): JDK-8129440 G1 crash during concurrent root region scan.

Mon Jun 22 16:17:01 UTC 2015

Hi Axel,

the change looks good, but shouldn't we better provide two versions of
load_heap_oop()

inline oop       oopDesc::load_heap_oop(oop* p)
inline oop       oopDesc::load_heap_oop(volatile oop* p)

such that the callers can decide on their own which versions they require?

Or do the callers of load_heap_oop() not always know if there exist
concurrent mutator threads?

Regards,
Volker

On Mon, Jun 22, 2015 at 5:33 PM, Siebenborn, Axel
<axel.siebenborn at sap.com> wrote:
> Hi,
> we have seen crashes with G1 during concurrent root region scan.
> It turned out, that the reason for the crashes was reloading an oop after a null check.
> The compiler may legally do this, as the pointer is not declared as volatile.
> The code runs concurrently to mutator threads.
>
> template <class T>
> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>   T heap_oop = oopDesc::load_heap_oop(p);                                           // 1. Load oop
>   if (!oopDesc::is_null(heap_oop)) {
>     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);    // 2. Compiler decides to re-load the oop
>     HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>     _cm->grayRoot(obj, obj->size(), _worker_id, hr);
>   }
> }
>
>
> We have seen the problem on AIX with the xlC compiler. However, we have seen similar optimizations on other platforms and other platforms.
>
> As this code pattern is used quite often, I would suggest a global fix in  oopDesc::load_heap_oop(p).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8129440
> Webrev: http://cr.openjdk.java.net/~asiebenborn/8129440/webrev.00/
>
> Thanks,
> Axel
>
>
>