RFR(XS): JDK-8129440 G1 crash during concurrent root region scan.

Tue Jun 23 06:45:20 UTC 2015

Hi Volker,
thanks for your comment.Indeed, this was my first idea for a fix. However, there are more than 50 callers of  load_heap_oop, plus some generated by preprocessor defines like this one:

#define DO_OOP_WORK_IMPL(closureName) \
  template <class T> inline void closureName##Closure::do_oop_work(T* p) {      \
    T heap_oop = oopDesc::load_heap_oop(p);               \
    if (!oopDesc::is_null(heap_oop)) {                    \
      oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);       \
      do_oop(obj);                                        \
    }                                                     \
  }

We would have to decide for each of the callers, if there might be a concurrency problem or not.

Do you think, it is worth the effort and the risk of missing one?

Regards,
Axel

On 22.06.2015 at 18:17 Volker Simonis wrote:
> Hi Axel,
>
> the change looks good, but shouldn't we better provide two versions of
> load_heap_oop()
>
> inline oop       oopDesc::load_heap_oop(oop* p)
> inline oop       oopDesc::load_heap_oop(volatile oop* p)
>
> such that the callers can decide on their own which versions they require?
>
> Or do the callers of load_heap_oop() not always know if there exist
> concurrent mutator threads?
>
> Regards,
> Volker
>
>
> On Mon, Jun 22, 2015 at 5:33 PM, Siebenborn, Axel
> <axel.siebenborn at sap.com> wrote:
>> Hi,
>> we have seen crashes with G1 during concurrent root region scan.
>> It turned out, that the reason for the crashes was reloading an oop after a null check.
>> The compiler may legally do this, as the pointer is not declared as volatile.
>> The code runs concurrently to mutator threads.
>>
>> template <class T>
>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>>   T heap_oop = oopDesc::load_heap_oop(p);                                           // 1. Load oop
>>   if (!oopDesc::is_null(heap_oop)) {
>>     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);    // 2. Compiler decides to re-load the oop
>>     HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>>     _cm->grayRoot(obj, obj->size(), _worker_id, hr);
>>   }
>> }
>>
>>
>> We have seen the problem on AIX with the xlC compiler. However, we have seen similar optimizations on other platforms and other platforms.
>>
>> As this code pattern is used quite often, I would suggest a global fix in  oopDesc::load_heap_oop(p).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8129440
>> Webrev: http://cr.openjdk.java.net/~asiebenborn/8129440/webrev.00/
>>
>> Thanks,
>> Axel
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150623/974dee83/attachment.htm>