RFR(XS): JDK-8129440 G1 crash during concurrent root region scan.

Thu Jul 9 16:50:28 UTC 2015

On 7/9/2015 1:29 AM, Siebenborn, Axel wrote:
> On 06.07.2015 at 15:03 Volker Simonis wrote:
>> On Tue, Jun 23, 2015 at 8:45 AM, Siebenborn, Axel
>> <axel.siebenborn at sap.com> wrote:
>>>
>>> Hi Volker,
>>>
>>> thanks for your comment.Indeed, this was my first idea for a fix. However,
>>> there are more than 50 callers of  load_heap_oop, plus some generated by
>>> preprocessor defines like this one:
>>>
>>> #define DO_OOP_WORK_IMPL(closureName) \
>>>    template <class T> inline void closureName##Closure::do_oop_work(T* p) {
>>> \
>>>      T heap_oop = oopDesc::load_heap_oop(p);               \
>>>      if (!oopDesc::is_null(heap_oop)) {                    \
>>>        oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);       \
>>>        do_oop(obj);                                        \
>>>      }                                                     \
>>>    }
>>>
>>> We would have to decide for each of the callers, if there might be a
>>> concurrency problem or not.
>>>
>>> Do you think, it is worth the effort and the risk of missing one?
>>>
>> OK, you're probably right.
>>
>> If you haven’t seen any performance regressions caused by this fix I'm
>> fine with your solution.
>>
>> Unfortunately I can not push this shared-code change so can we please
>> get a sponsor from within Oracle for this change?
>>
>> Thanks,
>> Volker
> We did not see any performance regressions.
> On Itanium compilers add release and acquire semantics to stores and loads of volatile variables. The additional memory barriers really could have a performance impact. But even on Itanium we don’t see any performance regression in our benchmarks.
> On other platforms, optimizations prevented by the volatile qualifier are optimizations that we really don't want.

Did you do performance measurements on GC pause times?

Jon

>   
>
> Thanks,
> Axel
>
>>> Regards,
>>>
>>> Axel
>>>
>>> On 22.06.2015 at 18:17 Volker Simonis wrote:
>>>> Hi Axel,
>>>>
>>>> the change looks good, but shouldn't we better provide two versions of
>>>> load_heap_oop()
>>>>
>>>> inline oop       oopDesc::load_heap_oop(oop* p)
>>>> inline oop       oopDesc::load_heap_oop(volatile oop* p)
>>>>
>>>> such that the callers can decide on their own which versions they require?
>>>>
>>>> Or do the callers of load_heap_oop() not always know if there exist
>>>> concurrent mutator threads?
>>>>
>>>> Regards,
>>>> Volker
>>>>
>>>>
>>>> On Mon, Jun 22, 2015 at 5:33 PM, Siebenborn, Axel
>>>> <axel.siebenborn at sap.com> wrote:
>>>>> Hi,
>>>>> we have seen crashes with G1 during concurrent root region scan.
>>>>> It turned out, that the reason for the crashes was reloading an oop after
>>>>> a null check.
>>>>> The compiler may legally do this, as the pointer is not declared as
>>>>> volatile.
>>>>> The code runs concurrently to mutator threads.
>>>>>
>>>>> template <class T>
>>>>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>>>>>    T heap_oop = oopDesc::load_heap_oop(p);
>>>>> // 1. Load oop
>>>>>    if (!oopDesc::is_null(heap_oop)) {
>>>>>      oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);    // 2.
>>>>> Compiler decides to re-load the oop
>>>>>      HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>>>>>      _cm->grayRoot(obj, obj->size(), _worker_id, hr);
>>>>>    }
>>>>> }
>>>>>
>>>>>
>>>>> We have seen the problem on AIX with the xlC compiler. However, we have
>>>>> seen similar optimizations on other platforms and other platforms.
>>>>>
>>>>> As this code pattern is used quite often, I would suggest a global fix in
>>>>> oopDesc::load_heap_oop(p).
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8129440
>>>>> Webrev: http://cr.openjdk.java.net/~asiebenborn/8129440/webrev.00/
>>>>>
>>>>> Thanks,
>>>>> Axel
>>>>>