RFR(XS): JDK-8129440 G1 crash during concurrent root region scan.

Volker Simonis volker.simonis at gmail.com
Mon Jul 6 13:03:05 UTC 2015


On Tue, Jun 23, 2015 at 8:45 AM, Siebenborn, Axel
<axel.siebenborn at sap.com> wrote:
>
>
> Hi Volker,
>
> thanks for your comment.Indeed, this was my first idea for a fix. However,
> there are more than 50 callers of  load_heap_oop, plus some generated by
> preprocessor defines like this one:
>
>
>
> #define DO_OOP_WORK_IMPL(closureName) \
>
>   template <class T> inline void closureName##Closure::do_oop_work(T* p) {
> \
>
>     T heap_oop = oopDesc::load_heap_oop(p);               \
>
>     if (!oopDesc::is_null(heap_oop)) {                    \
>
>       oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);       \
>
>       do_oop(obj);                                        \
>
>     }                                                     \
>
>   }
>
>
>
> We would have to decide for each of the callers, if there might be a
> concurrency problem or not.
>
>
>
> Do you think, it is worth the effort and the risk of missing one?
>

OK, you're probably right.

If you haven’t seen any performance regressions caused by this fix I'm
fine with your solution.

Unfortunately I can not push this shared-code change so can we please
get a sponsor from within Oracle for this change?

Thanks,
Volker

>
> Regards,
>
> Axel
>
>
>
>
>
>
>
>
>
> On 22.06.2015 at 18:17 Volker Simonis wrote:
>
>> Hi Axel,
>
>>
>
>> the change looks good, but shouldn't we better provide two versions of
>
>> load_heap_oop()
>
>>
>
>> inline oop       oopDesc::load_heap_oop(oop* p)
>
>> inline oop       oopDesc::load_heap_oop(volatile oop* p)
>
>>
>
>> such that the callers can decide on their own which versions they require?
>
>>
>
>> Or do the callers of load_heap_oop() not always know if there exist
>
>> concurrent mutator threads?
>
>>
>
>> Regards,
>
>> Volker
>
>>
>
>>
>
>> On Mon, Jun 22, 2015 at 5:33 PM, Siebenborn, Axel
>
>> <axel.siebenborn at sap.com> wrote:
>
>>> Hi,
>
>>> we have seen crashes with G1 during concurrent root region scan.
>
>>> It turned out, that the reason for the crashes was reloading an oop after
>>> a null check.
>
>>> The compiler may legally do this, as the pointer is not declared as
>>> volatile.
>
>>> The code runs concurrently to mutator threads.
>
>>>
>
>>> template <class T>
>
>>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>
>>>   T heap_oop = oopDesc::load_heap_oop(p);
>>> // 1. Load oop
>
>>>   if (!oopDesc::is_null(heap_oop)) {
>
>>>     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);    // 2.
>>> Compiler decides to re-load the oop
>
>>>     HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>
>>>     _cm->grayRoot(obj, obj->size(), _worker_id, hr);
>
>>>   }
>
>>> }
>
>>>
>
>>>
>
>>> We have seen the problem on AIX with the xlC compiler. However, we have
>>> seen similar optimizations on other platforms and other platforms.
>
>>>
>
>>> As this code pattern is used quite often, I would suggest a global fix in
>>> oopDesc::load_heap_oop(p).
>
>>>
>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8129440
>
>>> Webrev: http://cr.openjdk.java.net/~asiebenborn/8129440/webrev.00/
>
>>>
>
>>> Thanks,
>
>>> Axel
>
>>>
>
>>>
>
>>>



More information about the hotspot-gc-dev mailing list