RFR(XS): JDK-8129440 G1 crash during concurrent root region scan.
Siebenborn, Axel
axel.siebenborn at sap.com
Fri Jul 10 07:38:57 UTC 2015
On 09.07.2015 at 18:50 Jon Masamitsu wrote:
>
> On 7/9/2015 1:29 AM, Siebenborn, Axel wrote:
>> On 06.07.2015 at 15:03 Volker Simonis wrote:
>>> On Tue, Jun 23, 2015 at 8:45 AM, Siebenborn, Axel
>>> <axel.siebenborn at sap.com> wrote:
>>>>
>>>> Hi Volker,
>>>>
>>>> thanks for your comment.Indeed, this was my first idea for a fix. However,
>>>> there are more than 50 callers of load_heap_oop, plus some generated by
>>>> preprocessor defines like this one:
>>>>
>>>> #define DO_OOP_WORK_IMPL(closureName) \
>>>> template <class T> inline void closureName##Closure::do_oop_work(T* p) {
>>>> \
>>>> T heap_oop = oopDesc::load_heap_oop(p); \
>>>> if (!oopDesc::is_null(heap_oop)) { \
>>>> oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); \
>>>> do_oop(obj); \
>>>> } \
>>>> }
>>>>
>>>> We would have to decide for each of the callers, if there might be a
>>>> concurrency problem or not.
>>>>
>>>> Do you think, it is worth the effort and the risk of missing one?
>>> OK, you're probably right.
>>>
>>> If you haven’t seen any performance regressions caused by this fix I'm
>>> fine with your solution.
>>>
>>> Unfortunately I can not push this shared-code change so can we please
>>> get a sponsor from within Oracle for this change?
>>>
>>> Thanks,
>>> Volker
>> We did not see any performance regressions.
>> On Itanium compilers add release and acquire semantics to stores and loads of volatile variables. The additional memory barriers really could have a performance impact. But even on Itanium we don’t see any performance regression in our benchmarks.
>> On other platforms, optimizations prevented by the volatile qualifier are optimizations that we really don't want.
>
> Did you do performance measurements on GC pause times?
>
> Jon
I checked the GC pause times in standard benchmarks for ParrallelGC, CMS and G1. I don't see regressions here.
Axel
>>
>> Thanks,
>> Axel
>>>> Regards,
>>>>
>>>> Axel
>>>>
>>>> On 22.06.2015 at 18:17 Volker Simonis wrote:
>>>>> Hi Axel,
>>>>>
>>>>> the change looks good, but shouldn't we better provide two versions of
>>>>> load_heap_oop()
>>>>>
>>>>> inline oop oopDesc::load_heap_oop(oop* p)
>>>>> inline oop oopDesc::load_heap_oop(volatile oop* p)
>>>>>
>>>>> such that the callers can decide on their own which versions they require?
>>>>>
>>>>> Or do the callers of load_heap_oop() not always know if there exist
>>>>> concurrent mutator threads?
>>>>>
>>>>> Regards,
>>>>> Volker
>>>>>
>>>>>
>>>>> On Mon, Jun 22, 2015 at 5:33 PM, Siebenborn, Axel
>>>>> <axel.siebenborn at sap.com> wrote:
>>>>>> Hi,
>>>>>> we have seen crashes with G1 during concurrent root region scan.
>>>>>> It turned out, that the reason for the crashes was reloading an oop after
>>>>>> a null check.
>>>>>> The compiler may legally do this, as the pointer is not declared as
>>>>>> volatile.
>>>>>> The code runs concurrently to mutator threads.
>>>>>>
>>>>>> template <class T>
>>>>>> inline void G1RootRegionScanClosure::do_oop_nv(T* p) {
>>>>>> T heap_oop = oopDesc::load_heap_oop(p);
>>>>>> // 1. Load oop
>>>>>> if (!oopDesc::is_null(heap_oop)) {
>>>>>> oop obj = oopDesc::decode_heap_oop_not_null(heap_oop); // 2.
>>>>>> Compiler decides to re-load the oop
>>>>>> HeapRegion* hr = _g1h->heap_region_containing((HeapWord*) obj);
>>>>>> _cm->grayRoot(obj, obj->size(), _worker_id, hr);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> We have seen the problem on AIX with the xlC compiler. However, we have
>>>>>> seen similar optimizations on other platforms and other platforms.
>>>>>>
>>>>>> As this code pattern is used quite often, I would suggest a global fix in
>>>>>> oopDesc::load_heap_oop(p).
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8129440
>>>>>> Webrev: http://cr.openjdk.java.net/~asiebenborn/8129440/webrev.00/
>>>>>>
>>>>>> Thanks,
>>>>>> Axel
More information about the hotspot-gc-dev
mailing list