RFR(S): 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries

Volker Simonis volker.simonis at gmail.com
Fri Dec 2 10:10:49 UTC 2016


On Wed, Nov 30, 2016 at 10:43 PM,  <dean.long at oracle.com> wrote:
> I'm fine with leaving it in.  How about adding a comment saying it's there
> for clarity?
>

I've added the corresponding comment:

http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409.v3/

Ready to push?

@Thomas: can you please sponsor now?

Thank you and best regards,
Volker

> dl
>
>
>
> On 11/30/16 8:41 AM, Volker Simonis wrote:
>>
>> On Tue, Nov 29, 2016 at 11:37 PM,  <dean.long at oracle.com> wrote:
>>>
>>> This looks good to me too, but I'm just wondering if the 2nd load_acquire
>>> inside the lock is really needed.
>>>
>> Yes, we think you're right and the second load_acquire inside the lock
>> could be replaced by a normal load because entering the lock does an
>> implicit acquire and leaving the lock does an implicit release.
>>
>> Do you want me to remove it or can we leave it in for clarity (I don't
>> think it hurts a lot) ?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>>> dl
>>>
>>>
>>>
>>> On 11/29/16 10:26 AM, Volker Simonis wrote:
>>>>
>>>> On Tue, Nov 29, 2016 at 4:55 PM, Mikael Gerdin
>>>> <mikael.gerdin at oracle.com>
>>>> wrote:
>>>>>
>>>>> Hi Volker,
>>>>>
>>>>> On 2016-11-29 14:59, Volker Simonis wrote:
>>>>>>
>>>>>> On Tue, Nov 29, 2016 at 11:58 AM, Mikael Gerdin
>>>>>> <mikael.gerdin at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Volker,
>>>>>>>
>>>>>>> On 2016-11-28 19:33, Volker Simonis wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> can I please have a review and sponsor for the following fix
>>>>>>>> submitted
>>>>>>>> by gunter.haug at sap.com:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I haven't looked too closely on the change but one thing I'm curious
>>>>>>> about
>>>>>>> just to further my understanding of the issue:
>>>>>>> Would an alternate solution be to always acquire the
>>>>>>> ParGCRareEvent_lock
>>>>>>> before reading _last_LNC_resizing_collection[i] since in that case
>>>>>>> the
>>>>>>> mutex
>>>>>>> semantics would ensure proper ordering of the memory accesses?
>>>>>>>
>>>>> Ok, thanks for confirming my thoughts around this.
>>>>>
>>>>> I think the change seems ok but what do you think about making
>>>>> _last_LNC_resizing_collection volatile to further ensure that there's
>>>>> not
>>>>> funny compiler optimizations going on?
>>>>>
>>>> In general we agree that it would be fine to also make
>>>> _last_LNC_resizing_collection volatile and here's an updated webrev
>>>> which does exactly that:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8170409.v2/
>>>>
>>>> Our initial reasoning was that the aquiring the ParGCRareEvent_lock
>>>> implies a C++ "sequence point" which should prevent the compiler from
>>>> caching the value read from _last_LNC_resizing_collection. But with
>>>> the "volatile" modifier the code is defintely clearer.
>>>>
>>>> @Thomas: can you please use the new webrev for pushing?
>>>>
>>>> Thanks,
>>>> Volker
>>>>
>>>>> /Mikael
>>>>>
>>>>>
>>>>>> Hi Mikael,
>>>>>>
>>>>>> we've discussed that as well and we think you are right in that always
>>>>>> acquiring the ParGCRareEvent_lock before reading
>>>>>> _last_LNC_resizing_collection[i] would solve the problem as well. But
>>>>>> as ParGCRareEvent_lock is potentially a heavy-weight OS lock that
>>>>>> would not be a practical solution.
>>>>>>
>>>>>> Thanks,
>>>>>> Volker
>>>>>>
>>>>>>> Thanks
>>>>>>> /Mikael
>>>>>>>
>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8170409
>>>>>>>>
>>>>>>>> We've observed a crash (see bug report for a stack trace) in
>>>>>>>> CardTableModRefBSForCTRS::process_chunk_boundaries() from time to
>>>>>>>> time
>>>>>>>> since several years now, but only on non TSO platforms:
>>>>>>>>
>>>>>>>> - It only happens in opt builds.
>>>>>>>> - Analysis of the assembly code revealed the actual crash site to be
>>>>>>>> an array store to a pointer (_lowest_non_clean) which is an argument
>>>>>>>> to process_chunk_boundaries()
>>>>>>>> - The pointer is actually calculated in
>>>>>>>> CardTableModRefBS::get_LNC_array_for_space() and passed as argument
>>>>>>>> to
>>>>>>>> CardTableModRefBSForCTRS::process_chunk_boundaries()
>>>>>>>> - CardTableModRefBS::get_LNC_array_for_space() doesn't enforce TSO
>>>>>>>> on
>>>>>>>> _last_LNC_resizing_collection[i] so the pointer to an uninitialized
>>>>>>>> structure (i.e._lowest_non_clean) could become visible to other
>>>>>>>> threads before the value of _last_LNC_resizing_collection[i].
>>>>>>>>
>>>>>>>> Solution:
>>>>>>>>
>>>>>>>> Use OrderAccess::load_acquire and OrderAccess::release_store for
>>>>>>>> accessing _last_LNC_resizing_collection[i] in
>>>>>>>> CardTableModRefBSForCTRS::get_LNC_array_for_space()
>>>>>>>>
>>>>>>>> Thanks you and best regards,
>>>>>>>> Volker
>>>>>>>>
>



More information about the hotspot-gc-dev mailing list