CRR (XS): 7098085: G1: partially-young GCs not initiated under certain circumstances

Tony Printezis tony.printezis at oracle.com
Thu Oct 6 18:10:12 UTC 2011


Ramki,

Y. S. Ramakrishna wrote:
>
>
> On 10/06/11 09:46, Tony Printezis wrote:
>> Ramki,
>>
>> I have to eat humble pie. I got two (infrequent failures) during testing 
>
> OK, i'll share the pie :-) I thought (but didn't check) that
> we clear both marking bit maps at the end of the full gc,
> and the yielded/aborted marking threads will not create any
> new marks.

That's the idea, and the bitmap will be cleared by the CM thread anyway. 
What seems to happen is a Full GC happens, followed by a GC pause and 
the latter somehow has not observed that marking is done. So, it tries 
to manipulate the bitmap and finds some bits to be set which are 
expected to be 0 (I think the reason for this is that the CM thread has 
not finished clearing the bitmap and has yielded for the GC pause to 
happen). I'll file a CR to look into this 'cause it'd be good to avoid 
clearing the bitmap if we can.

> But perhaps they are not checking for abortion when
> they emerge from an arbitrary yield? (You can tell that I have
> not looked sufficiently at the G1 abort mechanism yet; i'll try
> and look at the sources for that when i get some time.)
>
>> overnight that I'm 99.9% sure are caused by the fact that I stopped 
>> clearing the bitmap in the CM::abort() method. As I said, I've backed 
>> out of those changes in concurrentMark.cpp, here's the latest webrev 
>> (same as before but without the changes to the CM::abort() method):
>>
>> http://cr.openjdk.java.net/~tonyp/7098085/webrev.1/
>
> reviewed!

Thanks.

Tony

>>
>> I'll carry on testing this for the next day or two.
>>
>> Tony
>
> thanks.
> -- ramki
>
>>
>> Tony Printezis wrote:
>>> Ramki,
>>>
>>> Thanks for taking a look at it! See inline.
>>>
>>> On 10/5/2011 7:48 PM, Ramki Ramakrishna wrote:
>>>> Hi Tony --
>>>>
>>>> The changes look good.
>>>>
>>>> On a side note, was the removal of the bit map clearing a different 
>>>> bug fix,
>>>> or is it somehow related to the issue described in the CR?
>>>> I can see that it is unnecessary and may be even wasteful,
>>>> but it seemed to me to be unrelated to the issue described in the CR,
>>>> unless I am missing something here.
>>>
>>> Agreed. I was looking at the abort() method to see what it did and I 
>>> noticed that it was also clearing the bitmap. It's indeed wasteful 
>>> and, in fact, it does not achieve what the comment says it should 
>>> achieve. abort() is only called at the start of a Full GC but, in 
>>> order to do the Full GC safepoint, the concurrent marking workers 
>>> should have noticed that a safepoint was initiated and should have 
>>> all yielded (otherwise the safepoint would have to wait for them). 
>>> And, the concurrent marking thread will actually clear the next 
>>> bitmap as the last thing it does before waiting for the next cycle 
>>> to start. But, yeah, point taken: I'll back out of the 
>>> concurrentMark.cpp changes and do them as a separate CR.
>>>
>>>> As a related (but otherwise orthogonal) matter, it appears as 
>>>> though the idiom:-
>>>>
>>>>    _sts.join();
>>>>    do_something();
>>>>    do_something_else();
>>>>   _sts_leave();
>>>>
>>>> might gain from the usual constructor/destructor idiom used
>>>> elsewhere for code that wants to synchronize with other
>>>> parts of the system (in this case safepoints). For example:
>>>>
>>>>   {
>>>>      SynchronizeWithSafepoints ss();     // or appropriate other 
>>>> name [1]
>>>>      do_something();
>>>>      do_something_else();
>>>>   }
>>>
>>> Yes, I do like this idea very much.
>>>
>>>> [1] you will have to invent a suitable other name, perhaps 
>>>> DesynchronizeWithSafepoints
>>>> for when you do the reverse, i.e. perform an action outside of the 
>>>> _sts, as happens
>>>> when you are doing those synch barriers amongst the concurrent marking
>>>> threads in case of overflow and restart.
>>>
>>> I can't think of a good name off-hand but I'll think about it for a 
>>> bit... (suggestions are welcome!)
>>>
>>>> The benefit is that the extra scopes help clearly demarcate
>>>> the code that wants to be in the "critical section", if you
>>>> will, wrt safepoints that will want to synchronize wrt these
>>>> actions.
>>>>
>>>> (As far as I can tell all invocations of join/leave or 
>>>> stsJoin/stsLeave
>>>> are paired and can become block-structured.)
>>>
>>> They should definitely be paired (like lock/unlock pairs). 
>>> Otherwise, we're in trouble. :-)
>>>
>>>> The c'tor/d'tor idiom change is of course orthogonal to yr changes
>>>> and would likely want to be a separate CR, if you agree that this
>>>> may be another clean-up worth doing.
>>>
>>> Absolutely. So, I take an action item to open two hs23 CRs: one to 
>>> avoid clearing the marking bitmap and one for the ctor/dtor idiom to 
>>> join / leave the STS and vice versa.
>>>
>>> Tony
>>>
>>>> Otherwise looks good.
>>>> -- ramki
>>>>
>>>> On 10/5/2011 2:21 PM, Tony Printezis wrote:
>>>>> Hi,
>>>>>
>>>>> I'd like to get a couple of people to have a look at the following 
>>>>> webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~tonyp/7098085/webrev.0/
>>>>>
>>>>> The code changes are very minor, most of the added lines are in 
>>>>> fact detailed comments on the changes.
>>>>>
>>>>> I'm going to do testing on two machines overnight, as this is 
>>>>> quite tricky code (even though it doesn't look it). But, I thought 
>>>>> I'd open the changes for code review early as the change is small 
>>>>> and some early feedback, if possible, would be greatly appreciated.
>>>>>
>>>>> Tony
>>>>>
>>>>> PS Correct url? Check. Text actually applies to the CR in the 
>>>>> subject? Check.
>>>>>



More information about the hotspot-gc-dev mailing list