RFR(S): 8009536: G1: Apache Lucene hang during reference processing

Bengt Rutisson bengt.rutisson at oracle.com
Mon Mar 18 11:40:24 UTC 2013


Hi John,

This looks good to me. Just like I thought ;)

One nit if you feel like fixing it before you push:

In a couple places you have this pattern:

2240           _task->do_marking_step(mark_step_duration_ms,
2241                                  false      /* do_termination */,
2242                                  _is_serial /* is_serial */);

I think the comment /* is_serial */ is superfluous now that we pass a 
variable named _is_serial instead of just true or false.

Bengt


On 3/14/13 6:19 PM, John Cuthbertson wrote:
> Hi Everyone,
>
> Here's the final webrev based upon Bengt's comments:
>
> http://cr.openjdk.java.net/~johnc/8009536/webrev.2/
>
> I think I'm all set after reviews from Bengt and Thomas - I just 
> wanted to send out the final version of the webrev in case anyone else 
> has other comments.
>
> Thanks,
>
> JohnC
>
> On 3/13/2013 2:01 PM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for explaining a bit about the serial case where we still need 
>> to go through termination protocol.
>>
>> As you know I'll be on vacation the next two days, so I just wanted 
>> to let you know that with the changes that you outline below (the fix 
>> to CMRemarkTask and removing the do_stealing argument) I'm fine with 
>> this change. Feel free to push this if you get another review.
>>
>> Thanks,
>> Bengt
>>
>> On 3/13/13 6:57 PM, John Cuthbertson wrote:
>>> Hi Bengt,
>>>
>>> Thanks for looking at the change again. Replies inline....
>>>
>>> On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>>>>
>>>>
>>>> Hi John,
>>>>
>>>> I'm not too familiar with the do_marking_step() code, so I'm trying 
>>>> to get to know it while I'm looking at your webrev. So, maybe I'm 
>>>> completely off here, but I think there are a couple of strange things.
>>>>
>>>> After your change do_marking_step() takes three bools: do_stealing, 
>>>> do_termination and is_serial.
>>>>
>>>> This means that we have 8 possible states if we combine all value 
>>>> of these bools. I went through all calls to do_marking_step() and 
>>>> it seems like we make use of 4 of those states:
>>>>
>>>>
>>>> G1CMKeepAliveAndDrainClosure::do_oop_work()
>>>>   do_marking_step(false, false, true)             (1)
>>>>
>>>> CMConcurrentMarkingTask::work()
>>>>   do_marking_step(true, true, false)              (2)
>>>>
>>>> CMRemarkTask::work()
>>>>   do_marking_step(true, true, false)              (2)
>>>>   do_marking_step(true, true, true)               (3)
>>>>
>>>> G1CMDrainMarkingStackClosure::do_void()
>>>>   do_marking_step(false, true, true)              (4)
>>>>   do_marking_step(true, true, false)              (2)
>>>>
>>>
>>> Yeah. Well...
>>>
>>>>
>>>> I numbered the states (1) - (4). Here are all states and I marked 
>>>> which ones we use:
>>>>
>>>>  stealing,  termination,  serial    (3)
>>>>  stealing, !termination,  serial
>>>>  stealing, !termination, !serial
>>>>  stealing,  termination, !serial    (2) parallel
>>>> !stealing,  termination,  serial    (4)
>>>> !stealing, !termination,  serial    (1) serial
>>>> !stealing, !termination, !serial
>>>> !stealing,  termination, !serial
>>>>
>>>>
>>>> I think state (2) makes sense as the parallel case. We do stealing 
>>>> and termination and we are not serial. State (1) also makes sense. 
>>>> It is clearly the serial case. No stealing or termination. Just 
>>>> serial.
>>>>
>>>> Now, case (3) looks odd to me. We say we are serial but we want to 
>>>> do stealing and termination. This case comes from CMRemarkTask. It 
>>>> takes a is_serial argument in its constructor. We get to case (3) 
>>>> from ConcurrentMark::checkpointRootsFinalWork(), where we do:
>>>>
>>>>   if (G1CollectedHeap::use_parallel_gc_threads()) {
>>>>     ...
>>>>   } else {
>>>>     ...
>>>>     CMRemarkTask remarkTask(this, active_workers, true /* 
>>>> is_serial*/);
>>>>   }
>>>>
>>>> To me this looks like it should lead to the pure serial case (1) 
>>>> instead of (3). Do you agree? In that case I think 
>>>> CMRemarkTask::work() needs to be changed to do:
>>>>
>>>>         task->do_marking_step(1000000000.0 /* something very large */,
>>>>                               !_is_serial /* do_stealing */,
>>>>                               !_is_serial /* do_termination */,
>>>>                               _is_serial);
>>>>
>>>
>>> You're right here. The serial remark case should not enable 
>>> stealing. It doesn't make sense. Done!
>>>
>>>>
>>>> The other case that sticks out is (4). We don't want to do 
>>>> stealing, but we want to do termination even though we are serial. 
>>>> This comes from how we set up the g1_drain_mark_stack in 
>>>> ConcurrentMark::weakRefsWork(). We pass that along to the reference 
>>>> processing, which I think is serial (wasn't that the original 
>>>> problem? :) ). We also pass the g1_keep_alive which is set up to be 
>>>> in state (1). So, I wonder if it is correct that the 
>>>> g1_drain_mark_stack is in state (4). If you agree that we should be 
>>>> using the pure serial case here too, I think we need to change 
>>>> G1CMDrainMarkingStackClosure::do_void() to be:
>>>>
>>>>       _task->do_marking_step(1000000000.0 /* something very large */,
>>>>                              !_is_serial /* do_stealing */,
>>>>                              !_is_serial /* do_termination */,
>>>>                              _is_serial  /* is_serial */);
>>>>
>>>>
>>>
>>> That is what we had before and I wasn't happy with it. The serial 
>>> drain closure is the last thing that get's executed as part of 
>>> reference processing - when process_phaseJNI invokes the 
>>> complete_gc.do_void() method. In the drain closure either serial or 
>>> parallel I think we want to go through the termination protocol. We 
>>> just don't want to interact with the terminator in the serial case. 
>>> There's a bunch of guarantees and checks in the termination protocol 
>>> and we should still go through them - even when serial.
>>>
>>>>
>>>> Again, I'm not sure about this, but if the above is correct we will 
>>>> be left with only two states. The parallel and the serial. In that 
>>>> case I think it would make sense to remove the do_stealing and 
>>>> do_termination arguments for do_marking_step() and just pass in the 
>>>> is_serial argument. Internally do_marking_step() could set up two 
>>>> local variable to track stealing and termination. That might be 
>>>> good if we see the need in the future to split these up again.
>>>
>>> I think we can get rid of the do_stealing parameter - it's always 
>>> the negation of the is_serial parameter and I like the idea of 
>>> making it a local. How about that? For example
>>>
>>> void CMTask::do_marking_step(double ms, bool do_termination, bool 
>>> is_serial) {
>>>   // It does not make sense to allow stealing when called serially.
>>>   bool do_stealing = !is_serial;
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130318/7175a41c/attachment.htm>


More information about the hotspot-gc-dev mailing list