RFR(S): 8009536: G1: Apache Lucene hang during reference processing

Bengt Rutisson bengt.rutisson at oracle.com
Wed Mar 13 21:01:42 UTC 2013


Hi John,

Thanks for explaining a bit about the serial case where we still need to 
go through termination protocol.

As you know I'll be on vacation the next two days, so I just wanted to 
let you know that with the changes that you outline below (the fix to 
CMRemarkTask and removing the do_stealing argument) I'm fine with this 
change. Feel free to push this if you get another review.

Thanks,
Bengt

On 3/13/13 6:57 PM, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for looking at the change again. Replies inline....
>
> On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>>
>>
>> Hi John,
>>
>> I'm not too familiar with the do_marking_step() code, so I'm trying 
>> to get to know it while I'm looking at your webrev. So, maybe I'm 
>> completely off here, but I think there are a couple of strange things.
>>
>> After your change do_marking_step() takes three bools: do_stealing, 
>> do_termination and is_serial.
>>
>> This means that we have 8 possible states if we combine all value of 
>> these bools. I went through all calls to do_marking_step() and it 
>> seems like we make use of 4 of those states:
>>
>>
>> G1CMKeepAliveAndDrainClosure::do_oop_work()
>>   do_marking_step(false, false, true)             (1)
>>
>> CMConcurrentMarkingTask::work()
>>   do_marking_step(true, true, false)              (2)
>>
>> CMRemarkTask::work()
>>   do_marking_step(true, true, false)              (2)
>>   do_marking_step(true, true, true)               (3)
>>
>> G1CMDrainMarkingStackClosure::do_void()
>>   do_marking_step(false, true, true)              (4)
>>   do_marking_step(true, true, false)              (2)
>>
>
> Yeah. Well...
>
>>
>> I numbered the states (1) - (4). Here are all states and I marked 
>> which ones we use:
>>
>>  stealing,  termination,  serial    (3)
>>  stealing, !termination,  serial
>>  stealing, !termination, !serial
>>  stealing,  termination, !serial    (2) parallel
>> !stealing,  termination,  serial    (4)
>> !stealing, !termination,  serial    (1) serial
>> !stealing, !termination, !serial
>> !stealing,  termination, !serial
>>
>>
>> I think state (2) makes sense as the parallel case. We do stealing 
>> and termination and we are not serial. State (1) also makes sense. It 
>> is clearly the serial case. No stealing or termination. Just serial.
>>
>> Now, case (3) looks odd to me. We say we are serial but we want to do 
>> stealing and termination. This case comes from CMRemarkTask. It takes 
>> a is_serial argument in its constructor. We get to case (3) from 
>> ConcurrentMark::checkpointRootsFinalWork(), where we do:
>>
>>   if (G1CollectedHeap::use_parallel_gc_threads()) {
>>     ...
>>   } else {
>>     ...
>>     CMRemarkTask remarkTask(this, active_workers, true /* is_serial*/);
>>   }
>>
>> To me this looks like it should lead to the pure serial case (1) 
>> instead of (3). Do you agree? In that case I think 
>> CMRemarkTask::work() needs to be changed to do:
>>
>>         task->do_marking_step(1000000000.0 /* something very large */,
>>                               !_is_serial /* do_stealing    */,
>>                               !_is_serial /* do_termination */,
>>                               _is_serial);
>>
>
> You're right here. The serial remark case should not enable stealing. 
> It doesn't make sense. Done!
>
>>
>> The other case that sticks out is (4). We don't want to do stealing, 
>> but we want to do termination even though we are serial. This comes 
>> from how we set up the g1_drain_mark_stack in 
>> ConcurrentMark::weakRefsWork(). We pass that along to the reference 
>> processing, which I think is serial (wasn't that the original 
>> problem? :) ). We also pass the g1_keep_alive which is set up to be 
>> in state (1). So, I wonder if it is correct that the 
>> g1_drain_mark_stack is in state (4). If you agree that we should be 
>> using the pure serial case here too, I think we need to change 
>> G1CMDrainMarkingStackClosure::do_void() to be:
>>
>>       _task->do_marking_step(1000000000.0 /* something very large */,
>>                              !_is_serial /* do_stealing    */,
>>                              !_is_serial /* do_termination */,
>>                              _is_serial  /* is_serial      */);
>>
>>
>
> That is what we had before and I wasn't happy with it. The serial 
> drain closure is the last thing that get's executed as part of 
> reference processing - when process_phaseJNI invokes the 
> complete_gc.do_void() method. In the drain closure either serial or 
> parallel I think we want to go through the termination protocol. We 
> just don't want to interact with the terminator in the serial case. 
> There's a bunch of guarantees and checks in the termination protocol 
> and we should still go through them - even when serial.
>
>>
>> Again, I'm not sure about this, but if the above is correct we will 
>> be left with only two states. The parallel and the serial. In that 
>> case I think it would make sense to remove the do_stealing and 
>> do_termination arguments for do_marking_step() and just pass in the 
>> is_serial argument. Internally do_marking_step() could set up two 
>> local variable to track stealing and termination. That might be good 
>> if we see the need in the future to split these up again.
>
> I think we can get rid of the do_stealing parameter - it's always the 
> negation of the is_serial parameter and I like the idea of making it a 
> local. How about that? For example
>
> void CMTask::do_marking_step(double ms, bool do_termination, bool 
> is_serial) {
>   // It does not make sense to allow stealing when called serially.
>   bool do_stealing = !is_serial;
>
> Thanks,
>
> JohnC




More information about the hotspot-gc-dev mailing list