RFR(S): 8009536: G1: Apache Lucene hang during reference processing

John Cuthbertson john.cuthbertson at oracle.com
Wed Mar 13 17:57:25 UTC 2013


Hi Bengt,

Thanks for looking at the change again. Replies inline....

On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>
>
> Hi John,
>
> I'm not too familiar with the do_marking_step() code, so I'm trying to 
> get to know it while I'm looking at your webrev. So, maybe I'm 
> completely off here, but I think there are a couple of strange things.
>
> After your change do_marking_step() takes three bools: do_stealing, 
> do_termination and is_serial.
>
> This means that we have 8 possible states if we combine all value of 
> these bools. I went through all calls to do_marking_step() and it 
> seems like we make use of 4 of those states:
>
>
> G1CMKeepAliveAndDrainClosure::do_oop_work()
>   do_marking_step(false, false, true)             (1)
>
> CMConcurrentMarkingTask::work()
>   do_marking_step(true, true, false)              (2)
>
> CMRemarkTask::work()
>   do_marking_step(true, true, false)              (2)
>   do_marking_step(true, true, true)               (3)
>
> G1CMDrainMarkingStackClosure::do_void()
>   do_marking_step(false, true, true)              (4)
>   do_marking_step(true, true, false)              (2)
>

Yeah. Well...

>
> I numbered the states (1) - (4). Here are all states and I marked 
> which ones we use:
>
>  stealing,  termination,  serial    (3)
>  stealing, !termination,  serial
>  stealing, !termination, !serial
>  stealing,  termination, !serial    (2) parallel
> !stealing,  termination,  serial    (4)
> !stealing, !termination,  serial    (1) serial
> !stealing, !termination, !serial
> !stealing,  termination, !serial
>
>
> I think state (2) makes sense as the parallel case. We do stealing and 
> termination and we are not serial. State (1) also makes sense. It is 
> clearly the serial case. No stealing or termination. Just serial.
>
> Now, case (3) looks odd to me. We say we are serial but we want to do 
> stealing and termination. This case comes from CMRemarkTask. It takes 
> a is_serial argument in its constructor. We get to case (3) from 
> ConcurrentMark::checkpointRootsFinalWork(), where we do:
>
>   if (G1CollectedHeap::use_parallel_gc_threads()) {
>     ...
>   } else {
>     ...
>     CMRemarkTask remarkTask(this, active_workers, true /* is_serial*/);
>   }
>
> To me this looks like it should lead to the pure serial case (1) 
> instead of (3). Do you agree? In that case I think 
> CMRemarkTask::work() needs to be changed to do:
>
>         task->do_marking_step(1000000000.0 /* something very large */,
>                               !_is_serial /* do_stealing    */,
>                               !_is_serial /* do_termination */,
>                               _is_serial);
>

You're right here. The serial remark case should not enable stealing. It 
doesn't make sense. Done!

>
> The other case that sticks out is (4). We don't want to do stealing, 
> but we want to do termination even though we are serial. This comes 
> from how we set up the g1_drain_mark_stack in 
> ConcurrentMark::weakRefsWork(). We pass that along to the reference 
> processing, which I think is serial (wasn't that the original problem? 
> :) ). We also pass the g1_keep_alive which is set up to be in state 
> (1). So, I wonder if it is correct that the g1_drain_mark_stack is in 
> state (4). If you agree that we should be using the pure serial case 
> here too, I think we need to change 
> G1CMDrainMarkingStackClosure::do_void() to be:
>
>       _task->do_marking_step(1000000000.0 /* something very large */,
>                              !_is_serial /* do_stealing    */,
>                              !_is_serial /* do_termination */,
>                              _is_serial  /* is_serial      */);
>
>

That is what we had before and I wasn't happy with it. The serial drain 
closure is the last thing that get's executed as part of reference 
processing - when process_phaseJNI invokes the complete_gc.do_void() 
method. In the drain closure either serial or parallel I think we want 
to go through the termination protocol. We just don't want to interact 
with the terminator in the serial case. There's a bunch of guarantees 
and checks in the termination protocol and we should still go through 
them - even when serial.

>
> Again, I'm not sure about this, but if the above is correct we will be 
> left with only two states. The parallel and the serial. In that case I 
> think it would make sense to remove the do_stealing and do_termination 
> arguments for do_marking_step() and just pass in the is_serial 
> argument. Internally do_marking_step() could set up two local variable 
> to track stealing and termination. That might be good if we see the 
> need in the future to split these up again.

I think we can get rid of the do_stealing parameter - it's always the 
negation of the is_serial parameter and I like the idea of making it a 
local. How about that? For example

void CMTask::do_marking_step(double ms, bool do_termination, bool 
is_serial) {
   // It does not make sense to allow stealing when called serially.
   bool do_stealing = !is_serial;

Thanks,

JohnC



More information about the hotspot-gc-dev mailing list