RFR(S): 8009536: G1: Apache Lucene hang during reference processing
John Cuthbertson
john.cuthbertson at oracle.com
Wed Mar 13 17:57:25 UTC 2013
Hi Bengt,
Thanks for looking at the change again. Replies inline....
On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>
>
> Hi John,
>
> I'm not too familiar with the do_marking_step() code, so I'm trying to
> get to know it while I'm looking at your webrev. So, maybe I'm
> completely off here, but I think there are a couple of strange things.
>
> After your change do_marking_step() takes three bools: do_stealing,
> do_termination and is_serial.
>
> This means that we have 8 possible states if we combine all value of
> these bools. I went through all calls to do_marking_step() and it
> seems like we make use of 4 of those states:
>
>
> G1CMKeepAliveAndDrainClosure::do_oop_work()
> do_marking_step(false, false, true) (1)
>
> CMConcurrentMarkingTask::work()
> do_marking_step(true, true, false) (2)
>
> CMRemarkTask::work()
> do_marking_step(true, true, false) (2)
> do_marking_step(true, true, true) (3)
>
> G1CMDrainMarkingStackClosure::do_void()
> do_marking_step(false, true, true) (4)
> do_marking_step(true, true, false) (2)
>
Yeah. Well...
>
> I numbered the states (1) - (4). Here are all states and I marked
> which ones we use:
>
> stealing, termination, serial (3)
> stealing, !termination, serial
> stealing, !termination, !serial
> stealing, termination, !serial (2) parallel
> !stealing, termination, serial (4)
> !stealing, !termination, serial (1) serial
> !stealing, !termination, !serial
> !stealing, termination, !serial
>
>
> I think state (2) makes sense as the parallel case. We do stealing and
> termination and we are not serial. State (1) also makes sense. It is
> clearly the serial case. No stealing or termination. Just serial.
>
> Now, case (3) looks odd to me. We say we are serial but we want to do
> stealing and termination. This case comes from CMRemarkTask. It takes
> a is_serial argument in its constructor. We get to case (3) from
> ConcurrentMark::checkpointRootsFinalWork(), where we do:
>
> if (G1CollectedHeap::use_parallel_gc_threads()) {
> ...
> } else {
> ...
> CMRemarkTask remarkTask(this, active_workers, true /* is_serial*/);
> }
>
> To me this looks like it should lead to the pure serial case (1)
> instead of (3). Do you agree? In that case I think
> CMRemarkTask::work() needs to be changed to do:
>
> task->do_marking_step(1000000000.0 /* something very large */,
> !_is_serial /* do_stealing */,
> !_is_serial /* do_termination */,
> _is_serial);
>
You're right here. The serial remark case should not enable stealing. It
doesn't make sense. Done!
>
> The other case that sticks out is (4). We don't want to do stealing,
> but we want to do termination even though we are serial. This comes
> from how we set up the g1_drain_mark_stack in
> ConcurrentMark::weakRefsWork(). We pass that along to the reference
> processing, which I think is serial (wasn't that the original problem?
> :) ). We also pass the g1_keep_alive which is set up to be in state
> (1). So, I wonder if it is correct that the g1_drain_mark_stack is in
> state (4). If you agree that we should be using the pure serial case
> here too, I think we need to change
> G1CMDrainMarkingStackClosure::do_void() to be:
>
> _task->do_marking_step(1000000000.0 /* something very large */,
> !_is_serial /* do_stealing */,
> !_is_serial /* do_termination */,
> _is_serial /* is_serial */);
>
>
That is what we had before and I wasn't happy with it. The serial drain
closure is the last thing that get's executed as part of reference
processing - when process_phaseJNI invokes the complete_gc.do_void()
method. In the drain closure either serial or parallel I think we want
to go through the termination protocol. We just don't want to interact
with the terminator in the serial case. There's a bunch of guarantees
and checks in the termination protocol and we should still go through
them - even when serial.
>
> Again, I'm not sure about this, but if the above is correct we will be
> left with only two states. The parallel and the serial. In that case I
> think it would make sense to remove the do_stealing and do_termination
> arguments for do_marking_step() and just pass in the is_serial
> argument. Internally do_marking_step() could set up two local variable
> to track stealing and termination. That might be good if we see the
> need in the future to split these up again.
I think we can get rid of the do_stealing parameter - it's always the
negation of the is_serial parameter and I like the idea of making it a
local. How about that? For example
void CMTask::do_marking_step(double ms, bool do_termination, bool
is_serial) {
// It does not make sense to allow stealing when called serially.
bool do_stealing = !is_serial;
Thanks,
JohnC
More information about the hotspot-gc-dev
mailing list