RFR(S): 8009536: G1: Apache Lucene hang during reference processing
Bengt Rutisson
bengt.rutisson at oracle.com
Wed Mar 13 21:01:42 UTC 2013
Hi John,
Thanks for explaining a bit about the serial case where we still need to
go through termination protocol.
As you know I'll be on vacation the next two days, so I just wanted to
let you know that with the changes that you outline below (the fix to
CMRemarkTask and removing the do_stealing argument) I'm fine with this
change. Feel free to push this if you get another review.
Thanks,
Bengt
On 3/13/13 6:57 PM, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for looking at the change again. Replies inline....
>
> On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>>
>>
>> Hi John,
>>
>> I'm not too familiar with the do_marking_step() code, so I'm trying
>> to get to know it while I'm looking at your webrev. So, maybe I'm
>> completely off here, but I think there are a couple of strange things.
>>
>> After your change do_marking_step() takes three bools: do_stealing,
>> do_termination and is_serial.
>>
>> This means that we have 8 possible states if we combine all value of
>> these bools. I went through all calls to do_marking_step() and it
>> seems like we make use of 4 of those states:
>>
>>
>> G1CMKeepAliveAndDrainClosure::do_oop_work()
>> do_marking_step(false, false, true) (1)
>>
>> CMConcurrentMarkingTask::work()
>> do_marking_step(true, true, false) (2)
>>
>> CMRemarkTask::work()
>> do_marking_step(true, true, false) (2)
>> do_marking_step(true, true, true) (3)
>>
>> G1CMDrainMarkingStackClosure::do_void()
>> do_marking_step(false, true, true) (4)
>> do_marking_step(true, true, false) (2)
>>
>
> Yeah. Well...
>
>>
>> I numbered the states (1) - (4). Here are all states and I marked
>> which ones we use:
>>
>> stealing, termination, serial (3)
>> stealing, !termination, serial
>> stealing, !termination, !serial
>> stealing, termination, !serial (2) parallel
>> !stealing, termination, serial (4)
>> !stealing, !termination, serial (1) serial
>> !stealing, !termination, !serial
>> !stealing, termination, !serial
>>
>>
>> I think state (2) makes sense as the parallel case. We do stealing
>> and termination and we are not serial. State (1) also makes sense. It
>> is clearly the serial case. No stealing or termination. Just serial.
>>
>> Now, case (3) looks odd to me. We say we are serial but we want to do
>> stealing and termination. This case comes from CMRemarkTask. It takes
>> a is_serial argument in its constructor. We get to case (3) from
>> ConcurrentMark::checkpointRootsFinalWork(), where we do:
>>
>> if (G1CollectedHeap::use_parallel_gc_threads()) {
>> ...
>> } else {
>> ...
>> CMRemarkTask remarkTask(this, active_workers, true /* is_serial*/);
>> }
>>
>> To me this looks like it should lead to the pure serial case (1)
>> instead of (3). Do you agree? In that case I think
>> CMRemarkTask::work() needs to be changed to do:
>>
>> task->do_marking_step(1000000000.0 /* something very large */,
>> !_is_serial /* do_stealing */,
>> !_is_serial /* do_termination */,
>> _is_serial);
>>
>
> You're right here. The serial remark case should not enable stealing.
> It doesn't make sense. Done!
>
>>
>> The other case that sticks out is (4). We don't want to do stealing,
>> but we want to do termination even though we are serial. This comes
>> from how we set up the g1_drain_mark_stack in
>> ConcurrentMark::weakRefsWork(). We pass that along to the reference
>> processing, which I think is serial (wasn't that the original
>> problem? :) ). We also pass the g1_keep_alive which is set up to be
>> in state (1). So, I wonder if it is correct that the
>> g1_drain_mark_stack is in state (4). If you agree that we should be
>> using the pure serial case here too, I think we need to change
>> G1CMDrainMarkingStackClosure::do_void() to be:
>>
>> _task->do_marking_step(1000000000.0 /* something very large */,
>> !_is_serial /* do_stealing */,
>> !_is_serial /* do_termination */,
>> _is_serial /* is_serial */);
>>
>>
>
> That is what we had before and I wasn't happy with it. The serial
> drain closure is the last thing that get's executed as part of
> reference processing - when process_phaseJNI invokes the
> complete_gc.do_void() method. In the drain closure either serial or
> parallel I think we want to go through the termination protocol. We
> just don't want to interact with the terminator in the serial case.
> There's a bunch of guarantees and checks in the termination protocol
> and we should still go through them - even when serial.
>
>>
>> Again, I'm not sure about this, but if the above is correct we will
>> be left with only two states. The parallel and the serial. In that
>> case I think it would make sense to remove the do_stealing and
>> do_termination arguments for do_marking_step() and just pass in the
>> is_serial argument. Internally do_marking_step() could set up two
>> local variable to track stealing and termination. That might be good
>> if we see the need in the future to split these up again.
>
> I think we can get rid of the do_stealing parameter - it's always the
> negation of the is_serial parameter and I like the idea of making it a
> local. How about that? For example
>
> void CMTask::do_marking_step(double ms, bool do_termination, bool
> is_serial) {
> // It does not make sense to allow stealing when called serially.
> bool do_stealing = !is_serial;
>
> Thanks,
>
> JohnC
More information about the hotspot-gc-dev
mailing list