RFR(S): 8009536: G1: Apache Lucene hang during reference processing
Bengt Rutisson
bengt.rutisson at oracle.com
Mon Mar 18 11:40:24 UTC 2013
Hi John,
This looks good to me. Just like I thought ;)
One nit if you feel like fixing it before you push:
In a couple places you have this pattern:
2240 _task->do_marking_step(mark_step_duration_ms,
2241 false /* do_termination */,
2242 _is_serial /* is_serial */);
I think the comment /* is_serial */ is superfluous now that we pass a
variable named _is_serial instead of just true or false.
Bengt
On 3/14/13 6:19 PM, John Cuthbertson wrote:
> Hi Everyone,
>
> Here's the final webrev based upon Bengt's comments:
>
> http://cr.openjdk.java.net/~johnc/8009536/webrev.2/
>
> I think I'm all set after reviews from Bengt and Thomas - I just
> wanted to send out the final version of the webrev in case anyone else
> has other comments.
>
> Thanks,
>
> JohnC
>
> On 3/13/2013 2:01 PM, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Thanks for explaining a bit about the serial case where we still need
>> to go through termination protocol.
>>
>> As you know I'll be on vacation the next two days, so I just wanted
>> to let you know that with the changes that you outline below (the fix
>> to CMRemarkTask and removing the do_stealing argument) I'm fine with
>> this change. Feel free to push this if you get another review.
>>
>> Thanks,
>> Bengt
>>
>> On 3/13/13 6:57 PM, John Cuthbertson wrote:
>>> Hi Bengt,
>>>
>>> Thanks for looking at the change again. Replies inline....
>>>
>>> On 3/13/2013 3:22 AM, Bengt Rutisson wrote:
>>>>
>>>>
>>>> Hi John,
>>>>
>>>> I'm not too familiar with the do_marking_step() code, so I'm trying
>>>> to get to know it while I'm looking at your webrev. So, maybe I'm
>>>> completely off here, but I think there are a couple of strange things.
>>>>
>>>> After your change do_marking_step() takes three bools: do_stealing,
>>>> do_termination and is_serial.
>>>>
>>>> This means that we have 8 possible states if we combine all value
>>>> of these bools. I went through all calls to do_marking_step() and
>>>> it seems like we make use of 4 of those states:
>>>>
>>>>
>>>> G1CMKeepAliveAndDrainClosure::do_oop_work()
>>>> do_marking_step(false, false, true) (1)
>>>>
>>>> CMConcurrentMarkingTask::work()
>>>> do_marking_step(true, true, false) (2)
>>>>
>>>> CMRemarkTask::work()
>>>> do_marking_step(true, true, false) (2)
>>>> do_marking_step(true, true, true) (3)
>>>>
>>>> G1CMDrainMarkingStackClosure::do_void()
>>>> do_marking_step(false, true, true) (4)
>>>> do_marking_step(true, true, false) (2)
>>>>
>>>
>>> Yeah. Well...
>>>
>>>>
>>>> I numbered the states (1) - (4). Here are all states and I marked
>>>> which ones we use:
>>>>
>>>> stealing, termination, serial (3)
>>>> stealing, !termination, serial
>>>> stealing, !termination, !serial
>>>> stealing, termination, !serial (2) parallel
>>>> !stealing, termination, serial (4)
>>>> !stealing, !termination, serial (1) serial
>>>> !stealing, !termination, !serial
>>>> !stealing, termination, !serial
>>>>
>>>>
>>>> I think state (2) makes sense as the parallel case. We do stealing
>>>> and termination and we are not serial. State (1) also makes sense.
>>>> It is clearly the serial case. No stealing or termination. Just
>>>> serial.
>>>>
>>>> Now, case (3) looks odd to me. We say we are serial but we want to
>>>> do stealing and termination. This case comes from CMRemarkTask. It
>>>> takes a is_serial argument in its constructor. We get to case (3)
>>>> from ConcurrentMark::checkpointRootsFinalWork(), where we do:
>>>>
>>>> if (G1CollectedHeap::use_parallel_gc_threads()) {
>>>> ...
>>>> } else {
>>>> ...
>>>> CMRemarkTask remarkTask(this, active_workers, true /*
>>>> is_serial*/);
>>>> }
>>>>
>>>> To me this looks like it should lead to the pure serial case (1)
>>>> instead of (3). Do you agree? In that case I think
>>>> CMRemarkTask::work() needs to be changed to do:
>>>>
>>>> task->do_marking_step(1000000000.0 /* something very large */,
>>>> !_is_serial /* do_stealing */,
>>>> !_is_serial /* do_termination */,
>>>> _is_serial);
>>>>
>>>
>>> You're right here. The serial remark case should not enable
>>> stealing. It doesn't make sense. Done!
>>>
>>>>
>>>> The other case that sticks out is (4). We don't want to do
>>>> stealing, but we want to do termination even though we are serial.
>>>> This comes from how we set up the g1_drain_mark_stack in
>>>> ConcurrentMark::weakRefsWork(). We pass that along to the reference
>>>> processing, which I think is serial (wasn't that the original
>>>> problem? :) ). We also pass the g1_keep_alive which is set up to be
>>>> in state (1). So, I wonder if it is correct that the
>>>> g1_drain_mark_stack is in state (4). If you agree that we should be
>>>> using the pure serial case here too, I think we need to change
>>>> G1CMDrainMarkingStackClosure::do_void() to be:
>>>>
>>>> _task->do_marking_step(1000000000.0 /* something very large */,
>>>> !_is_serial /* do_stealing */,
>>>> !_is_serial /* do_termination */,
>>>> _is_serial /* is_serial */);
>>>>
>>>>
>>>
>>> That is what we had before and I wasn't happy with it. The serial
>>> drain closure is the last thing that get's executed as part of
>>> reference processing - when process_phaseJNI invokes the
>>> complete_gc.do_void() method. In the drain closure either serial or
>>> parallel I think we want to go through the termination protocol. We
>>> just don't want to interact with the terminator in the serial case.
>>> There's a bunch of guarantees and checks in the termination protocol
>>> and we should still go through them - even when serial.
>>>
>>>>
>>>> Again, I'm not sure about this, but if the above is correct we will
>>>> be left with only two states. The parallel and the serial. In that
>>>> case I think it would make sense to remove the do_stealing and
>>>> do_termination arguments for do_marking_step() and just pass in the
>>>> is_serial argument. Internally do_marking_step() could set up two
>>>> local variable to track stealing and termination. That might be
>>>> good if we see the need in the future to split these up again.
>>>
>>> I think we can get rid of the do_stealing parameter - it's always
>>> the negation of the is_serial parameter and I like the idea of
>>> making it a local. How about that? For example
>>>
>>> void CMTask::do_marking_step(double ms, bool do_termination, bool
>>> is_serial) {
>>> // It does not make sense to allow stealing when called serially.
>>> bool do_stealing = !is_serial;
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130318/7175a41c/attachment.htm>
More information about the hotspot-gc-dev
mailing list