RFR(S): 8005032: G1: Cleanup serial reference processing closures in concurrent marking

Wed Jan 30 17:07:49 UTC 2013

Hi Bengt,

Thanks. And I agree these changes look better. Thanks for insisting. :)

JohnC

On 1/30/2013 5:14 AM, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for doing these changes! Looks good.
>
> Ship it!
> Bengt
>
>
> On 1/17/13 9:02 PM, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> There's a new webrev at: 
>> http://cr.openjdk.java.net/~johnc/8005032/webrev.1/
>>
>> It looks larger than the previous webrev but the most of the change 
>> was tweaking comments. The actual code changes are smaller.
>>
>> Testing was the same as before.
>>
>> On 1/15/2013 1:18 AM, Bengt Rutisson wrote:
>>>
>>> I see. I didn't think about the difference betweeen 
>>> ParallelGCThreads and ParallelRefProcEnabled. BTW, not part of this 
>>> change, but why do we have ParallelRefProcEnabled? And why is it 
>>> false by default? Wouldn't it make more sense to have it just be 
>>> dependent on ParallelGCThreads?
>>
>> I don't know and the answer is probably lost in the dark depths of 
>> time - I can only speculate. For G1 we have a CR to turn 
>> ParallelRefProcEnabled on if the number of GC threads > 1. I'm not 
>> sure about the other collectors.
>>
>>>
>>>> Setting it once in weakRefsWork() will not be sufficient. We will 
>>>> run into an assertion failure in 
>>>> ParallelTaskTerminator::offer_termination().
>>>>
>>>> During the reference processing, the do_void() method of the 
>>>> complete_gc oop closure (in our case the complete gc oop closure is 
>>>> an instance of G1CMParDrainMarkingStackClosure) is called multiple 
>>>> times (in process_phase1, sometimes process_phase2, process_phase3, 
>>>> and process_phaseJNI)
>>>>
>>>> Setting the phase sets the number of active tasks (or threads) that 
>>>> the termination protocol in do_marking_step() will wait for. When 
>>>> an invocation of do_marking_step() offers termination, the number 
>>>> of tasks/threads in the terminator instance is decremented. So 
>>>> Setting the phase once will let the first execution of 
>>>> do_marking_step (with termination) from process_phase1() succeed, 
>>>> but subsequent calls to do_marking_step() will result in the 
>>>> assertion failure.
>>>>
>>>> We also can't unconditionally set it in the do_void() method or 
>>>> even the constructor of G1CMParDrainMarkingStackClosure. Separate 
>>>> instances of this closure are created by each of the worker threads 
>>>> in the MT-case.
>>>>
>>>> Note when processing is multi-threaded the complete_gc instance 
>>>> used is the one passed into the ProcessTask's work method (passed 
>>>> into process_discovered_references() using the task executor 
>>>> instance) which may not necessarily be the same complete gc 
>>>> instance as the one passed directly into 
>>>> process_discovered_references().
>>>
>>> Thanks for this detailed explanation. It really helped!
>>>
>>> I understand the issue now, but I still think it is very confusing 
>>> that _cm->set_phase() is called from 
>>> G1CMRefProcTaskExecutor::execute() in the multithreaded case and 
>>> from G1CMParDrainMarkingStackClosure::do_void() in the single 
>>> threaded case.
>>>
>>>> It might be possible to record whether processing is MT in the 
>>>> G1CMRefProcTaskExecutor class and always pass the executor instance 
>>>> into process_discovered_references. We could then set processing to 
>>>> MT so that the execute() methods in the executor instance are 
>>>> invoked but call the Proxy class' work method directly. Then we 
>>>> could override the set_single_threaded() routine (called just 
>>>> before process_phaseJNI) to set the phase.
>>>
>>> I think this would be a better solution, but if I understand it 
>>> correctly it would mean that we would have to change all the 
>>> collectors to always pass a TaskExecutor. All of them currently pass 
>>> NULL in the non-MT case. I think it would be simpler if they always 
>>> passed a TaskExecutor but it is a pretty big change.
>>
>> I wasn't meaning to do that for the other collectors just G1's 
>> concurrent mark reference processor i.e. fool the ref processor into 
>> think it's MT so that the parallel task executor is used but only use 
>> the work gang if reference processing was _really_ MT.
>>
>> I decided not to do this as there is an easier way. For the non-MT 
>> case we do not need to enter the termination protocol in 
>> CMTask::do_marking_step(). When there's only one thread we don't need 
>> to use the ParallelTaskTerminator to wait for other threads. And we 
>> certainly don't need stealing. Hence the solution is to only do the 
>> termination and stealing if the closure is instantiated for MT 
>> reference processing. That removes the set_phase call().
>>
>>> Another possibility is to introduce some kind of prepare method to 
>>> the VoidClosure (or maybe in a specialized subclass for ref 
>>> processing). Then we could do something like:
>>>
>>>   complete_gc->prologue();
>>>   if (mt_processing) {
>>>     RefProcPhase2Task phase2(*this, refs_lists, 
>>> !discovery_is_atomic() /*marks_oops_alive*/);
>>>     task_executor->execute(phase2);
>>>   } else {
>>>     for (uint i = 0; i < _max_num_q; i++) {
>>>       process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
>>>     }
>>>   }
>>>
>>> G1CMParDrainMarkingStackClosure::prologue() could do the call to 
>>> _cm->set_phase(). And G1CMRefProcTaskExecutor::execute() would not 
>>> have to do it.
>>
>> The above is a reasonable extension to the reference processing code. 
>> I no longer need this feature for this change but we should submit a 
>> CR for it. I'll do that.
>>
>>> BTW, not really part of your change, but above code is duplicated 
>>> three times in ReferenceProcessor::process_discovered_reflist(). 
>>> Would be nice to factor this out to a method.
>>
>> Completely agree. Again I'll submit a CR for it.
>>
>> Thanks,
>>
>> JohnC
>