RFR(S): 8005032: G1: Cleanup serial reference processing closures in concurrent marking

Thu Jan 17 20:02:34 UTC 2013

Hi Bengt,

There's a new webrev at: http://cr.openjdk.java.net/~johnc/8005032/webrev.1/

It looks larger than the previous webrev but the most of the change was 
tweaking comments. The actual code changes are smaller.

Testing was the same as before.

On 1/15/2013 1:18 AM, Bengt Rutisson wrote:
>
> I see. I didn't think about the difference betweeen ParallelGCThreads 
> and ParallelRefProcEnabled. BTW, not part of this change, but why do 
> we have ParallelRefProcEnabled? And why is it false by default? 
> Wouldn't it make more sense to have it just be dependent on 
> ParallelGCThreads?

I don't know and the answer is probably lost in the dark depths of time 
- I can only speculate. For G1 we have a CR to turn 
ParallelRefProcEnabled on if the number of GC threads > 1. I'm not sure 
about the other collectors.

>
>> Setting it once in weakRefsWork() will not be sufficient. We will run 
>> into an assertion failure in 
>> ParallelTaskTerminator::offer_termination().
>>
>> During the reference processing, the do_void() method of the 
>> complete_gc oop closure (in our case the complete gc oop closure is 
>> an instance of G1CMParDrainMarkingStackClosure) is called multiple 
>> times (in process_phase1, sometimes process_phase2, process_phase3, 
>> and process_phaseJNI)
>>
>> Setting the phase sets the number of active tasks (or threads) that 
>> the termination protocol in do_marking_step() will wait for. When an 
>> invocation of do_marking_step() offers termination, the number of 
>> tasks/threads in the terminator instance is decremented. So Setting 
>> the phase once will let the first execution of do_marking_step (with 
>> termination) from process_phase1() succeed, but subsequent calls to 
>> do_marking_step() will result in the assertion failure.
>>
>> We also can't unconditionally set it in the do_void() method or even 
>> the constructor of G1CMParDrainMarkingStackClosure. Separate 
>> instances of this closure are created by each of the worker threads 
>> in the MT-case.
>>
>> Note when processing is multi-threaded the complete_gc instance used 
>> is the one passed into the ProcessTask's work method (passed into 
>> process_discovered_references() using the task executor instance) 
>> which may not necessarily be the same complete gc instance as the one 
>> passed directly into process_discovered_references().
>
> Thanks for this detailed explanation. It really helped!
>
> I understand the issue now, but I still think it is very confusing 
> that _cm->set_phase() is called from 
> G1CMRefProcTaskExecutor::execute() in the multithreaded case and from 
> G1CMParDrainMarkingStackClosure::do_void() in the single threaded case.
>
>> It might be possible to record whether processing is MT in the 
>> G1CMRefProcTaskExecutor class and always pass the executor instance 
>> into process_discovered_references. We could then set processing to 
>> MT so that the execute() methods in the executor instance are invoked 
>> but call the Proxy class' work method directly. Then we could 
>> override the set_single_threaded() routine (called just before 
>> process_phaseJNI) to set the phase.
>
> I think this would be a better solution, but if I understand it 
> correctly it would mean that we would have to change all the 
> collectors to always pass a TaskExecutor. All of them currently pass 
> NULL in the non-MT case. I think it would be simpler if they always 
> passed a TaskExecutor but it is a pretty big change.

I wasn't meaning to do that for the other collectors just G1's 
concurrent mark reference processor i.e. fool the ref processor into 
think it's MT so that the parallel task executor is used but only use 
the work gang if reference processing was _really_ MT.

I decided not to do this as there is an easier way. For the non-MT case 
we do not need to enter the termination protocol in 
CMTask::do_marking_step(). When there's only one thread we don't need to 
use the ParallelTaskTerminator to wait for other threads. And we 
certainly don't need stealing. Hence the solution is to only do the 
termination and stealing if the closure is instantiated for MT reference 
processing. That removes the set_phase call().

> Another possibility is to introduce some kind of prepare method to the 
> VoidClosure (or maybe in a specialized subclass for ref processing). 
> Then we could do something like:
>
>   complete_gc->prologue();
>   if (mt_processing) {
>     RefProcPhase2Task phase2(*this, refs_lists, !discovery_is_atomic() 
> /*marks_oops_alive*/);
>     task_executor->execute(phase2);
>   } else {
>     for (uint i = 0; i < _max_num_q; i++) {
>       process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
>     }
>   }
>
> G1CMParDrainMarkingStackClosure::prologue() could do the call to 
> _cm->set_phase(). And G1CMRefProcTaskExecutor::execute() would not 
> have to do it.

The above is a reasonable extension to the reference processing code. I 
no longer need this feature for this change but we should submit a CR 
for it. I'll do that.

> BTW, not really part of your change, but above code is duplicated 
> three times in ReferenceProcessor::process_discovered_reflist(). Would 
> be nice to factor this out to a method.

Completely agree. Again I'll submit a CR for it.

Thanks,

JohnC