RFR(S): 8005032: G1: Cleanup serial reference processing closures in concurrent marking

Charlie Hunt chunt at salesforce.com
Thu Jan 17 20:52:57 UTC 2013


John / Bengt:

I think I can offer a bit of info on Bengt's earlier question about ParallelRefProcEnabled being disabled by default.

IIRC, there was one workload that showed a slight perf regression with +ParallelRefProcEnabled.  That workload may not be as relevant now as it was back when the evaluation / decision was made to disable the flag by default.

You both have probably thought about this already, but my reaction is that reasonable defaults would be to enable +ParallelRefProcEnabled for Parallel[Old], CMS and G1 when ParallelGCThreads is greater than 1, and to leave it disabled with -XX:+UseSerialGC.
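
For what it's worth, I'd imagine the ergonomics could end up as simple as
something like the following in arguments.cpp (just a sketch using the
existing FLAG_IS_DEFAULT / FLAG_SET_DEFAULT machinery; the details would
be up to whoever picks up the CR):

  if (ParallelGCThreads > 1 && FLAG_IS_DEFAULT(ParallelRefProcEnabled)) {
    FLAG_SET_DEFAULT(ParallelRefProcEnabled, true);
  }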

hths,

charlie ...

On Jan 17, 2013, at 3:02 PM, John Cuthbertson wrote:

> Hi Bengt,
> 
> There's a new webrev at: http://cr.openjdk.java.net/~johnc/8005032/webrev.1/
> 
> It looks larger than the previous webrev, but most of the change was 
> tweaking comments. The actual code changes are smaller.
> 
> Testing was the same as before.
> 
> On 1/15/2013 1:18 AM, Bengt Rutisson wrote:
>> 
>> I see. I didn't think about the difference between ParallelGCThreads 
>> and ParallelRefProcEnabled. BTW, not part of this change, but why do 
>> we have ParallelRefProcEnabled? And why is it false by default? 
>> Wouldn't it make more sense to have it just be dependent on 
>> ParallelGCThreads?
> 
> I don't know and the answer is probably lost in the dark depths of time 
> - I can only speculate. For G1 we have a CR to turn 
> ParallelRefProcEnabled on if the number of GC threads > 1. I'm not sure 
> about the other collectors.
> 
>> 
>>> Setting it once in weakRefsWork() will not be sufficient. We will run 
>>> into an assertion failure in 
>>> ParallelTaskTerminator::offer_termination().
>>> 
>>> During reference processing, the do_void() method of the complete_gc 
>>> oop closure (in our case an instance of 
>>> G1CMParDrainMarkingStackClosure) is called multiple times (in 
>>> process_phase1, sometimes process_phase2, process_phase3, and 
>>> process_phaseJNI).
>>> 
>>> Setting the phase sets the number of active tasks (or threads) that 
>>> the termination protocol in do_marking_step() will wait for. When an 
>>> invocation of do_marking_step() offers termination, the number of 
>>> tasks/threads in the terminator instance is decremented. So setting 
>>> the phase once will let the first execution of do_marking_step() (with 
>>> termination) from process_phase1() succeed, but subsequent calls to 
>>> do_marking_step() will result in the assertion failure.
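>>> 
>>> To illustrate the counting involved, here is a much-simplified stand-in 
>>> for the real ParallelTaskTerminator (just a sketch to show why one 
>>> set_phase() call is not enough):
>>> 
>>>   class SimpleTerminator {
>>>     int _active_tasks;                // what set_phase() establishes
>>>    public:
>>>     void set_active(int n) { _active_tasks = n; }
>>>     bool offer_termination() {
>>>       assert(_active_tasks > 0, "premature termination offer");
>>>       _active_tasks--;                // each terminating task decrements once
>>>       return _active_tasks == 0;      // last task out reports termination
>>>     }
>>>   };
>>> 
>>> With a single set_active(1), the first do_marking_step() that offers 
>>> termination drops the count to zero, and the next offer, from the 
>>> following process_phase, trips the assert.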
>>> 
>>> We also can't unconditionally set it in the do_void() method or even 
>>> the constructor of G1CMParDrainMarkingStackClosure. Separate 
>>> instances of this closure are created by each of the worker threads 
>>> in the MT-case.
>>> 
>>> Note that when processing is multi-threaded, the complete_gc instance 
>>> used is the one passed into the ProcessTask's work method (passed into 
>>> process_discovered_references() using the task executor instance), 
>>> which may not necessarily be the same complete_gc instance as the one 
>>> passed directly into process_discovered_references().
>> 
>> Thanks for this detailed explanation. It really helped!
>> 
>> I understand the issue now, but I still think it is very confusing 
>> that _cm->set_phase() is called from 
>> G1CMRefProcTaskExecutor::execute() in the multithreaded case and from 
>> G1CMParDrainMarkingStackClosure::do_void() in the single threaded case.
>> 
>>> It might be possible to record whether processing is MT in the 
>>> G1CMRefProcTaskExecutor class and always pass the executor instance 
>>> into process_discovered_references. We could then set processing to 
>>> MT so that the execute() methods in the executor instance are invoked 
>>> but call the Proxy class' work method directly. Then we could 
>>> override the set_single_threaded() routine (called just before 
>>> process_phaseJNI) to set the phase.
>> 
>> I think this would be a better solution, but if I understand it 
>> correctly it would mean that we would have to change all the 
>> collectors to always pass a TaskExecutor. All of them currently pass 
>> NULL in the non-MT case. I think it would be simpler if they always 
>> passed a TaskExecutor but it is a pretty big change.
> 
> I wasn't meaning to do that for the other collectors, just G1's 
> concurrent mark reference processor, i.e. fool the ref processor into 
> thinking it's MT so that the parallel task executor is used, but only use 
> the work gang if reference processing was _really_ MT.
> 
> I decided not to do this as there is an easier way. For the non-MT case 
> we do not need to enter the termination protocol in 
> CMTask::do_marking_step(). When there's only one thread we don't need to 
> use the ParallelTaskTerminator to wait for other threads. And we 
> certainly don't need stealing. Hence the solution is to only do the 
> termination and stealing if the closure is instantiated for MT reference 
> processing. That removes the set_phase() call.
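> 
> In rough outline the drain closure then ends up with this kind of shape 
> (just a sketch; the actual parameters to do_marking_step() in the webrev 
> may differ):
> 
>   void G1CMParDrainMarkingStackClosure::do_void() {
>     // _is_serial is fixed when the closure is instantiated. For serial
>     // reference processing we skip stealing and the termination
>     // protocol, so no set_phase() call is needed before each phase.
>     bool do_stealing    = !_is_serial;
>     bool do_termination = !_is_serial;
>     _task->do_marking_step(1000000000.0 /* effectively unbounded */,
>                            do_stealing,
>                            do_termination);
>   }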
> 
>> Another possibility is to introduce some kind of prepare method to the 
>> VoidClosure (or maybe in a specialized subclass for ref processing). 
>> Then we could do something like:
>> 
>>  complete_gc->prologue();
>>  if (mt_processing) {
>>    RefProcPhase2Task phase2(*this, refs_lists,
>>                             !discovery_is_atomic() /*marks_oops_alive*/);
>>    task_executor->execute(phase2);
>>  } else {
>>    for (uint i = 0; i < _max_num_q; i++) {
>>      process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
>>    }
>>  }
>> 
>> G1CMParDrainMarkingStackClosure::prologue() could do the call to 
>> _cm->set_phase(). And G1CMRefProcTaskExecutor::execute() would not 
>> have to do it.
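>> 
>> The closure side of that would then be tiny. As a sketch (assuming the 
>> set_phase(active_tasks, concurrent) signature, and with _active_workers 
>> as a hypothetical field recording how many workers will run the phase):
>> 
>>  void G1CMParDrainMarkingStackClosure::prologue() {
>>    // A single, well-defined point to set the number of active tasks
>>    // before each phase, for both the serial and the MT case.
>>    _cm->set_phase(_active_workers, false /* concurrent */);
>>  }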
> 
> The above is a reasonable extension to the reference processing code. I 
> no longer need this feature for this change but we should submit a CR 
> for it. I'll do that.
> 
>> BTW, not really part of your change, but the above code is duplicated 
>> three times in ReferenceProcessor::process_discovered_reflist(). It 
>> would be nice to factor this out into a method.
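>> 
>> Something along these lines, perhaps (purely a sketch; the helper name 
>> and exact parameter types are hypothetical):
>> 
>>  void ReferenceProcessor::run_phase(AbstractRefProcTaskExecutor::ProcessTask& task,
>>                                     AbstractRefProcTaskExecutor* task_executor,
>>                                     bool mt_processing,
>>                                     BoolObjectClosure* is_alive,
>>                                     OopClosure* keep_alive,
>>                                     VoidClosure* complete_gc) {
>>    if (mt_processing) {
>>      // MT case: hand the phase task to the executor's work gang.
>>      task_executor->execute(task);
>>    } else {
>>      // Serial case: run the same task over each discovered list in turn.
>>      for (uint i = 0; i < _max_num_q; i++) {
>>        task.work(i, *is_alive, *keep_alive, *complete_gc);
>>      }
>>    }
>>  }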
> 
> Completely agree. Again I'll submit a CR for it.
> 
> Thanks,
> 
> JohnC



