RFR(S): 8005032: G1: Cleanup serial reference processing closures in concurrent marking

Bengt Rutisson bengt.rutisson at oracle.com
Tue Jan 15 09:18:56 UTC 2013


Hi John,

Sorry for being late in replying. Thanks for taking the time to explain 
this in detail! :)

A couple of comments inline.


On 1/11/13 12:23 AM, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for looking over the code changes. Be prepared for some gory 
> details. :) Replies inline...
>
> On 1/9/2013 7:35 AM, Bengt Rutisson wrote:
>>
>> In ConcurrentMark::weakRefsWork() there is this code:
>>
>> 2422     if (rp->processing_is_mt()) {
>> 2423       // Set the degree of MT here.  If the discovery is done 
>> MT, there
>> 2424       // may have been a different number of threads doing the 
>> discovery
>> 2425       // and a different number of discovered lists may have Ref 
>> objects.
>> 2426       // That is OK as long as the Reference lists are balanced 
>> (see
>> 2427       // balance_all_queues() and balance_queues()).
>> 2428       rp->set_active_mt_degree(active_workers);
>> 2429     }
>>
>> Could we now always call rp->set_active_mt_degree()? Maybe I am 
>> missing the point here, but I thought that we are now using queues 
>> and all rp->set_active_mt_degree() does is set the number of queues 
>> to active_workers, which will be 1 in the single-threaded mode.
>
> Yes - most likely we can, but I would prefer to set active_workers using:
>
>     // We need at least one active thread. If reference processing is
>     // not multi-threaded we use the current (ConcurrentMarkThread) 
> thread,
>     // otherwise we use the work gang from the G1CollectedHeap and we
>     // utilize all the worker threads we can.
>     uint active_workers = (rp->processing_is_mt() && g1h->workers() != 
> NULL
>                                 ? g1h->workers()->active_workers()
>                                 : 1U);
>
> since single-threaded versus multi-threaded reference processing is 
> determined using the ParallelRefProcEnabled flag. The number of active 
> workers here is the number of workers in G1's STW work gang - which is 
> controlled via ParallelGCThreads.

I see. I didn't think about the difference between ParallelGCThreads 
and ParallelRefProcEnabled. BTW, not part of this change, but why do we 
have ParallelRefProcEnabled? And why is it false by default? Wouldn't it 
make more sense to have it just depend on ParallelGCThreads?
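
For my own understanding (writing this from memory, so it may well be 
off), I think the two flags play roughly these roles when the marking 
reference processor is set up:

   // Illustrative sketch only - the local names are mine, not the actual
   // code. ParallelRefProcEnabled decides *whether* reference processing
   // is MT, ParallelGCThreads decides how many workers the STW gang has.
   bool mt_processing = ParallelRefProcEnabled && (ParallelGCThreads > 1);
   uint mt_degree     = ParallelGCThreads;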

>>
>> If we do that we can also remove the first part of the assert a bit 
>> further down:
>>
>> 2449     assert(!rp->processing_is_mt() || rp->num_q() == 
>> active_workers, "why not");
>
> OK. Done.
>
>>
>> Also, I think I would like to move the code you added to 
>> G1CMParDrainMarkingStackClosure::do_void() into 
>> ConcurrentMark::weakRefsWork() somewhere. Maybe something like:
>>
>> if (!rp->processing_is_mt()) {
>>   set_phase(1, false /* concurrent */);
>> }
>>
>> It is a bit strange to me that G1CMParDrainMarkingStackClosure should 
>> set this up. If we really want to keep it in 
>> G1CMParDrainMarkingStackClosure I think the constructor would be a 
>> better place to do it than do_void().
>
> Setting it once in weakRefsWork() will not be sufficient. We will run 
> into an assertion failure in ParallelTaskTerminator::offer_termination().
>
> During reference processing, the do_void() method of the complete_gc 
> closure (in our case the complete_gc closure is an instance of 
> G1CMParDrainMarkingStackClosure) is called multiple times (in 
> process_phase1, sometimes process_phase2, process_phase3, and 
> process_phaseJNI).
>
> Setting the phase sets the number of active tasks (or threads) that 
> the termination protocol in do_marking_step() will wait for. When an 
> invocation of do_marking_step() offers termination, the number of 
> tasks/threads in the terminator instance is decremented. So setting 
> the phase once will let the first execution of do_marking_step (with 
> termination) from process_phase1() succeed, but subsequent calls to 
> do_marking_step() will result in the assertion failure.
>
> We also can't unconditionally set it in the do_void() method or even 
> the constructor of G1CMParDrainMarkingStackClosure. Separate instances 
> of this closure are created by each of the worker threads in the MT-case.
>
> Note that when processing is multi-threaded, the complete_gc instance 
> used is the one passed into the ProcessTask's work method (passed into 
> process_discovered_references() using the task executor instance), 
> which may not necessarily be the same complete_gc instance as the one 
> passed directly into process_discovered_references().

Thanks for this detailed explanation. It really helped!

I understand the issue now, but I still think it is very confusing that 
_cm->set_phase() is called from G1CMRefProcTaskExecutor::execute() in 
the multi-threaded case and from 
G1CMParDrainMarkingStackClosure::do_void() in the single-threaded case.
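
Just so that I am sure I follow, the single-threaded flow is roughly 
this (simplified sketch in comments, not the actual code):

   // process_discovered_references()
   //   process_phase1()   -> complete_gc->do_void()  // do_marking_step() + offer_termination()
   //   process_phase2()   -> complete_gc->do_void()  // (only sometimes) terminator already at 0
   //   process_phase3()   -> complete_gc->do_void()
   //   process_phaseJNI() -> complete_gc->do_void()
   //
   // So the active task count has to be re-armed with set_phase(1, false)
   // before each do_void() call, which is why the call ended up inside
   // do_void() for the non-MT case.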

> It might be possible to record whether processing is MT in the 
> G1CMRefProcTaskExecutor class and always pass the executor instance 
> into process_discovered_references. We could then set processing to MT 
> so that the execute() methods in the executor instance are invoked but 
> call the Proxy class' work method directly. Then we could override the 
> set_single_threaded() routine (called just before process_phaseJNI) to 
> set the phase.

I think this would be a better solution, but if I understand it 
correctly it would mean that we would have to change all the collectors 
to always pass a TaskExecutor. All of them currently pass NULL in the 
non-MT case. I think it would be simpler if they always passed a 
TaskExecutor, but that is a pretty big change.
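
Just to check that I understand the suggestion, I imagine the executor 
would end up looking something like this (my guess at the details, 
ignoring the set_single_threaded() part; _processing_is_mt is the new 
field you mention, everything else is as I remember the current code):

   void G1CMRefProcTaskExecutor::execute(ProcessTask& proc_task) {
     G1CMRefProcTaskProxy proc_task_proxy(proc_task, _g1h, _cm);
     if (_processing_is_mt) {
       // Real MT processing: re-arm the terminator for all workers and
       // hand the proxy task to the work gang.
       _cm->set_phase(_workers->active_workers(), false /* concurrent */);
       _workers->run_task(&proc_task_proxy);
     } else {
       // Processing is "MT" only in the sense that the executor is used;
       // re-arm for one task and run the proxy in the calling thread.
       _cm->set_phase(1, false /* concurrent */);
       proc_task_proxy.work(0);
     }
   }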

Another possibility is to introduce some kind of prepare method on 
VoidClosure (or maybe on a specialized subclass for reference 
processing). Then we could do something like:

   complete_gc->prologue();
   if (mt_processing) {
     RefProcPhase2Task phase2(*this, refs_lists,
                              !discovery_is_atomic() /*marks_oops_alive*/);
     task_executor->execute(phase2);
   } else {
     for (uint i = 0; i < _max_num_q; i++) {
       process_phase2(refs_lists[i], is_alive, keep_alive, complete_gc);
     }
   }

G1CMParDrainMarkingStackClosure::prologue() could then make the call to 
_cm->set_phase(), and G1CMRefProcTaskExecutor::execute() would not have 
to do it.
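
To make that concrete, a minimal sketch of what I have in mind 
(prologue() is a hypothetical hook, nothing like it exists today):

   // Hypothetical: give the complete_gc closure a per-phase setup hook.
   class RefProcCompleteGCClosure : public VoidClosure {
   public:
     virtual void prologue() { /* default: nothing to set up */ }
   };

   // G1's drain closure could then move the set_phase() call out of
   // do_void():
   void G1CMParDrainMarkingStackClosure::prologue() {
     if (!_is_par) {
       // Single-threaded: one active task, STW (non-concurrent) phase.
       _cm->set_phase(1, false /* concurrent */);
     }
   }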


BTW, not really part of your change, but the above code is duplicated 
three times in ReferenceProcessor::process_discovered_reflist(). It 
would be nice to factor it out into a method.
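
For example something like this (the name run_task() and the exact 
parameters are made up, and I'm assuming the phase tasks' work() methods 
can simply be called per list in the serial case):

   void ReferenceProcessor::run_task(AbstractRefProcTaskExecutor::ProcessTask& task,
                                     AbstractRefProcTaskExecutor* task_executor,
                                     BoolObjectClosure* is_alive,
                                     OopClosure* keep_alive,
                                     VoidClosure* complete_gc) {
     if (task_executor != NULL) {
       // MT case: let the executor distribute the per-queue work.
       task_executor->execute(task);
     } else {
       // Serial case: process each discovered list in the calling thread.
       for (uint i = 0; i < _max_num_q; i++) {
         task.work(i, *is_alive, *keep_alive, *complete_gc);
       }
     }
   }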

Thanks,
Bengt

>
>
> I don't think that would be any clearer.
>
> Perhaps a better name for the _is_par flag would be _processing_is_mt 
> (and set it using ReferenceProcessor::processing_is_mt())?
>> Some minor comments:
>>
>> In G1CMParKeepAliveAndDrainClosure and 
>> G1CMParDrainMarkingStackClosure constructors there is this assert:
>>
>> assert(_task->worker_id() == 0 || _is_par, "sanity");
>>
>> I think it is good, but I had to think a bit about what it meant and 
>> to me I think it would be quicker to understand if it was the other 
>> way around:
>>
>> assert(_is_par || _task->worker_id() == 0, "Only worker 0 should be 
>> used if single threaded");
>
> Sure no problem. Done.
>
>> Remove newline on line 2254 ?
>
> Sure. What about the blank lines on 2187 and 2302 for consistency?
>
>> ConcurrentMark::weakRefsWork()
>>
>> How about introducing a variable that either hold the value of 
>> &par_task_executor or NULL depending on rp->processing_is_mt()? That 
>> way we don't have to duplicate and inline this test twice:
>>
>> (rp->processing_is_mt() ? &par_task_executor : NULL)
>
> OK.
>
>> As a separate change it might be worth renaming the closures to not 
>> have "Par" in the name, now that they are not always parallel...
>>
>> G1CMParKeepAliveAndDrainClosure -> G1CMKeepAliveAndDrainClosure
>> G1CMParDrainMarkingStackClosure -> G1CMDrainMarkingStackClosure
>>
>> But it would be very confusing to do this in the same change.
>
> I think we can change the names. The change shouldn't be that confusing.
>
> Thanks. A new webrev will appear shortly.
>
> JohnC
