RFR: 8008916 G1: Evacuation failed tracing event

Tue Apr 9 00:50:58 UTC 2013

Hi Jesper,

A quick question before going on with the actual review....

Do you need the native thread ID in the event or just the worker IDs?

As you have seen tying something to specific worker native IDs across 
multiple gang tasks is not easy. If you actually need the native thread 
IDs then I'm not sure what the best approach is. The most obvious is to 
have some sequential number in each worker that is fixed for the life of 
that worker (i.e. the life the JVM) and use that value to index into the 
evacuation_failed_info_array. Fortunately, I think we have that - it's 
the WorkerThread::_id field:

>   for (uint worker = 0; worker < total_workers(); worker += 1) {
>     GangWorker* new_worker = allocate_worker(worker);

and:

> GangWorker* WorkGang::allocate_worker(uint which) {
>   GangWorker* new_worker = new GangWorker(this, which);
>   return new_worker;
> }
> GangWorker::GangWorker(AbstractWorkGang* gang, uint id) {
>   _gang = gang;
>   set_id(id);
>   set_name("Gang worker#%d (%s)", id, gang->name());
> }

So instead of associating the entries in the 
evacuation_failed_info_array with worker_id/queue_num, they are 
associated with Thread::current()->as_Worker_thread()->id(). Note if 
ParallelGCThreads=0 then the worker threads are not created but you are 
using ParallelGCThreads as the index for the VM thread - that's OK (but 
is a break with convention). You comment it (and why).

Now on to the actual review:

g1CollectedHeap.hpp:

Line 888:
I think you can make this an array of EvacFailureInfo instances rather 
than pointers. I think it's safer to do this. Once you establish the 
mapping between worker thread and EvacFailureInfo instance - you don't 
want it to change.

Line 930:
The change is unnecessary - but it's OK to leave it.

Line 1817:
Again since the mapping between worker thread and evac failed info 
instance is fixed, there's no need to embed a pointer to the object into 
the PSS but it's OK to leave it but I would initialize in the 
constructor, i.e. have something like the following in the PSS constructor:

// The value of native_id is used to index into the 
evacuation_failed_info array
Thread* me = Thread::current();
uint native_id = 0;
if (me->is_Worker_thread()) {
   native_id = me->as_Worker_thread()->id();
} else {
   assert(me->is_VMThread(), "who else?");
   native_id = ParallelGCThreads;
}
_evac_failed_info = _g1h->evacuation_failed_instance_for(native_id);

Alternatively store the native worker ID as well as queue_num in the PSS 
- you can use that to access the instance.

Line 1863:
This line is unnecessary. You no longer need the routine.

Line 1951:
This line may be unnecessary. You could return the _native_id stored in 
the PSS and fetch the evac_failed_info object for that _native_id from 
the array. I'll leave it up to you.

g1CollectedHeap.cpp

Line 1970 - 1976
I really don't like that you  allocate ParallelGCThreads+1 entries - I 
would prefer to use n_queues but I can see why you need to - if parallel 
reference processing is not enabled. The convention is that if 
ParallelGCThreads==0, n_queues will be 1 and we use queue[0] and 
worker_id 0. I want lots of comments here since you're breaking with 
convention.

Fold the initialization of the evac_failed_info array into one of the 
other loops or fold all three loops into one:

> 1954   for (int i = 0; i < n_queues; i++) {
> 1955     iter_arr[i] = new HeapRegionRemSetIterator();
> 1956   }

> 1962   for (int i = 0; i < n_queues; i++) {
> 1963     RefToScanQueue* q = new RefToScanQueue();
> 1964     q->initialize();
> 1965     _task_queues->register_queue(i, q);
> 1966   }
> 1971   for (uint n = 0; n < ParallelGCThreads; n++) {
> 1972     _evacuation_failed_info_array[n] = NULL;
> 1973   }

I think you also want to allocate/reset/initialize the evac_failure_info 
objects. You could also bind to the worker thread:

if (G1CollectedHeap::use_parallel_gc_threads()) {
   assert(workers()->total_workers == n_queues(), "check this code 
otherwise");
   for (uint k = 0; k < workers()->total_workers()l k += 1) {
     GangWorker *worker = workers()->gang_worker(k);
     uint id = worker->id();
     _evac_failed_info[id].bind_to_thread(worker);
   }
}

// The following is unconditional.
  _evac_failed_info[ParallelGCThreads].bind_to_thread(VMThread::vm_thread());

But binding as the first thing in G1ParTask::work() would be OK too.

Line 3813:
You don't need  to do this here. It can be done in G1ParTask::work() or 
during initialization. Again lot's of comments about why the 
evac_failure_info instance for the VMThread is not encoded at index 0.

4051-4056:
Use the use_parallel_gc_threads idiom:

if (G1CollectedHeap::use_parallel_gc_threads()) {
   // Since there are multiple parallel tasks where an evacuation 
failure could occur
   // and (in the future) each task may have been executed by a 
different number of workers,
   // we need to check them all.
   for (uint i = 0; i < workers()->total_workers(); i += 1) {
     ....
   }
}
// Now check the VM thread as it could have seen an evacuation failure 
during serial
// reference processing
...

Line 4480:
This line may not be necessary.

Lines 4507-4533:
Routine may not be necessary. The loop in particular:

> 4521     while (_evacuation_failed_info == NULL) {
> 4522       EvacuationFailedInfo* current = _g1h->_evacuation_failed_info_array[n];
> 4523       assert(current != NULL, "All info objects should have been allocated already");
> 4524       if (current->thread() == Thread::current()->osthread()) {
> 4525         _evacuation_failed_info = current;
> 4526       } else {
> 4527         n++;
> 4528         assert(n <= ParallelGCThreads, "There shouldn't be more threads than ParallelGThreads + the main GC thread");
> 4529       }
> 4530     }

is a big concern.This is an O(n) lookup for each of n threads, for each 
of the distinct parallel tasks (3).

The other files look OK.

JohnC

On 4/6/2013 11:54 PM, Jesper Wilhelmsson wrote:
> Bengt, all,
>
> A new webrev is now available:
>
> http://cr.openjdk.java.net/~jwilhelm/8008916/webrev/
>
> The main difference is that now we have an array in G1CollectedHeap 
> that keeps track the EvacuationFailedInfo-objects between the 
> different parallel parts of G1 so that each thread only sends a single 
> EvacuationFailed event in the end.
>
> To achieve this some logic was added to find the correct info object 
> in the array since the threads does not have the same queue number 
> each time.
>
> Thanks,
> /Jesper
>
>
> Bengt Rutisson skrev 28/3/13 10:20 PM:
>>
>> Jesper,
>>
>> This looks much better!
>>
>> Have you tested it to see that you get a maximum of one event per GC 
>> thread and
>> GC? It looks to me like you risk getting multiple events sent for 
>> each GC thread
>> every GC.
>>
>> You send the events in the G1ParScanThreadState destructor. I am not 
>> that
>> familiar with G1ParScanThreadState but it looks to me like we create new
>> G1ParScanThreadState instances for the different phases of a GC. Here 
>> are the
>> places where we create new instances of G1ParScanThreadState:
>>
>> G1CollectedHeap::process_discovered_references()
>> G1ParPreserveCMReferentsTask::work()
>> G1ParTask::work()
>> G1STWRefProcTaskProxy::work()
>>
>> So, it looks to me like we risk getting 4 events sent per thread and GC.
>>
>>
>> Some minor comments:
>>
>> In G1CollectedHeap::handle_evacuation_failure_par() you pass the 
>> ef_info on to
>> G1CollectedHeap::handle_evacuation_failure_common(). But the ef_info 
>> is now
>> thread local so you don't have to take the lock to use it. One way to 
>> simplify
>> this would be to do "ef_info->register_copy_failure(old->size());" in
>> G1CollectedHeap::handle_evacuation_failure_par() just before "if
>> (_evac_failure_closure != cl)" instead. Then you don't have to pass 
>> it on to
>> handle_evacuation_failure_common().
>>
>> (As a side note, I don't think setting _evacuation_failed to true has 
>> to be
>> protected by the lock either. It is a shared variable but all writes 
>> to it from
>> different threads will be to set it to true, so I think racing on it 
>> is benign.
>> But you have it in the same place as before, which might be good.)
>>
>> In G1ParCopyClosure::copy_to_survivor_space() you are now picking out 
>> two thing
>> from the _par_scan_state and passing them on to
>> G1CollectedHeap::handle_evacuation_failure_par(). Have you considered 
>> passing a
>> reference to _par_scan_state instead and let 
>> handle_evacuation_failure_par()
>> extract the data it needs?
>>
>> If you move the implementation of the destructor of 
>> G1ParScanThreadState in to
>> the cpp file you don't have to add the include for gcTrace.hpp to
>> g1CollectedHeap.hpp.
>>
>> Why did you decide to remove set_evactuation_failed()?
>>
>> Finally, you have several quite unrelated spelling corrections in 
>> this change.
>> This makes it a bit hard to review. I'm not sure I think it is worth 
>> the extra
>> review time to get those fixed.
>>
>> Bengt
>>
>>
>> On 3/28/13 7:50 PM, Jesper Wilhelmsson wrote:
>>> Hi,
>>>
>>> A new version of the evacuation failed event for G1. Now, each 
>>> thread sends an
>>> event with the info it has collected during the collection in the 
>>> same way as
>>> the other collectors do with promotion failed events.
>>>
>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8008916/webrev.2/
>>>
>>> Thanks,
>>> /Jesper
>>