RFR: 8008916 G1: Evacuation failed tracing event
Bengt Rutisson
bengt.rutisson at oracle.com
Mon Apr 8 12:35:31 UTC 2013
Hi Jesper,
I'm a bit worried about
G1ParScanThreadState::setup_evacuation_failed_info().
First, you will never enter the first case "if
(_g1h->_evacuation_failed_info_array[queue_num] == NULL)" since you have
set up _evacuation_failed_info_array completely in the G1CollectedHeap
constructor.
The rest of the method tries to handle the fact that a worker thread
does not always have the same worker_id (or queue_num). But to handle
this with an O(N2) search does not seem to scale. If we should have this
logic we need to find a better data structure for the thread->worker_id
mapping, such as a hash table.
Having said that I am not convinced that this is a problem worth
solving. What I mean is, does it really matter to the user which thread
actually got the evacuation failure or is it enough that we report all
occurrences of evacuation failures that threads had? Rather than having
a complex mapping from thread->worker_id, I think I would prefer to
either just use the first thread that got a failure or to skip the
thread information all together and just have a worker_id in the event.
This way you probably don't have to propagate EvacuationFailedInfo
through the _par_scan_state at all.
G1CollectedHeap::handle_evacuation_failure_par() can look up the
EvacuationFailedInfo directly from the
G1CollectedHeap::_evacuation_failed_info_array using the _worker_id.
If it is really important to have the correct thread information in
there I would prefer to do what I suggested earlier; possibly send
several events per thread by doing what your last webrev did: send the
event in the G1ParScanThreadState destructor. But add information about
when this evacuation failure happened. This way you get the thread right
and you give the user some information to work with.
Some minor comments:
G1CollectedHeap::G1CollectedHeap()
If _evacuation_failed_info_array was a EvacuationFailedInfo* rather than
a EvacuationFailedInfo** and you initialize it as:
_evacuation_failed_info_array = NEW_C_HEAP_ARRAY(EvacuationFailedInfo,
ParallelGCThreads + 1, mtGC);
you probably don't need the rest of the initialization code on lines
1971-1976.
G1CollectedHeap::do_collection_pause_at_safepoint()
why do we need this line?
3813
_evacuation_failed_info_array[ParallelGCThreads]->bind_to_thread(Thread::current()->osthread());
gcTrace.hpp: report_evacuation_failed(const EvacuationFailedInfo& cf_info);
can G1CollectedHeap::_evacuation_failed_info_array be private rather
than protected?
can G1ParScanThreadState::_evacuation_failed_info be private rather than
protected?
Finally, just a thought, do we need to somehow guard the setup of
_evacuation_failed_info_array with whether or not the event is enabled?
On a large machine it can become a fairly large data structure. Not
really sure what is reasonable to do here.
Thanks,
Bengt
On 4/7/13 8:54 AM, Jesper Wilhelmsson wrote:
> Bengt, all,
>
> A new webrev is now available:
>
> http://cr.openjdk.java.net/~jwilhelm/8008916/webrev/
>
> The main difference is that now we have an array in G1CollectedHeap
> that keeps track the EvacuationFailedInfo-objects between the
> different parallel parts of G1 so that each thread only sends a single
> EvacuationFailed event in the end.
>
> To achieve this some logic was added to find the correct info object
> in the array since the threads does not have the same queue number
> each time.
>
> Thanks,
> /Jesper
>
>
> Bengt Rutisson skrev 28/3/13 10:20 PM:
>>
>> Jesper,
>>
>> This looks much better!
>>
>> Have you tested it to see that you get a maximum of one event per GC
>> thread and
>> GC? It looks to me like you risk getting multiple events sent for
>> each GC thread
>> every GC.
>>
>> You send the events in the G1ParScanThreadState destructor. I am not
>> that
>> familiar with G1ParScanThreadState but it looks to me like we create new
>> G1ParScanThreadState instances for the different phases of a GC. Here
>> are the
>> places where we create new instances of G1ParScanThreadState:
>>
>> G1CollectedHeap::process_discovered_references()
>> G1ParPreserveCMReferentsTask::work()
>> G1ParTask::work()
>> G1STWRefProcTaskProxy::work()
>>
>> So, it looks to me like we risk getting 4 events sent per thread and GC.
>>
>>
>> Some minor comments:
>>
>> In G1CollectedHeap::handle_evacuation_failure_par() you pass the
>> ef_info on to
>> G1CollectedHeap::handle_evacuation_failure_common(). But the ef_info
>> is now
>> thread local so you don't have to take the lock to use it. One way to
>> simplify
>> this would be to do "ef_info->register_copy_failure(old->size());" in
>> G1CollectedHeap::handle_evacuation_failure_par() just before "if
>> (_evac_failure_closure != cl)" instead. Then you don't have to pass
>> it on to
>> handle_evacuation_failure_common().
>>
>> (As a side note, I don't think setting _evacuation_failed to true has
>> to be
>> protected by the lock either. It is a shared variable but all writes
>> to it from
>> different threads will be to set it to true, so I think racing on it
>> is benign.
>> But you have it in the same place as before, which might be good.)
>>
>> In G1ParCopyClosure::copy_to_survivor_space() you are now picking out
>> two thing
>> from the _par_scan_state and passing them on to
>> G1CollectedHeap::handle_evacuation_failure_par(). Have you considered
>> passing a
>> reference to _par_scan_state instead and let
>> handle_evacuation_failure_par()
>> extract the data it needs?
>>
>> If you move the implementation of the destructor of
>> G1ParScanThreadState in to
>> the cpp file you don't have to add the include for gcTrace.hpp to
>> g1CollectedHeap.hpp.
>>
>> Why did you decide to remove set_evactuation_failed()?
>>
>> Finally, you have several quite unrelated spelling corrections in
>> this change.
>> This makes it a bit hard to review. I'm not sure I think it is worth
>> the extra
>> review time to get those fixed.
>>
>> Bengt
>>
>>
>> On 3/28/13 7:50 PM, Jesper Wilhelmsson wrote:
>>> Hi,
>>>
>>> A new version of the evacuation failed event for G1. Now, each
>>> thread sends an
>>> event with the info it has collected during the collection in the
>>> same way as
>>> the other collectors do with promotion failed events.
>>>
>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8008916/webrev.2/
>>>
>>> Thanks,
>>> /Jesper
>>
More information about the hotspot-gc-dev
mailing list