RFR: 8008916 G1: Evacuation failed tracing event

Bengt Rutisson bengt.rutisson at oracle.com
Tue Apr 16 13:30:02 UTC 2013


Hi Jesper,

On 4/16/13 2:58 PM, Jesper Wilhelmsson wrote:
> Hi Bengt,
>
> Thanks for the review!
>
>
> Bengt Rutisson skrev 16/4/13 2:12 PM:
>>
>> Hi Jesper,
>>
>> A couple of comments:
>>
>> As I said before I don't like the lazy allocation of the 
>> EvacutaionFailedInfo
>> objects. I would prefer to allocate all the objects the G1CollectedHeap
>> constructor. This would make the code much simpler. I know you are 
>> worried about
>> false sharing, but I don't think the current approach reduces the 
>> risk of false
>> sharing. If it really is a problem in the evacuation failed case 
>> (which I am not
>> convinced of) you should pad the EvacuationFailedInfo objects instead.
>
> False sharing is an issue even if it only happens when evacuation 
> fails. As we have seen in several tests and benchmarks, once one 
> thread fails to evacuate it is quite likely that other threads will 
> follow, and the way G1 works today the GC is not aborted in the case 
> of a failed evacuation, but continues to try to evacuate the rest of 
> the live objects. This often results in thousands of failed object 
> evacuations in a single GC.
>
> And, sure, we hope that a production system won't experience that many 
> evacuation failures, but do we really want that to be a show stopper 
> for pause sensitive applications?
>
> The current approach will reduce the risk of false sharing on normal 
> hardware since each thread allocates its own data structure, and it 
> will lower the risk of false sharing even more on NUMA architectures 
> when the data structures are allocated on different nodes.

The current approach does not reduce the risk of false sharing. You 
allocate the EvacuationFailedInfo objects as part of the 
G1ParScanThreadStates which means that you might get false sharing 
between the EvacuationFailedInfo for one thread and the 
G1ParScanThreadStates for the next thread. All you have achieved is make 
the code more complex.

You are also not solving the NUMA issue by allocating these thread local 
since, as you know, the EvacuationFailedInfo objects are not thread 
local. That is the whole reason why you weren't able to get the thread 
field right. And we also don't have any NUMA support so I don't really 
know what the issue is.

>
> Padding is not an option since we are actively working on reducing 
> footprint, and it will not solve the problem on NUMA.

I think padding is the only stable option and I disagree that it is too 
much of a footprint issue. We already do that a lot in the exact code 
you are changing. See PADDING_ELEM_NUM.

>
>> I would prefer that we always look the EvacutaionFailedInfo objects 
>> up in the
>> G1CollectedHeap::_evacuation_failed_info_array rather than to include 
>> them in
>> the G1ParScanThreadState.
>>
>> So, do "_g1h->_evacuation_failed_info_array[_queue_num]" instead of
>> "_par_scan_state->evacuation_failed_info()" in
>> G1CollectedHeap::handle_evacuation_failure_par(). That means that you 
>> don't have
>> to change G1ParScanThreadState much at all. No extra fields and no 
>> changes to
>> the method declarations.
>>
>> I think this is more future proof also, since the next step is 
>> probably to add a
>> separate data structure to allow us to look up EvacuationFailedInfo 
>> objects
>> based on the thread id.
>
> I think it is unlikely that we will add a new global data structure to 
> solve the thread problem. It is more likely that we will have the 
> EvacuationFailedInfo object as a local part of the 
> G1ParScanThreadState and make sure that the same thread gets the same 
> G1ParScanThreadState object each time. This would be the most 
> efficient solution. Again, think NUMA.

Again, I disagree. I definitely think we should add the thread field to 
the EvacuationFailedInfo and short term I think the way to do that is to 
look it up based on the thread id. The footprint and performance 
overheads are minimal.

>> I would also prefer that EvacuationFailedInfo inherits from CHeapObj 
>> instead of
>> VALUE_OBJ_CLASS_SPEC. That way you can use the normal new operator 
>> instead of
>> placement new.
>
> Is CHeapObj really an option? EvacuationFailedInfo inherits from 
> CopyFailedInfo which is also the superclass of PromotionFailedInfo. 
> PromotionFailedInfo is stack allocated in the other collectors.

What is the problem with making CopyFailedInfo inherit from CHeapObj? A 
CHeapObj can still be stack allocated.

>
> Placement new is used in several places already in the HotSpot code. I 
> don't think it is less normal than the "normal" new operator. 

I just think it is strange that you use placement new to work around the 
fact that you have specified the allocation class to disallow new. Why 
not just admit that the object is intended to be dynamically allocated 
and make it a CHeapObj?

> But I don't have a strong opinion here. If CHeapObj can be stack 
> allocated (which would be a bit weird I think) then I can use that 
> approach instead.

Yes, I you can.

Bengt

>
> Thanks,
> /Jesper
>
>
>>
>> Bengt
>>
>>
>> On 4/16/13 12:01 PM, Jesper Wilhelmsson wrote:
>>> Hi,
>>>
>>> After several different versions and realizations about the worker 
>>> threads in
>>> G1, here is now a new version of the evacuation failed event.
>>>
>>> To include the correct thread with each event has turned out to 
>>> require more
>>> work and footprint overhead than what can be motivated at this 
>>> point, and
>>> since any other solution would be confusing rather than informative, 
>>> we have
>>> decided not to include the thread information in the evacuation 
>>> failed event
>>> for now.
>>>
>>> The thread information was never a requirement in the first place, 
>>> we just
>>> threw it in there in the promotion failed event because it was easily
>>> available in the parallel collectors and someone might be able to 
>>> use it for
>>> debugging at some point.
>>>
>>> An RFE [1] has been created to remind us to look at this again for G1.
>>>
>>> The updated webrev is here:
>>> http://cr.openjdk.java.net/~jwilhelm/8008916/webrev/
>>>
>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8008916
>>>
>>> [1]: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8012217
>>>
>>> Thanks!
>>> /Jesper
>>>
>>> Jesper Wilhelmsson skrev 28/3/13 7:50 PM:
>>>> Hi,
>>>>
>>>> A new version of the evacuation failed event for G1. Now, each 
>>>> thread sends an
>>>> event with the info it has collected during the collection in the 
>>>> same way as
>>>> the other collectors do with promotion failed events.
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8008916/webrev.2/
>>>>
>>>> Thanks,
>>>> /Jesper
>>




More information about the hotspot-gc-dev mailing list