RFR(S): 7041879: G1: introduce stress testing parameter to cause frequent evacuation failures

Wed Aug 22 06:48:23 UTC 2012

Hi John,

On 2012-08-22 00:42, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for reviewing these changes. Detailed responses are inline....

Thanks for your comments! With your proposed changes I think it looks 
good. Ship it!

Bengt

>
> On 08/16/12 03:48, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> Overall I think this looks fine. It's definitely a good stress option 
>> to have. Thanks for fixing this!
>>
>> Functionally it looks good to me, but I have some style questions:
>>
>> * Why do we add the new methods as inline methods in the 
>> g1CollectedHeap.inline.hpp file? These are only debug methods and 
>> they are only used from g1CollectedHeap.cpp. I would prefer to place 
>> the implementations in g1CollectedHeap.cpp and let the compiler 
>> decide whether to inline or not.
>
> I followed the style of PromotionFailureALot. This is speculation...
>
> The code would be included in non-product builds (which includes jvmg, 
> fastdebug, and optimized). If we are using fastdebug and/or optimized 
> builds of hotspot to chase down a timing sensitive failure then adding 
> overhead to the fast path of the copy code might affect our attempts 
> to reproduce the failure. I don't think we can assume that all 
> compilers will inline routines that are in the same cpp file (IIRC the 
> Solaris Studio compiler only does it at certain optimization levels 
> _and_ if it has already seen the definition of the callee routine). So 
> by declaring the routines as "inline" and placing them in an include 
> file, we don't need to rely upon the compiler (and the optimization 
> level) to keep the overhead of the code as low as we can. As I said 
> though, this is speculation.
>>
>> * In the constructor for G1CollectedHeap I think we could call 
>> reset_evacuation_should_fail() instead of these lines:
>>
>> #ifndef PRODUCT
>>   _evacuation_failure_alot_for_current_gc = false;
>>   _evacuation_failure_alot_count = 0;
>>   _evacuation_failure_alot_gc_number = 0;
>> #endif // !PRODUCT
>
> Good observation. Done.
>
>>
>>
>> * G1CollectedHeap::evacuation_should_fail()
>> Why do we need evacuation_should_fail(volatile size_t* count)? It is 
>> always called with _evacuation_failure_alot_count as parameter. How 
>> about using that directly instead of passing it as a parameter? Also, 
>> the nesting level is getting three levels deep. We could flatten that 
>> by bailing out early if we are not triggering evacuation failures. 
>> So, I think I would prefer evacuation_should_fail() to look something 
>> like:
>>
>> bool G1CollectedHeap::evacuation_should_fail() {
>>   if (!G1EvacuationFailureALot || 
>> !_evacuation_failure_alot_for_current_gc) {
>>     return false;
>>   }
>>   // Access to _evacuation_failure_alot_count is not atomic; the 
>> value does not have to be exact.
>>   if (++_evacuation_failure_alot_count >= 
>> G1EvacuationFailureALotCount) {
>>     _evacuation_failure_alot_count = 0;
>>     return true;
>>   }
>>   return false;
>> }
>>
>
> OK. How about also testing the negative of the second condition:
>
> inline bool
> G1CollectedHeap::evacuation_should_fail() {
>  if (!G1EvacuationFailureALot || 
> !_evacuation_failure_alot_for_current_gc) {
>    return false;
>  }
>  // G1EvacuationFailureALot is in effect for current GC
>  // Access to _evacuation_failure_alot_count is not atomic;
>  // the value does not have to be exact.
>  if (++_evacuation_failure_alot_count < G1EvacuationFailureALotCount) {
>    return false;
>  }
>  _evacuation_failure_alot_count = 0;
>  return true;
> }
>
> ?
>
>>
>> * G1CollectedHeap::evacuation_failure_alot_for_gc_type() and 
>> G1CollectedHeap::set_evacuation_failure_alot_for_current_gc()
>> I am not sure that the method evacuation_failure_alot_for_gc_type() 
>> actually makes the code easier to understand. I think I would prefer 
>> to inline that logic in set_evacuation_failure_alot_for_current_gc(). 
>> Something like:
>>
>> void G1CollectedHeap::set_evacuation_failure_alot_for_current_gc() {
>>   if (G1EvacuationFailureALot) {
>>     // Note we can't assert that _evacuation_failure_alot_for_current_gc
>>     // is clear here. It may have been set during a previous GC but 
>> that GC
>>     // did not copy enough objects (i.e. 
>> G1EvacuationFailureALotCount) to
>>     // trigger an evacuation failure and clear the flags and and counts.
>>     _evacuation_failure_alot_for_current_gc = false;
>>
>>     // Check if we have gone over the interval.
>>     const size_t gc_num = total_collections();
>>     const size_t elapsed_gcs = gc_num - 
>> _evacuation_failure_alot_gc_number;
>>
>>     if (elapsed_gcs >= G1EvacuationFailureALotInterval) {
>>       const bool gcs_are_young = g1_policy()->gcs_are_young();
>>       const bool during_im = g1_policy()->during_initial_mark_pause();
>>       const bool during_marking = mark_in_progress();
>>
>>       if (gcs_are_young) {
>>         _evacuation_failure_alot_for_current_gc = 
>> G1EvacuationFailureALotDuringYoungGC
>>           || (during_marking && G1EvacuationFailureALotDuringConcMark)
>>           || (during_im && G1EvacuationFailureALotDuringInitialMark);
>>       } else {
>>         _evacuation_failure_alot_for_current_gc = 
>> G1EvacuationFailureALotDuringMixedGC;
>>       }
>>     }
>> }
>
> I disagree. The conditional statement above is more complicated IMO. 
> In the current version you don't need to know that during_initial_mark 
> will be true iff gcs_are_young is true and so you don't need to encode 
> that in a nested if. It's sufficient say for an initial mark pause - 
> check the value of G1EvacuationFailureALotDuringInitialMark.
>
>>
>> * If we keep any changes in g1CollectedHeap.inline.hpp it should get 
>> its copyright year updated
>
> Oops. Thanks. Done.
>
> JohnC