RFR(S): 7041879: G1: introduce stress testing parameter to cause frequent evacuation failures

Thu Aug 16 10:48:09 UTC 2012

Hi John,

Overall I think this looks fine. It's definitely a good stress option to 
have. Thanks for fixing this!

Functionally it looks good to me, but I have some style questions:

* Why do we add the new methods as inline methods in the 
g1CollectedHeap.inline.hpp file? These are only debug methods and they 
are only used from g1CollectedHeap.cpp. I would prefer to place the 
implementations in g1CollectedHeap.cpp and let the compiler decide 
whether to inline or not.

* In the constructor for G1CollectedHeap I think we could call 
reset_evacuation_should_fail() instead of these lines:

#ifndef PRODUCT
   _evacuation_failure_alot_for_current_gc = false;
   _evacuation_failure_alot_count = 0;
   _evacuation_failure_alot_gc_number = 0;
#endif // !PRODUCT

* G1CollectedHeap::evacuation_should_fail()
Why do we need evacuation_should_fail(volatile size_t* count)? It is 
always called with _evacuation_failure_alot_count as parameter. How 
about using that directly instead of passing it as a parameter? Also, 
the nesting level is getting three levels deep. We could flatten that by 
bailing out early if we are not triggering evacuation failures. So, I 
think I would prefer evacuation_should_fail() to look something like:

bool G1CollectedHeap::evacuation_should_fail() {
   if (!G1EvacuationFailureALot || 
!_evacuation_failure_alot_for_current_gc) {
     return false;
   }
   // Access to _evacuation_failure_alot_count is not atomic; the value 
does not have to be exact.
   if (++_evacuation_failure_alot_count >= G1EvacuationFailureALotCount) {
     _evacuation_failure_alot_count = 0;
     return true;
   }
   return false;
}

* G1CollectedHeap::evacuation_failure_alot_for_gc_type() and 
G1CollectedHeap::set_evacuation_failure_alot_for_current_gc()
I am not sure that the method evacuation_failure_alot_for_gc_type() 
actually makes the code easier to understand. I think I would prefer to 
inline that logic in set_evacuation_failure_alot_for_current_gc(). 
Something like:

void G1CollectedHeap::set_evacuation_failure_alot_for_current_gc() {
   if (G1EvacuationFailureALot) {
     // Note we can't assert that _evacuation_failure_alot_for_current_gc
     // is clear here. It may have been set during a previous GC but that GC
     // did not copy enough objects (i.e. G1EvacuationFailureALotCount) to
     // trigger an evacuation failure and clear the flags and and counts.
     _evacuation_failure_alot_for_current_gc = false;

     // Check if we have gone over the interval.
     const size_t gc_num = total_collections();
     const size_t elapsed_gcs = gc_num - _evacuation_failure_alot_gc_number;

     if (elapsed_gcs >= G1EvacuationFailureALotInterval) {
       const bool gcs_are_young = g1_policy()->gcs_are_young();
       const bool during_im = g1_policy()->during_initial_mark_pause();
       const bool during_marking = mark_in_progress();

       if (gcs_are_young) {
         _evacuation_failure_alot_for_current_gc = 
G1EvacuationFailureALotDuringYoungGC
           || (during_marking && G1EvacuationFailureALotDuringConcMark)
           || (during_im && G1EvacuationFailureALotDuringInitialMark);
       } else {
         _evacuation_failure_alot_for_current_gc = 
G1EvacuationFailureALotDuringMixedGC;
       }
     }
}

* If we keep any changes in g1CollectedHeap.inline.hpp it should get its 
copyright year updated

Thanks,
Bengt

On 2012-07-19 21:17, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have couple of volunteers to review the changes for this CR? The 
> webrev can be found at: 
> http://cr.openjdk.java.net/~johnc/7041879/webrev.0/
>
> Summary:
> The changes adds mechanism, which is only available in non-product 
> builds, for forcing evacuation failures. The mechanism is analogous to 
> the PromotionFailureALot functionality in the other collectors. It is 
> enabled with the G1EvacuationFailureALot flag and is controlled using 
> sub-flags: G1EvacuationFailureALotCount and 
> G1EvacuationFailureALotInterval control the frequency and number of 
> evacuation failures, and G1EvacuationFailureALotDuringYoungGC, 
> G1EvacuationFailureALotDuringMixedGC, 
> G1EvacuationFailureALotDuringInitialMark, and G1EvacuationFailureALot 
> control during type of GC we should/can trigger evacuation failures.
>
> An earlier variant of this patch was previously employed to stress 
> test Tony's marking changes.
>
> Testing:
> Dacapo/xalan to test the various flags and settings. GC test suite and 
> jprt with G1EvacuationFailureALot and heap verification enabled (the 
> sub-flags for G1EvacuationFailureALot were left at their default values).
>
> Thanks,
>
> JohnC