RFR(S): 7041879: G1: introduce stress testing parameter to cause frequent evacuation failures

Tue Aug 21 22:42:04 UTC 2012

Hi Bengt,

Thanks for reviewing these changes. Detailed responses are inline....

On 08/16/12 03:48, Bengt Rutisson wrote:
>
> Hi John,
>
> Overall I think this looks fine. It's definitely a good stress option 
> to have. Thanks for fixing this!
>
> Functionally it looks good to me, but I have some style questions:
>
> * Why do we add the new methods as inline methods in the 
> g1CollectedHeap.inline.hpp file? These are only debug methods and they 
> are only used from g1CollectedHeap.cpp. I would prefer to place the 
> implementations in g1CollectedHeap.cpp and let the compiler decide 
> whether to inline or not.

I followed the style of PromotionFailureALot. This is speculation...

The code would be included in non-product builds (which includes jvmg, 
fastdebug, and optimized). If we are using fastdebug and/or optimized 
builds of hotspot to chase down a timing sensitive failure then adding 
overhead to the fast path of the copy code might affect our attempts to 
reproduce the failure. I don't think we can assume that all compilers 
will inline routines that are in the same cpp file (IIRC the Solaris 
Studio compiler only does it at certain optimization levels _and_ if it 
has already seen the definition of the callee routine). So by declaring 
the routines as "inline" and placing them in an include file, we don't 
need to rely upon the compiler (and the optimization level) to keep the 
overhead of the code as low as we can. As I said though, this is 
speculation.
>
> * In the constructor for G1CollectedHeap I think we could call 
> reset_evacuation_should_fail() instead of these lines:
>
> #ifndef PRODUCT
>   _evacuation_failure_alot_for_current_gc = false;
>   _evacuation_failure_alot_count = 0;
>   _evacuation_failure_alot_gc_number = 0;
> #endif // !PRODUCT

Good observation. Done.

>
>
> * G1CollectedHeap::evacuation_should_fail()
> Why do we need evacuation_should_fail(volatile size_t* count)? It is 
> always called with _evacuation_failure_alot_count as parameter. How 
> about using that directly instead of passing it as a parameter? Also, 
> the nesting level is getting three levels deep. We could flatten that 
> by bailing out early if we are not triggering evacuation failures. So, 
> I think I would prefer evacuation_should_fail() to look something like:
>
> bool G1CollectedHeap::evacuation_should_fail() {
>   if (!G1EvacuationFailureALot || 
> !_evacuation_failure_alot_for_current_gc) {
>     return false;
>   }
>   // Access to _evacuation_failure_alot_count is not atomic; the value 
> does not have to be exact.
>   if (++_evacuation_failure_alot_count >= G1EvacuationFailureALotCount) {
>     _evacuation_failure_alot_count = 0;
>     return true;
>   }
>   return false;
> }
>

OK. How about also testing the negative of the second condition:

inline bool
G1CollectedHeap::evacuation_should_fail() {
  if (!G1EvacuationFailureALot || 
!_evacuation_failure_alot_for_current_gc) {
    return false;
  }
  // G1EvacuationFailureALot is in effect for current GC
  // Access to _evacuation_failure_alot_count is not atomic;
  // the value does not have to be exact.
  if (++_evacuation_failure_alot_count < G1EvacuationFailureALotCount) {
    return false;
  }
  _evacuation_failure_alot_count = 0;
  return true;
}

?

>
> * G1CollectedHeap::evacuation_failure_alot_for_gc_type() and 
> G1CollectedHeap::set_evacuation_failure_alot_for_current_gc()
> I am not sure that the method evacuation_failure_alot_for_gc_type() 
> actually makes the code easier to understand. I think I would prefer 
> to inline that logic in set_evacuation_failure_alot_for_current_gc(). 
> Something like:
>
> void G1CollectedHeap::set_evacuation_failure_alot_for_current_gc() {
>   if (G1EvacuationFailureALot) {
>     // Note we can't assert that _evacuation_failure_alot_for_current_gc
>     // is clear here. It may have been set during a previous GC but 
> that GC
>     // did not copy enough objects (i.e. G1EvacuationFailureALotCount) to
>     // trigger an evacuation failure and clear the flags and and counts.
>     _evacuation_failure_alot_for_current_gc = false;
>
>     // Check if we have gone over the interval.
>     const size_t gc_num = total_collections();
>     const size_t elapsed_gcs = gc_num - 
> _evacuation_failure_alot_gc_number;
>
>     if (elapsed_gcs >= G1EvacuationFailureALotInterval) {
>       const bool gcs_are_young = g1_policy()->gcs_are_young();
>       const bool during_im = g1_policy()->during_initial_mark_pause();
>       const bool during_marking = mark_in_progress();
>
>       if (gcs_are_young) {
>         _evacuation_failure_alot_for_current_gc = 
> G1EvacuationFailureALotDuringYoungGC
>           || (during_marking && G1EvacuationFailureALotDuringConcMark)
>           || (during_im && G1EvacuationFailureALotDuringInitialMark);
>       } else {
>         _evacuation_failure_alot_for_current_gc = 
> G1EvacuationFailureALotDuringMixedGC;
>       }
>     }
> }

I disagree. The conditional statement above is more complicated IMO. In 
the current version you don't need to know that during_initial_mark will 
be true iff gcs_are_young is true and so you don't need to encode that 
in a nested if. It's sufficient say for an initial mark pause - check 
the value of G1EvacuationFailureALotDuringInitialMark.

>
> * If we keep any changes in g1CollectedHeap.inline.hpp it should get 
> its copyright year updated

Oops. Thanks. Done.

JohnC