RFR/RFC: Tax-and-Spend allocation pacing

Thu Mar 8 17:48:23 UTC 2018

http://cr.openjdk.java.net/~shade/shenandoah/tax-and-spend/webrev.01/

Please review this at your leisure: I will only have time to revisit it next week. There are
lots of comments in the source code itself.

This implements a simple Tax-and-Spend allocation pacing. It is needed to keep up with
application allocations both when the GC cycle is in progress (e.g. when LDS, the live data set,
is high and not enough free space is available, or when there is an allocation spike), and when
we are idle (e.g. when the heap is small and the control loop has to react swiftly to start the GC).

This plays into our usual degradation scheme: if we blow the pacing budget, we allocate anyway and
hope for the best. In the worst case, we Degenerate, as usual, instead of stalling the
application threads indefinitely.
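
To illustrate the idea only (this is not the code in the webrev), here is a minimal Java sketch of
a tax-and-spend pacer. All names (PacerSketch, paceForAlloc, reportProgress, taxRate,
maxDelayNanos) are hypothetical, and the bounded wait is my assumption about how "blow the budget,
then allocate anyway" could be expressed: allocating threads spend from a shared budget, GC
progress replenishes it, and a thread that finds the budget negative waits only for a bounded time.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a tax-and-spend pacer; not the Shenandoah implementation. */
class PacerSketch {
    private final AtomicLong budget;   // bytes that may still be allocated without waiting
    private final double taxRate;      // assumed: budget bytes granted per byte of GC progress
    private final long maxDelayNanos;  // assumed: upper bound on a single allocation stall

    PacerSketch(long initialBudget, double taxRate, long maxDelayNanos) {
        this.budget = new AtomicLong(initialBudget);
        this.taxRate = taxRate;
        this.maxDelayNanos = maxDelayNanos;
    }

    long budget() {
        return budget.get();
    }

    /** Called by the GC as it makes progress: replenishes the budget and wakes waiters. */
    void reportProgress(long bytesProcessed) {
        budget.addAndGet((long) (bytesProcessed * taxRate));
        synchronized (this) {
            notifyAll();
        }
    }

    /**
     * Called before an allocation. Returns true if the allocation fits the budget
     * (possibly after waiting), false if the budget was blown and the caller
     * should allocate anyway.
     */
    boolean paceForAlloc(long bytes) {
        // Spend unconditionally, so the debt is recorded even if we proceed over budget.
        if (budget.addAndGet(-bytes) >= 0) {
            return true;
        }
        long deadline = System.nanoTime() + maxDelayNanos;
        synchronized (this) {
            while (budget.get() < 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // bounded stall: never block the thread indefinitely
                }
                try {
                    wait(remaining / 1_000_000, (int) (remaining % 1_000_000));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

In this sketch a false return is not an error: the caller just proceeds with the allocation, and
if that allocation then fails, the usual Degenerated GC path would take over.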

It naturally allows higher allocation rates with larger heaps, while keeping the application at bay
with small-to-moderate heaps. I have not been able to turn the implementation into a performance
bottleneck, even in torturous 1+ TB/sec allocation tests with enough heap available.

Passes hotspot_gc_shenandoah, but not all heuristics are implemented yet.

Motivational examples ["Easy does it", "haste makes waste", Gil's Ferrari-around-the-tree and other
adages apply when interpreting the results]:

=== Allocating "new Object()" in 16 threads with a tiny -Xmx128m heap:

NOTE: This basically tests whether the heuristics are able to catch up fast enough.

--- Before:

 Time per op:       34.192 ±    4.929  ns/op
 Allocation rate: 9730.235 ± 1526.052  MB/sec

   945 successful concurrent GCs
       0 invoked explicitly

  1280 Degenerated GCs
    1280 caused by allocation failure
       208 happened at Outside of Cycle
       778 happened at Mark
       291 happened at Evacuation
         3 happened at Update Refs
     604 upgraded to Full GC

   717 Full GCs
       0 invoked explicitly
     113 caused by allocation failure
     604 upgraded from Degenerated GC

--- After:

 Time per op:      175.004 ±  10.537 ns/op
 Allocation rate: 1900.956 ± 149.273 MB/sec

    377 successful concurrent GCs
       0 invoked explicitly

     0 Degenerated GCs
       0 caused by allocation failure
       0 upgraded to Full GC

     0 Full GCs
       0 invoked explicitly
       0 caused by allocation failure
       0 upgraded from Degenerated GC

=== TreeFragger with 16 threads, ~20 GB LDS and -Xmx30g:

NOTE: Allocation pacing provides a *higher* average allocation rate, because STW GCs hurt.

--- Before:

 Time per op:       166.156 ±   94.360 ns/op
 Allocation rate:  2469.184 ± 1466.191 MB/sec

     3 successful concurrent GCs
       0 invoked explicitly

    40 Degenerated GCs
      40 caused by allocation failure
         4 happened at Outside of Cycle
        36 happened at Mark
       1 upgraded to Full GC

    11 Full GCs
       0 invoked explicitly
      10 caused by allocation failure
       1 upgraded from Degenerated GC

--- After:

 Time per op:       62.819 ±  104.797 ns/op
 Allocation rate: 5716.089 ± 1748.301 MB/sec

    97 successful concurrent GCs
       0 invoked explicitly

     0 Degenerated GCs
       0 caused by allocation failure
       0 upgraded to Full GC

     0 Full GCs
       0 invoked explicitly
       0 caused by allocation failure
       0 upgraded from Degenerated GC

Thanks,
-Aleksey