RFR/RFC: Tax-and-Spend allocation pacing

Fri Mar 9 10:47:43 UTC 2018

Hi Aleksey,

This is great stuff. The patch looks good to me.

I wonder what is needed to make this work with partial and traversal GC?
Would that only be the boilerplate stuff, like in
init_concurrent_normal_cycle(), to set up pacing, given that the other hooks
(to report evacs and intercept allocs) are already in their right places?

Cheers, Roman

> http://cr.openjdk.java.net/~shade/shenandoah/tax-and-spend/webrev.01/
> 
> Please review this at your leisure, as I will only have time to revisit it next week. There are
> lots of comments in the source code itself.
> 
> This implements simple Tax-and-Spend allocation pacing. It is needed to let the GC catch up with
> application allocations, either when the GC cycle is in progress (e.g. when the live data set (LDS)
> is high and not enough free space is available, or there is an allocation spike), or when we are
> idle (e.g. the heap is small, and the control loop has to react swiftly to start the GC).
> 
> This plays into our usual degradation scheme: if we blow the pacing budget, we allocate anyway and
> hope for the best. In the worst case, we would Degenerate, as usual, instead of stalling the
> application threads indefinitely.
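
[A minimal sketch of the tax-and-spend idea as described in the paragraphs above, not the code
from the webrev. The class name, the method names, the way the budget is seeded and the tax-rate
formula are all assumptions made for illustration. GC progress earns budget, allocations spend it,
and a thread that waits too long blows the budget and allocates anyway.]

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical illustration only; names and policy are assumptions, not the patch. */
class PacerSketch {
    private final AtomicLong budget = new AtomicLong(); // bytes mutators may still allocate
    private volatile double taxRate;                    // budget bytes earned per byte of GC work
    private final long maxDelayNanos;                   // longest stall imposed on one allocation

    PacerSketch(long maxDelayNanos) {
        this.maxDelayNanos = maxDelayNanos;
    }

    /** At the start of a GC phase: seed the budget and derive the tax rate (assumed formula). */
    void setup(long initialBudget, long freeBytes, long expectedGcWork) {
        budget.set(initialBudget);
        taxRate = (double) (freeBytes - initialBudget) / Math.max(1, expectedGcWork);
    }

    /** GC threads report progress; every unit of work replenishes the budget. */
    void reportGcWork(long workBytes) {
        budget.addAndGet((long) (workBytes * taxRate));
    }

    /** Spend from the budget if it covers the request; never blocks. */
    boolean tryClaim(long bytes) {
        long cur;
        do {
            cur = budget.get();
            if (cur < bytes) {
                return false;
            }
        } while (!budget.compareAndSet(cur, cur - bytes));
        return true;
    }

    /**
     * Allocation path: wait for budget for at most maxDelayNanos.
     * Returns true if paced within budget, false if the budget was blown.
     */
    boolean paceForAlloc(long bytes) {
        long deadline = System.nanoTime() + maxDelayNanos;
        while (!tryClaim(bytes)) {
            if (System.nanoTime() - deadline >= 0) {
                budget.addAndGet(-bytes); // blow the budget: allocate and hope for the best
                return false;
            }
            LockSupport.parkNanos(10_000); // brief stall, then re-check
        }
        return true;
    }

    long budget() {
        return budget.get();
    }
}
```

[In this sketch the bounded wait is what keeps application threads from stalling indefinitely. A
negative budget only records that allocation ran ahead of the GC, and the usual Degenerated path
would still be the fallback if the allocation itself fails.]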
> 
> It naturally allows higher allocation rates with larger heaps, while keeping the application at bay
> with small-to-moderate heaps. I have not been able to turn the implementation into a performance
> bottleneck even in the torturous 1+ TB/sec allocation tests, with enough heap available.
> 
> Passes hotspot_gc_shenandoah, but not all heuristics are implemented yet.
> 
> Motivational examples ["Easy does it", "haste makes waste", Gil's Ferrari-around-the-tree and other
> adages apply when interpreting the results]:
> 
> === Allocating "new Object()" in 16 threads and tiny -Xmx128m heap:
> 
> NOTE: This test basically checks whether the heuristics are able to catch up fast enough.
> 
> --- Before:
> 
>  Time per alloc:    34.192 ±    4.929  ns/op
>  Allocation rate: 9730.235 ± 1526.052  MB/sec
> 
>    945 successful concurrent GCs
>        0 invoked explicitly
> 
>   1280 Degenerated GCs
>     1280 caused by allocation failure
>        208 happened at Outside of Cycle
>        778 happened at Mark
>        291 happened at Evacuation
>          3 happened at Update Refs
>      604 upgraded to Full GC
> 
>    717 Full GCs
>        0 invoked explicitly
>      113 caused by allocation failure
>      604 upgraded from Degenerated GC
> 
> 
> --- After:
> 
>  Time per op:      175.004 ±  10.537 ns/op
>  Allocation rate: 1900.956 ± 149.273 MB/sec
> 
>     377 successful concurrent GCs
>        0 invoked explicitly
> 
>      0 Degenerated GCs
>        0 caused by allocation failure
>        0 upgraded to Full GC
> 
>      0 Full GCs
>        0 invoked explicitly
>        0 caused by allocation failure
>        0 upgraded from Degenerated GC
> 
> 
> === TreeFragger with 16 threads, ~20 GB LDS and -Xmx30g:
> 
> NOTE: Allocation pacing provides *higher* average allocation rate, because STW GCs hurt.
> 
> --- Before:
> 
>  Time per op:       166.156 ±   94.360 ns/op
>  Allocation rate:  2469.184 ± 1466.191 MB/sec
> 
>      3 successful concurrent GCs
>        0 invoked explicitly
> 
>     40 Degenerated GCs
>       40 caused by allocation failure
>          4 happened at Outside of Cycle
>         36 happened at Mark
>        1 upgraded to Full GC
> 
>     11 Full GCs
>        0 invoked explicitly
>       10 caused by allocation failure
>        1 upgraded from Degenerated GC
> 
> 
> 
> --- After:
> 
>  Time per op:       62.819 ±  104.797 ns/op
>  Allocation rate: 5716.089 ± 1748.301 MB/sec
> 
>     97 successful concurrent GCs
>        0 invoked explicitly
> 
>      0 Degenerated GCs
>        0 caused by allocation failure
>        0 upgraded to Full GC
> 
>      0 Full GCs
>        0 invoked explicitly
>        0 caused by allocation failure
>        0 upgraded from Degenerated GC
> 
> 
> 
> Thanks,
> -Aleksey