Stack allocation prototype for C2

Mon Jan 27 23:18:39 UTC 2020

Am 27.01.2020 17:00 schrieb Charlie Gracie <Charlie.Gracie at microsoft.com>:
>
> Hi,
>
> We have been prototyping Stack Allocation in C2. We are hoping to start a conversation on our approach
> and to see if there is interest in the community for us to submit our work as a patch set. We would
> appreciate any comments, concerns or feedback on our current approach or the idea in general. Thanks!
>

Hi Charlie, Nikola

this looks really very interesting and the performance gains are impressive!

If you need help with uploading a version of your patch to cr.openjdk.java.net please let me know. I'll be happy to assist.

Looking forward hearing Nikola's talk about the details at FOSDEM.

Save travels,
Volker

> Background on why we are investigating Stack Allocation:
> While investigating the performance of different workloads (a lot of Scala but some plain Java), we noticed
> a lot of GC and heap pressure due to transient object allocations. Upon further digging we were able to see
> that C2’s Escape Analysis phase was correctly recognizing these allocations as Non Escaping. The allocations
> were not being Scalar Replaced due to a few different restrictions but the most common was due to control
> flow merge points before accessing the objects. Here is a simplified example of a case where an allocation
> cannot be Scalar Replaced even though it is recognized as Non Escaping.
>
> int foo(int x, MyObject[] vals) {
>    MyObject bar;
>    If (x > 127) {
>       bar = new MyObject(x);
>    } else {
>       bar = vals[x];
>    }
>    return bar.x;
> }
>
> In the above example the allocation could successfully be replaced with a Stack Allocation. We believe that
> a lot of code has control flow merge points that cause Scalar Replacement to be rejected but Stack Allocation
> could work for many of them. The obvious benefit is the reduction in GC/heap pressure, but you also get the
> added benefit of improved cache hit rate since the stack allocated objects will likely be in the cache memory.
> There are costs with Stack Allocation including increased compilation times and increased code cache size.
> These would have to be carefully measured fully understand the benefits and costs.
>
> Our current approach:
> 1. Only allow stack allocation in non-compressed oops mode (-XX:-UseCompressedOops)
>         1.1 This simplifies the prototype and we believe there are solutions for compressed oops
> 2. Focus on ParallelGC and G1GC for prototype
>         2.1 Support for other collectors needs to be investigated
> 3. Current prototype is based on JDK 11
>         3.1 We are going to work on moving it up to tip shortly
>         3.2 We have been following the changes in tip and this work should apply easily
> 4. Use the results from Escape Analysis to decide which objects are eligible for Stack Allocation
>         4.1 Scalar Replacement is the most important, so allow that to happen first
>         4.2 Arrays cannot be stack allocated in the current prototype.
>                 4.2.1 Straight forward to implement
>         4.3 Stack allocated objects cannot reference other stack allocated objects.
>                 4.3.1 More difficult than arrays but should be doable
>         4.4 Reject allocations based on the reasons listed below in the details section
> 5. In macro expansion during expand allocation replace allocations eligible for Stack Allocation.
>         5.1 Replace the AllocNode with a BoxLockNode with the correct stack location and remove the slow
> path and other unused IR
>         5.2 Repurpose BoxLock nodes to represent the stack location for the object
>         5.3 No Stack Allocation in methods that contain synchronization.
>                 5.3.1 This restriction should be easy to remove.
>         5.4 Remove write barriers on stores into stack allocated objects
> 6. Grow the frame size to handle the stack allocated objects
> 7. Modify OopMap scanning to handle stack allocated objects
>         7.1 Currently handled via a generic OopClosure for all GCs
>         7.2 More work required for full GC collector coverage
> 8. Safepoint support reuses SafePointScalarObject
>         8.1 De-optimization is working and correctly allocating heap objects and initializing their fields
>
> Current Results:
> 1. Microbenchmarks using JMH simply replace all allocations using Stack Allocation where Scalar Replacement
> 2. Workloads in DeCapo and Renaissance are showing benefits
> 3. We have not measured any negative impacts on application throughput
> 4. More detailed analysis on compilation times and code cache size are required
>
> Benchmark                                  % Improvement
> DeCapo tmt                                           15
> Renaissance db-shootout                   10
> Renaissance fj-kmeans                          5
> Renaissance future-genetic                10
> Renaissance movie-lens                        5
> Renaissance neo4j-analytics              45
>
> Details on why allocations should not be replaced with Stack Allocation:
> Not all Non Escaping objects can be Stack Allocated so there will still be cases where Non Escaping allocations
> will need to be allocated on the heap:
> 1. Large arrays and objects should likely be ignored as they could significantly impact the stack frame size
> 2. Similar to 1 limit the total amount of stack memory per frame that should be used for Stack Allocations
> 3. Objects and arrays without constant sizes/lengths
> 4. Allocations with over lapping lifetimes from different loop iterations
> 5. A stack allocated object cannot be referenced by any other Non Escaping objects that are not scalar replaced
> or stack allocated as that would count as an escape
>
> Here is a simple example of what I was describing in 4.
> int method(int[] vals) {
>    MyObject o1 = new MyObject(vals[0]);
>    for (int i =1; i < vals.length; ++) {
>       MyObject o2 = new MyObject(vals[i]);
>       if (!o1.equals(o2)) {
>          o1 = o2;
>          …
>       }
>       …
>    }
>    …
> }
>
> In the above example if o2 was stack allocated and you ever entered the “if” from that point on o1 and o2 would
> always be the same object. This is incorrect so o2 cannot be stack allocated.
>
> Thanks for reading and we look forward to your feedback!
> Charlie Gracie & Nikola Grcevski
>
>

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879