[EXTERNAL] Stack allocation prototype for C2

Charlie Gracie Charlie.Gracie at microsoft.com
Tue Jan 28 22:45:42 UTC 2020


Hi Volker,

That would be great! We will reach out at FOSDEM to learn about uploading a patch to cr.openjdk.java.net. After we return from Europe we will look into getting the patch uploaded for viewing, suggestions, refinements and testing.

Cheers,
Charlie

From: "Simonis, Volker" <simonisv at amazon.de>
Date: Monday, January 27, 2020 at 6:18 PM
To: Nikola Grcevski <Nikola.Grcevski at microsoft.com>, "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>, Charlie Gracie <Charlie.Gracie at microsoft.com>
Subject: [EXTERNAL] Stack allocation prototype for C2


Am 27.01.2020 17:00 schrieb Charlie Gracie <Charlie.Gracie at microsoft.com>: 
>
> Hi, 
>
> We have been prototyping Stack Allocation in C2. We are hoping to start a conversation on our approach 
> and to see if there is interest in the community for us to submit our work as a patch set. We would 
> appreciate any comments, concerns or feedback on our current approach or the idea in general. Thanks! 
>

Hi Charlie, Nikola

this looks really very interesting and the performance gains are impressive!

If you need help with uploading a version of your patch to cr.openjdk.java.net please let me know. I'll be happy to assist.

Looking forward hearing Nikola's talk about the details at FOSDEM.

Save travels,
Volker

> Background on why we are investigating Stack Allocation: 
> While investigating the performance of different workloads (a lot of Scala but some plain Java), we noticed 
> a lot of GC and heap pressure due to transient object allocations. Upon further digging we were able to see 
> that C2’s Escape Analysis phase was correctly recognizing these allocations as Non Escaping. The allocations 
> were not being Scalar Replaced due to a few different restrictions but the most common was due to control 
> flow merge points before accessing the objects. Here is a simplified example of a case where an allocation 
> cannot be Scalar Replaced even though it is recognized as Non Escaping. 
>
> int foo(int x, MyObject[] vals) { 
>    MyObject bar; 
>    If (x > 127) { 
>       bar = new MyObject(x); 
>    } else { 
>       bar = vals[x]; 
>    } 
>    return bar.x; 
> } 
>
> In the above example the allocation could successfully be replaced with a Stack Allocation. We believe that 
> a lot of code has control flow merge points that cause Scalar Replacement to be rejected but Stack Allocation 
> could work for many of them. The obvious benefit is the reduction in GC/heap pressure, but you also get the 
> added benefit of improved cache hit rate since the stack allocated objects will likely be in the cache memory. 
> There are costs with Stack Allocation including increased compilation times and increased code cache size. 
> These would have to be carefully measured fully understand the benefits and costs. 
>
> Our current approach: 
> 1. Only allow stack allocation in non-compressed oops mode (-XX:-UseCompressedOops) 
>         1.1 This simplifies the prototype and we believe there are solutions for compressed oops 
> 2. Focus on ParallelGC and G1GC for prototype 
>         2.1 Support for other collectors needs to be investigated 
> 3. Current prototype is based on JDK 11 
>         3.1 We are going to work on moving it up to tip shortly 
>         3.2 We have been following the changes in tip and this work should apply easily 
> 4. Use the results from Escape Analysis to decide which objects are eligible for Stack Allocation 
>         4.1 Scalar Replacement is the most important, so allow that to happen first 
>         4.2 Arrays cannot be stack allocated in the current prototype. 
>                 4.2.1 Straight forward to implement 
>         4.3 Stack allocated objects cannot reference other stack allocated objects. 
>                 4.3.1 More difficult than arrays but should be doable 
>         4.4 Reject allocations based on the reasons listed below in the details section 
> 5. In macro expansion during expand allocation replace allocations eligible for Stack Allocation. 
>         5.1 Replace the AllocNode with a BoxLockNode with the correct stack location and remove the slow 
> path and other unused IR 
>         5.2 Repurpose BoxLock nodes to represent the stack location for the object 
>         5.3 No Stack Allocation in methods that contain synchronization. 
>                 5.3.1 This restriction should be easy to remove. 
>         5.4 Remove write barriers on stores into stack allocated objects 
> 6. Grow the frame size to handle the stack allocated objects 
> 7. Modify OopMap scanning to handle stack allocated objects 
>         7.1 Currently handled via a generic OopClosure for all GCs 
>         7.2 More work required for full GC collector coverage 
> 8. Safepoint support reuses SafePointScalarObject 
>         8.1 De-optimization is working and correctly allocating heap objects and initializing their fields 
>
> Current Results: 
> 1. Microbenchmarks using JMH simply replace all allocations using Stack Allocation where Scalar Replacement 
> 2. Workloads in DeCapo and Renaissance are showing benefits 
> 3. We have not measured any negative impacts on application throughput 
> 4. More detailed analysis on compilation times and code cache size are required 
>
> Benchmark                                  % Improvement 
> DeCapo tmt                                           15 
> Renaissance db-shootout                   10 
> Renaissance fj-kmeans                          5 
> Renaissance future-genetic                10 
> Renaissance movie-lens                        5 
> Renaissance neo4j-analytics              45 
>
> Details on why allocations should not be replaced with Stack Allocation: 
> Not all Non Escaping objects can be Stack Allocated so there will still be cases where Non Escaping allocations 
> will need to be allocated on the heap: 
> 1. Large arrays and objects should likely be ignored as they could significantly impact the stack frame size 
> 2. Similar to 1 limit the total amount of stack memory per frame that should be used for Stack Allocations 
> 3. Objects and arrays without constant sizes/lengths 
> 4. Allocations with over lapping lifetimes from different loop iterations 
> 5. A stack allocated object cannot be referenced by any other Non Escaping objects that are not scalar replaced 
> or stack allocated as that would count as an escape 
>
> Here is a simple example of what I was describing in 4. 
> int method(int[] vals) { 
>    MyObject o1 = new MyObject(vals[0]); 
>    for (int i =1; i < vals.length; ++) { 
>       MyObject o2 = new MyObject(vals[i]); 
>       if (!o1.equals(o2)) { 
>          o1 = o2; 
>          … 
>       } 
>       … 
>    } 
>    … 
> } 
>
> In the above example if o2 was stack allocated and you ever entered the “if” from that point on o1 and o2 would 
> always be the same object. This is incorrect so o2 cannot be stack allocated. 
>
> Thanks for reading and we look forward to your feedback! 
> Charlie Gracie & Nikola Grcevski 
>
>




Amazon Development Center Germany GmbH 
Krausenstr. 38 
10117 Berlin 
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss 
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B 
Sitz: Berlin 
Ust-ID: DE 289 237 879 






More information about the hotspot-compiler-dev mailing list