Stack allocation prototype for C2
Simonis, Volker
simonisv at amazon.de
Mon Jan 27 23:18:39 UTC 2020
Am 27.01.2020 17:00 schrieb Charlie Gracie <Charlie.Gracie at microsoft.com>:
>
> Hi,
>
> We have been prototyping Stack Allocation in C2. We are hoping to start a conversation on our approach
> and to see if there is interest in the community for us to submit our work as a patch set. We would
> appreciate any comments, concerns or feedback on our current approach or the idea in general. Thanks!
>
Hi Charlie, Nikola
this looks really very interesting and the performance gains are impressive!
If you need help with uploading a version of your patch to cr.openjdk.java.net please let me know. I'll be happy to assist.
Looking forward hearing Nikola's talk about the details at FOSDEM.
Save travels,
Volker
> Background on why we are investigating Stack Allocation:
> While investigating the performance of different workloads (a lot of Scala but some plain Java), we noticed
> a lot of GC and heap pressure due to transient object allocations. Upon further digging we were able to see
> that C2’s Escape Analysis phase was correctly recognizing these allocations as Non Escaping. The allocations
> were not being Scalar Replaced due to a few different restrictions but the most common was due to control
> flow merge points before accessing the objects. Here is a simplified example of a case where an allocation
> cannot be Scalar Replaced even though it is recognized as Non Escaping.
>
> int foo(int x, MyObject[] vals) {
> MyObject bar;
> If (x > 127) {
> bar = new MyObject(x);
> } else {
> bar = vals[x];
> }
> return bar.x;
> }
>
> In the above example the allocation could successfully be replaced with a Stack Allocation. We believe that
> a lot of code has control flow merge points that cause Scalar Replacement to be rejected but Stack Allocation
> could work for many of them. The obvious benefit is the reduction in GC/heap pressure, but you also get the
> added benefit of improved cache hit rate since the stack allocated objects will likely be in the cache memory.
> There are costs with Stack Allocation including increased compilation times and increased code cache size.
> These would have to be carefully measured fully understand the benefits and costs.
>
> Our current approach:
> 1. Only allow stack allocation in non-compressed oops mode (-XX:-UseCompressedOops)
> 1.1 This simplifies the prototype and we believe there are solutions for compressed oops
> 2. Focus on ParallelGC and G1GC for prototype
> 2.1 Support for other collectors needs to be investigated
> 3. Current prototype is based on JDK 11
> 3.1 We are going to work on moving it up to tip shortly
> 3.2 We have been following the changes in tip and this work should apply easily
> 4. Use the results from Escape Analysis to decide which objects are eligible for Stack Allocation
> 4.1 Scalar Replacement is the most important, so allow that to happen first
> 4.2 Arrays cannot be stack allocated in the current prototype.
> 4.2.1 Straight forward to implement
> 4.3 Stack allocated objects cannot reference other stack allocated objects.
> 4.3.1 More difficult than arrays but should be doable
> 4.4 Reject allocations based on the reasons listed below in the details section
> 5. In macro expansion during expand allocation replace allocations eligible for Stack Allocation.
> 5.1 Replace the AllocNode with a BoxLockNode with the correct stack location and remove the slow
> path and other unused IR
> 5.2 Repurpose BoxLock nodes to represent the stack location for the object
> 5.3 No Stack Allocation in methods that contain synchronization.
> 5.3.1 This restriction should be easy to remove.
> 5.4 Remove write barriers on stores into stack allocated objects
> 6. Grow the frame size to handle the stack allocated objects
> 7. Modify OopMap scanning to handle stack allocated objects
> 7.1 Currently handled via a generic OopClosure for all GCs
> 7.2 More work required for full GC collector coverage
> 8. Safepoint support reuses SafePointScalarObject
> 8.1 De-optimization is working and correctly allocating heap objects and initializing their fields
>
> Current Results:
> 1. Microbenchmarks using JMH simply replace all allocations using Stack Allocation where Scalar Replacement
> 2. Workloads in DeCapo and Renaissance are showing benefits
> 3. We have not measured any negative impacts on application throughput
> 4. More detailed analysis on compilation times and code cache size are required
>
> Benchmark % Improvement
> DeCapo tmt 15
> Renaissance db-shootout 10
> Renaissance fj-kmeans 5
> Renaissance future-genetic 10
> Renaissance movie-lens 5
> Renaissance neo4j-analytics 45
>
> Details on why allocations should not be replaced with Stack Allocation:
> Not all Non Escaping objects can be Stack Allocated so there will still be cases where Non Escaping allocations
> will need to be allocated on the heap:
> 1. Large arrays and objects should likely be ignored as they could significantly impact the stack frame size
> 2. Similar to 1 limit the total amount of stack memory per frame that should be used for Stack Allocations
> 3. Objects and arrays without constant sizes/lengths
> 4. Allocations with over lapping lifetimes from different loop iterations
> 5. A stack allocated object cannot be referenced by any other Non Escaping objects that are not scalar replaced
> or stack allocated as that would count as an escape
>
> Here is a simple example of what I was describing in 4.
> int method(int[] vals) {
> MyObject o1 = new MyObject(vals[0]);
> for (int i =1; i < vals.length; ++) {
> MyObject o2 = new MyObject(vals[i]);
> if (!o1.equals(o2)) {
> o1 = o2;
> …
> }
> …
> }
> …
> }
>
> In the above example if o2 was stack allocated and you ever entered the “if” from that point on o1 and o2 would
> always be the same object. This is incorrect so o2 cannot be stack allocated.
>
> Thanks for reading and we look forward to your feedback!
> Charlie Gracie & Nikola Grcevski
>
>
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
More information about the hotspot-compiler-dev
mailing list