Stack allocation prototype for C2

Mon Jan 27 15:58:43 UTC 2020

Hi,

We have been prototyping Stack Allocation in C2. We are hoping to start a conversation on our approach
and to see if there is interest in the community for us to submit our work as a patch set. We would
appreciate any comments, concerns or feedback on our current approach or the idea in general. Thanks!

Background on why we are investigating Stack Allocation:
While investigating the performance of different workloads (a lot of Scala but some plain Java), we noticed
a lot of GC and heap pressure due to transient object allocations. Upon further digging we were able to see
that C2’s Escape Analysis phase was correctly recognizing these allocations as Non Escaping. The allocations
were not being Scalar Replaced due to a few different restrictions but the most common was due to control
flow merge points before accessing the objects. Here is a simplified example of a case where an allocation
cannot be Scalar Replaced even though it is recognized as Non Escaping.

int foo(int x, MyObject[] vals) {
   MyObject bar;
   If (x > 127) {
      bar = new MyObject(x);
   } else {
      bar = vals[x];
   }
   return bar.x;
}

In the above example the allocation could successfully be replaced with a Stack Allocation. We believe that
a lot of code has control flow merge points that cause Scalar Replacement to be rejected but Stack Allocation
could work for many of them. The obvious benefit is the reduction in GC/heap pressure, but you also get the
added benefit of improved cache hit rate since the stack allocated objects will likely be in the cache memory.
There are costs with Stack Allocation including increased compilation times and increased code cache size.
These would have to be carefully measured fully understand the benefits and costs.

Our current approach:
1. Only allow stack allocation in non-compressed oops mode (-XX:-UseCompressedOops)
	1.1 This simplifies the prototype and we believe there are solutions for compressed oops
2. Focus on ParallelGC and G1GC for prototype
	2.1 Support for other collectors needs to be investigated
3. Current prototype is based on JDK 11
	3.1 We are going to work on moving it up to tip shortly
	3.2 We have been following the changes in tip and this work should apply easily
4. Use the results from Escape Analysis to decide which objects are eligible for Stack Allocation
	4.1 Scalar Replacement is the most important, so allow that to happen first
	4.2 Arrays cannot be stack allocated in the current prototype.
		4.2.1 Straight forward to implement
	4.3 Stack allocated objects cannot reference other stack allocated objects.
		4.3.1 More difficult than arrays but should be doable
	4.4 Reject allocations based on the reasons listed below in the details section
5. In macro expansion during expand allocation replace allocations eligible for Stack Allocation.
	5.1 Replace the AllocNode with a BoxLockNode with the correct stack location and remove the slow
path and other unused IR
	5.2 Repurpose BoxLock nodes to represent the stack location for the object
	5.3 No Stack Allocation in methods that contain synchronization. 
		5.3.1 This restriction should be easy to remove.
	5.4 Remove write barriers on stores into stack allocated objects
6. Grow the frame size to handle the stack allocated objects
7. Modify OopMap scanning to handle stack allocated objects
	7.1 Currently handled via a generic OopClosure for all GCs
	7.2 More work required for full GC collector coverage 
8. Safepoint support reuses SafePointScalarObject
	8.1 De-optimization is working and correctly allocating heap objects and initializing their fields

Current Results:
1. Microbenchmarks using JMH simply replace all allocations using Stack Allocation where Scalar Replacement
2. Workloads in DeCapo and Renaissance are showing benefits
3. We have not measured any negative impacts on application throughput
4. More detailed analysis on compilation times and code cache size are required

Benchmark                                  % Improvement
DeCapo tmt                                           15
Renaissance db-shootout                   10
Renaissance fj-kmeans                          5
Renaissance future-genetic                10
Renaissance movie-lens                        5
Renaissance neo4j-analytics              45

Details on why allocations should not be replaced with Stack Allocation:
Not all Non Escaping objects can be Stack Allocated so there will still be cases where Non Escaping allocations
will need to be allocated on the heap:
1. Large arrays and objects should likely be ignored as they could significantly impact the stack frame size
2. Similar to 1 limit the total amount of stack memory per frame that should be used for Stack Allocations
3. Objects and arrays without constant sizes/lengths
4. Allocations with over lapping lifetimes from different loop iterations
5. A stack allocated object cannot be referenced by any other Non Escaping objects that are not scalar replaced
or stack allocated as that would count as an escape

Here is a simple example of what I was describing in 4.
int method(int[] vals) {
   MyObject o1 = new MyObject(vals[0]);
   for (int i =1; i < vals.length; ++) {
      MyObject o2 = new MyObject(vals[i]);
      if (!o1.equals(o2)) {
         o1 = o2;
         …
      }
      …
   }
   …
}

In the above example if o2 was stack allocated and you ever entered the “if” from that point on o1 and o2 would
always be the same object. This is incorrect so o2 cannot be stack allocated.

Thanks for reading and we look forward to your feedback!
Charlie Gracie & Nikola Grcevski