Stack allocation prototype for C2

Charlie Gracie Charlie.Gracie at microsoft.com
Wed Jul 8 20:41:31 UTC 2020


Hi Sergey,

To get an idea of which objects are being stack allocated, you can use a fastdebug build and gather the output
from -XX:+PrintStackAllocation. This static view can be combined with inspecting the source code to find patterns
where allocations can be stack allocated but fail to be scalar replaced. However, this information is not great for understanding
which allocation sites are important, since it only describes where heap allocations were replaced with stack allocations,
not how frequently those sites execute at runtime.

The common patterns we have recognized are:
1.	Boxed objects whose boxing method uses a cache (e.g. Integer.valueOf) make up a significant portion of the wins we measured.
2.	Iterators and transient data created while iterating over collections.
3.	Chains of non-escaping objects. In these scenarios the root object is often scalar replaced (SCR)
	but its child objects are not. I think SCR could be improved for some of these cases, but I need
	more data to understand why it fails.
4.	Backing arrays for data structures. Many data structures allocate a backing array with a default initial length. Since the
	array may grow (be replaced by a larger copy), it is not eligible for SCR, but it may still be eligible for stack allocation.
	This is a common subcase of #3, but I separated it out because here SCR fails due to merge points.
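For readers less familiar with these shapes, here is a minimal Java sketch of patterns 1, 2 and 4 (the IntStack wrapper with its child array is also a small instance of pattern 3). The class and method names are hypothetical and only illustrate the source-level shape; whether C2 scalar replaces or stack allocates any of these allocations depends on inlining and on the prototype's analysis, which this code does not demonstrate.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical examples of the allocation shapes listed above. They only show
// the source-level pattern, not what C2 or the prototype does with them.
final class AllocationPatterns {

    // Pattern 1: boxing through a cache. Integer.valueOf returns either a cached
    // instance (values in [-128, 127] by default) or a freshly allocated one,
    // so the boxed value is a merge of two possible objects.
    static long sumBoxed(int[] values) {
        long sum = 0;
        for (int v : values) {
            Integer boxed = Integer.valueOf(v); // cache hit or new Integer: merge point
            sum += boxed;                        // unboxed immediately, never escapes
        }
        return sum;
    }

    // Pattern 2: a transient Iterator created for a single traversal.
    static int sumList(List<Integer> list) {
        int sum = 0;
        for (Integer x : list) { // the enhanced for loop allocates an Iterator
            sum += x;
        }
        return sum;
    }

    // Patterns 3 and 4: a non-escaping root object owning a backing array with
    // a default initial length; growing replaces the array with a larger copy,
    // so the field can refer to different arrays on different paths.
    static final class IntStack {
        private int[] elements = new int[4]; // hypothetical default initial length
        private int size;

        void push(int value) {
            if (size == elements.length) {
                elements = Arrays.copyOf(elements, size * 2); // grow: new array
            }
            elements[size++] = value;
        }

        int sum() {
            int s = 0;
            for (int i = 0; i < size; i++) {
                s += elements[i];
            }
            return s;
        }
    }

    static int sumViaStack(int n) {
        IntStack stack = new IntStack(); // root object never escapes this method
        for (int i = 0; i < n; i++) {
            stack.push(i);
        }
        return stack.sum();
    }
}
```

All three methods return plain primitives, so none of the temporary objects is reachable after the call returns; that is what makes them candidates for SCR or stack allocation in the first place.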


To get a better understanding of the runtime wins, we gathered JFR data with and without stack allocation enabled for some of the
benchmarks showing large reductions in heap allocation. These workloads were all Scala-based.
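As a lighter-weight complement to JFR, a sketch like the following can report the total heap bytes a thread allocates around a workload, so the same run can be compared with and without stack allocation enabled. It assumes a HotSpot-based JDK, where the platform ThreadMXBean implements com.sun.management.ThreadMXBean; AllocationMeter is a hypothetical helper, not part of the prototype.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: measures heap bytes allocated by the current thread
// while running a workload. Assumes a HotSpot JDK, where the platform
// ThreadMXBean implements com.sun.management.ThreadMXBean.
final class AllocationMeter {

    static long allocatedBytes(Runnable workload) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!bean.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("per-thread allocation accounting not supported");
        }
        if (!bean.isThreadAllocatedMemoryEnabled()) {
            bean.setThreadAllocatedMemoryEnabled(true);
        }
        long threadId = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(threadId);
        workload.run();
        long after = bean.getThreadAllocatedBytes(threadId);
        return after - before; // heap bytes allocated by this thread during run()
    }
}
```

This only yields an aggregate number per thread; unlike the JFR data, it does not say which types or call stacks account for the difference.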

1.	In TMT, almost 100% of the reduction in heap allocations is due to stack allocation of java.lang.Double objects created
	via scala.runtime.BoxesRunTime.boxToDouble(double). The reduction is due to 2 different call stacks where this method
	was inlined. Here are the 2 callers that generate the allocations which get stack allocated.
		a. scala.runtime.ScalaRunTime$.array_apply(Object, int)
		b. edu.stanford.nlp.tmt.model.SoftAssignmentModel$$anonfun$summary$1$$anonfun$apply$5.apply(Object)

2.	In ALS, almost 100% of the reduction in heap allocations is due to stack allocation of java.lang.Integer objects created via
	scala.runtime.BoxesRunTime.boxToInteger(int). The reduction is due to a single call stack containing the following caller.
		a. scala.runtime.ScalaRunTime$.array_apply(Object, int). When this function is used on primitive arrays, it looks
		like stack allocation can regularly deliver big wins, given the right amount of inlining.

3.	In factorie, there are 5 object types whose stack allocation reduces overall heap allocations. Digging
	further into the call stacks for the 5 allocation sites, it appears that they are all related to iterating over data structures.
	Most of the objects are transient objects used for a single iteration and are not boxed primitives. The object types are:
		a. scala.Some which is allocated as the result of scala.collection.mutable.HashMap.get(Object)
		b. scala.collection.mutable.ListBuffer which is allocated by scala.collection.immutable.List$.newBuilder()
		c. cc.factorie.generative.Proportions[] which is allocated by
		cc.factorie.generative.DiscreteMixtureVar$class.chosenParents(DiscreteMixtureVar)
		d. cc.factorie.package$$anon$1 which is allocated by cc.factorie.package$.singleFactorIterable(Factor)
		e. cc.factorie.Domain$$anonfun$get$1 which is allocated by cc.factorie.Domain$.get(Class)
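To make the TMT/ALS and factorie cases more concrete, here is a rough Java analogue of the allocation shapes involved. arrayApply is a hypothetical method loosely modeled on scala.runtime.ScalaRunTime$.array_apply(Object, int), and the Optional lookup stands in for the scala.Some returned from HashMap.get; neither is the actual Scala library code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical Java analogues of two Scala allocation sites discussed above.
// They mimic the shape of the Scala code; they are not the Scala library.
final class ScalaLikeSites {

    // Loosely modeled on scala.runtime.ScalaRunTime$.array_apply(Object, int):
    // a generic element read that must box primitive elements, much like
    // BoxesRunTime.boxToDouble / boxToInteger do.
    static Object arrayApply(Object xs, int idx) {
        if (xs instanceof double[]) return Double.valueOf(((double[]) xs)[idx]);
        if (xs instanceof int[]) return Integer.valueOf(((int[]) xs)[idx]);
        if (xs instanceof Object[]) return ((Object[]) xs)[idx];
        throw new IllegalArgumentException("unsupported array type: " + xs.getClass());
    }

    // Caller that unboxes immediately: once arrayApply is inlined here, the box
    // no longer escapes, which is the kind of site the Double/Integer savings
    // above were attributed to.
    static double sum(Object xs, int length) {
        double total = 0;
        for (int i = 0; i < length; i++) {
            total += ((Number) arrayApply(xs, i)).doubleValue();
        }
        return total;
    }

    // Analogue of scala.Some returned from mutable.HashMap.get: when the key is
    // present, a transient wrapper is allocated and unwrapped right away.
    static int lookupOrZero(Map<String, Integer> map, String key) {
        return Optional.ofNullable(map.get(key)).orElse(0);
    }
}
```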

I hope this is the type of information you were looking for. If you have any other questions or would like to see more or different data, please let us know. I can always make log files available via our GitHub project or similar if that helps.

Charlie

On 2020-06-29, 11:34 PM, "hotspot-compiler-dev on behalf of Sergey Kuksenko" <hotspot-compiler-dev-retn at openjdk.java.net on behalf of sergey.kuksenko at oracle.com> wrote:

    I am just curious.
    
    For each benchmark you show the overall reduction in allocation size. Do you 
    have statistics on which stack allocated objects give the major impact? And 
    which code patterns fail scalar replacement, other than the well-known Integer 
    cache control-flow merge?
    
    On 6/29/20 2:05 PM, Charlie Gracie wrote:
    > Hi hotspot-compiler-dev community,
    >
    > Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback
    > as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review, so we
    > understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others,
    > if they want to, would also be appreciated (e.g., a repo somewhere).
    >
    > For a quick refresher here is a link to Nikola’s talk at FOSDEM:
    > https://fosdem.org/2020/schedule/event/reducing_gc_times/
    >
    > Here is a link to our initial webrev:
    > http://cr.openjdk.java.net/~adityam/charlie/stack_alloc/
    >
    > Expecting that a change like this will require a JEP, we have prepared a document describing our work based on the JEP submission
    > form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial
    > performance results. This document can be found here:
    > https://github.com/microsoft/openjdk-proposals/blob/master/stack_allocation/Stack_Allocation_JEP.md
    >
    > Thanks in advance for reviews, suggestions, concerns, comments and issues.
    > Charlie and Nikola
    >
    


