Escape Analysis and Stack Allocation

Sun Feb 2 01:09:59 PST 2014

Hi guys,

I finally ported MLAB to jdk7 and 8. There was a problem due to a new additional allocation path in the runtime.
It passed a small set of tests and benchmarks (jvm98, jbb2000, jbb2005) but nothing is guaranteed :)

http://cr.openjdk.java.net/~kvn/mlab_7/webrev/
http://cr.openjdk.java.net/~kvn/mlab_8/webrev/

I would like to hear more about how Jeremy's implementation compares to mine. Maybe we can improve this.

About changes:
- It is supported only with C2, x86_64 and ParallelOldGC (I did not try other GCs).
- mlab is the same structure as tlab and, like tlab, is embedded into the thread structure (thread.?pp)
- The mlab->top pointer is saved on compiled method entry and restored on exit (x86_64.ad, assembler_x86.cpp)
- Objects which do not escape are allocated in mlab (macro.cpp)
- The runtime allocation is called only when no space is left in mlab. During the runtime call a new mlab is allocated
and compiled frames are walked to patch the saved mlab->top on the stack with the new top value (Thread::clear_mlab_allocation()).
- Some print and statistics output is added (copied from tlab) but nothing fancy.
- The default size is small, 4Kb; you may want to play with it. All flags are defined in globals.hpp
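
The save/restore scheme in the list above can be sketched roughly as follows. This is a simplified standalone model,
not code from the webrev: the names Mlab, MlabFrame and mlab_demo are hypothetical, offsets stand in for raw
pointers, and the refill path (allocating a new mlab and patching saved tops in compiled frames) is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a method-local allocation buffer (MLAB).
// The real mlab lives in the Java heap; a vector stands in for it here.
struct Mlab {
    std::vector<unsigned char> storage;
    std::size_t top = 0;  // offset of the next free byte

    explicit Mlab(std::size_t bytes) : storage(bytes) {}

    // Bump-pointer allocation. Returns nullptr when the buffer is full,
    // which is the point where the runtime allocation would be called.
    void* allocate(std::size_t bytes) {
        const std::size_t aligned = (bytes + 7) & ~static_cast<std::size_t>(7);
        if (aligned > storage.size() - top) return nullptr;
        void* p = storage.data() + top;
        top += aligned;
        return p;
    }
};

// Models a compiled method activation: top is saved on entry and
// restored on exit, releasing everything the method allocated.
class MlabFrame {
public:
    explicit MlabFrame(Mlab& m) : mlab_(m), saved_top_(m.top) {}
    ~MlabFrame() { mlab_.top = saved_top_; }
    MlabFrame(const MlabFrame&) = delete;
    MlabFrame& operator=(const MlabFrame&) = delete;

private:
    Mlab& mlab_;
    std::size_t saved_top_;
};

// Allocates two non-escaping "objects" inside one frame and returns
// the value of top after the frame has exited.
std::size_t mlab_demo() {
    Mlab mlab(4096);  // 4Kb, the default size mentioned above
    {
        MlabFrame frame(mlab);
        void* a = mlab.allocate(24);
        void* b = mlab.allocate(100);
        assert(a != nullptr && b != nullptr);
    }
    return mlab.top;
}
```

Using a scope guard for the frame is only a convenience of the sketch; in the patch the save and restore are emitted
in the compiled method's prologue and epilogue.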

I forgot the reason for the changes in escape.cpp but added a comment saying what I think it was for. I don't really
remember, so you can play with it.

Usual disclaimer: this is experimental work, so don't expect it to run everything. Performance is not guaranteed either :)

Regards,
Vladimir

On 1/29/14 4:45 PM, Benedict Elliott Smith wrote:
> I'd definitely be interested if you could dig it out, especially if you
> have the commit you forked from to compare against so I can figure out what
> you changed and why. This approach seems just (or almost) as good to me, if
> we can eliminate the problem you mention by, e.g. creating a special region
> as Jeremy suggested. It also sounds a lot less involved to get to a
> position to trial it, which is a huge plus.
>
>
> On 28 January 2014 06:08, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> I did the same experiment 4 years ago, back in the jdk6 era. I called it
>> MLAB, method local allocation buffer. It works like a thread-local stack
>> for non-escaping objects but was allocated in the java heap as a special
>> TLAB. I got it working but did not see benefits in jbb2005. GC requires no
>> holes in the heap; as a result I had to give the buffer back to GC when a
>> young gen collection was needed. After GC a thread gets a new MLAB and
>> starts allocating from scratch, which nullified the performance benefits.
>>
>> I can try to find those changes if someone is interested.
>>
>> Regards,
>> Vladimir
>>
>>
>> On 1/27/14 9:21 PM, Jeremy Manson wrote:
>>
>>> I tried implementing direct stack allocation in Hotspot a couple of years
>>> ago.  It was a pain to try to allocate anything outside the heap - there
>>> are a lot of checks to make sure that your objects live on the heap.
>>>
>>> I ended up creating TLAB-like regions in the heap that could hold objects
>>> allocated in a stack-like way.  It was a lot easier that way, and seemed
>>> to give the kinds of performance benefits you would expect.
>>>
>>> I never got around to trying to wire it up to Hotspot's escape analysis,
>>> but it was a fairly obvious next step.
>>>
>>> Jeremy
>>>
>>>
>>> On Sun, Jan 26, 2014 at 12:26 PM, Aaron Grunthal <
>>> aaron.grunthal at infinite-source.de> wrote:
>>>
>>>> There is also an issue with merge points [1] which prevents the
>>>> intermediate values in loops with an accumulator (e.g. reduce operations
>>>> on streams) from being stack-allocated.
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-6853701
>>>>
>>>> - Aaron
>>>>
>>>>
>>>> On 26.01.2014 07:04, Benedict Elliott Smith wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was digging into some (to me) unexpected behaviour of escape analysis,
>>>>> namely that some references that clearly weren't escaping, and easily
>>>>> determined to be so, were not being stack allocated.
>>>>>
>>>>> So, after some digging through the hotspot code, I discovered some
>>>>> things that were probably obvious to everyone on this list, but also
>>>>> some things I'm still a little perplexed about. I was hoping somebody
>>>>> could enlighten me about the latter.
>>>>>
>>>>> 1) I cannot see a reason why stores to a primitive array, for instance,
>>>>> should cause the argument to escape in bcEscapeAnalyzer.cpp
>>>>> *iterate_one_block()*; most interestingly, a store to an object array
>>>>> does not result in this, which seems incongruous;
>>>>>
>>>>> 2) An object array store *does* however result in *set_global_escape()*
>>>>> for the value being stored, which makes sense, except that this should
>>>>> only be *set_method_escape()*, as per the paper, in the case where the
>>>>> target array is one of the method arguments. This seems to be missing,
>>>>> here and for *putfield*.
>>>>>
>>>>> Some other weird ones are *arraylength*, *getfield*, *ifnonnull*, etc.
>>>>> The fact that these all result in *set_method_escape()*, and that
>>>>> *putfield* and *aastore* don't optimise *set_global_escape()* to
>>>>> *set_method_escape()* where possible, seems to point to the conclusion
>>>>> that *_is_arg_stack / set_method_escape()* actually encode only
>>>>> *!is_scalar_replaceable*. Is this the case? If so, why the confusing
>>>>> name?*
>>>>>
>>>>>
>>>>> Which leads to a much trickier but more interesting question, which is:
>>>>> what are the barriers to performing actual stack allocation of full
>>>>> objects, instead of scalar replacement? It is something I would be keen
>>>>> to investigate, but given my lack of familiarity with the codebase, it
>>>>> would be immensely helpful to hear what the major difficulties /
>>>>> showstoppers might be before trying to attack it.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Benedict
>>>>>
>>>>>
>>>>> *I do note that in escape.cpp *ArgEscape* is defined and is explicitly
>>>>> overloaded to include some of the characteristics of
>>>>> *is_scalar_replaceable*. However the *is_arg_stack()* method is
>>>>> commented with "The given argument escapes the callee, but does not
>>>>> become globally reachable." which seems to correspond to *ArgEscape* in
>>>>> the paper, but only *invoke()* seems to follow the spec, when invoking a
>>>>> method that cannot be analysed, and this would also be true for
>>>>> *!is_scalar_replaceable.*
>>>>>
>>>>>
>>>>>
>>>>