MaxBCEAEstimateSize and inlining clarification

Tue Sep 13 20:29:13 UTC 2016

On Tue, Sep 13, 2016 at 3:55 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:

> >There was also another thread a few months back where I was asking why a
> small local array allocation wasn't scalarized, and the answer there was
> ordering between loop unrolling and EA passes (I can dig up that thread if
> you're interested).
>
> It would be very nice, please -- I tried to google it myself
> (since you'd already mentioned it in the thread) but wasn't able to guess
> the right keywords :)
>
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020546.html

>
>
> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>>
>>
>> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin <cheremin at gmail.com>
>> wrote:
>>
>>> >how it can be made stable to the point where you can rely/depend on it
>>> for performance.
>>>
>>> Well, the same can be said about any JIT optimization -- (maybe it is
>>> time to rename dynamic runtime to stochastic runtime?). Personally I see
>>> SR as having the same order of stability as inlining. Actually, apart
>>> from a few SR-specific issues (like with merge points), EA/SR mostly
>>> follows inlining: if you have enough scope inlined you'll have, say, an
>>> 80% chance of SR. From my perspective it is inlining which is so
>>> surprisingly unstable.
>>>
>> Yeah, I'd agree.  The difference, in my mind, is failing to inline a
>> function may not have as drastic performance implications as failing to
>> eliminate temporaries.
>>
>>>
>>> BTW: have you considered sharing your experience with EA/SR pitfalls?
>>> Even if "increase likelihood" is the best option available -- there is
>>> still very little information about it on the net.
>>>
>> I'm kind of doing that via the few emails on this list :).  I think you
>> pretty much covered the biggest (apparent) flake in the equation -
>> inlining, which can fail for all sorts of different reasons.  Beyond that,
>> there's the control flow insensitive aspect of the EA, which is
>> tangentially related to inlining (or lack thereof).
>>
>> There was also another thread a few months back where I was asking why a
>> small local array allocation wasn't scalarized, and the answer there was
>> ordering between loop unrolling and EA passes (I can dig up that thread if
>> you're interested).  The bizarre thing there was the loop operation was
>> folded into a constant, and the compiled method was returning a constant
>> value, but the array allocation was left behind (although it wasn't needed).
>>
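For illustration, a minimal sketch of the kind of method described above: a small local array that never leaves the method, consumed by a loop whose result is a compile-time constant. This is not the code from the 2015 thread; the class and method names are hypothetical.

```java
final class SmallArraySum {
    // Allocates a 4-element array that is only used inside this method,
    // then sums it. The result is always 10, so a JIT could in principle
    // fold the loop to a constant and drop the allocation entirely.
    static int sumOfSmallArray() {
        int[] values = {1, 2, 3, 4};
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call the method often enough that it is likely to be JIT-compiled.
        for (int i = 0; i < 1_000_000; i++) {
            total += sumOfSmallArray();
        }
        System.out.println(total);
    }
}
```

Whether the allocation actually disappears depends on the JVM version and pass ordering, which is exactly the point made above; comparing allocation rates with and without -XX:-EliminateAllocations is one way to check.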
>> I agree that there isn't much information about EA in Hotspot (there's a
>> lot of handwaving and inaccuracies online).  In particular, it'd be nice if
>> the performance wiki had a section on making user code play well with EA
>> (just like it has guidance on some other JIT aspects currently).
>>
>>>
>>> ----
>>> Ruslan
>>>
>>>
>>>
>>> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>>>
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin <cheremin at gmail.com>
>>>> wrote:
>>>>
>>>>> >That's my understanding as well (and matches what I'm seeing in some
>>>>> synthetic test harnesses).
>>>>>
>>>>> Ok, I just wanted to clear that up, because it is not the first time I
>>>>> have seen BCEA... mentioned in the context of scalar replacement, and I
>>>>> was starting to doubt my eyes :)
>>>>>
>>>>> >It's pretty brittle, sadly, and more importantly, unstable.
>>>>>
>>>>> Running similar experiments I see the same. E.g. a HashMap.get(TupleKey)
>>>>> lookup is successfully scalarized in 99% of cases, but scalarization
>>>>> once broke with a slightly changed key generation scheme -- because the
>>>>> hash code distribution became worse, HashMap buckets started converting
>>>>> themselves to TreeBins, and the TreeBin code is a much harder task for EA.
>>>>>
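A sketch of the lookup pattern described above, where a temporary key object is allocated for each HashMap.get call. The thread does not show TupleKey's definition, so the class below (its fields, equals and hashCode) is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class TupleLookup {
    // Hypothetical two-field key; the real TupleKey is not shown in the thread.
    static final class TupleKey {
        final String first;
        final int second;

        TupleKey(String first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TupleKey)) return false;
            TupleKey other = (TupleKey) o;
            return second == other.second && first.equals(other.first);
        }

        @Override
        public int hashCode() {
            // A poorly distributed hash here leads to longer buckets, which
            // HashMap may convert to tree bins, as described above.
            return 31 * first.hashCode() + second;
        }
    }

    private static final Map<TupleKey, String> TABLE = new HashMap<>();

    static void put(String first, int second, String value) {
        TABLE.put(new TupleKey(first, second), value);
    }

    // The temporary key never leaves this call chain; if HashMap.get,
    // hashCode and equals are all inlined, the key is a candidate for
    // scalar replacement.
    static String find(String first, int second) {
        return TABLE.get(new TupleKey(first, second));
    }
}
```

Whether the temporary key is eliminated depends on the whole get() path being inlined, so the same source can behave differently under a different key distribution.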
>>>>> Another can of worms is the mismatch between different inlining
>>>>> heuristics. E.g. the FreqInlineSize and InlineSmallCode thresholds may
>>>>> give different decisions for the same piece of code, and which decision
>>>>> is taken depends on whether the method was already compiled or not --
>>>>> which in turn depends on the finest details of initialization order and
>>>>> execution profile. These scenarios became rare in 1.8 with
>>>>> InlineSmallCode increased, but I'm not sure they are gone...
>>>>>
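Since the thresholds named above differ between JVM versions and platforms, it can help to print the values the running VM actually uses. A small sketch, assuming a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class and method names are hypothetical, and a flag the VM does not know is reported as unavailable rather than guessed.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class InliningFlags {
    // Returns the current value of a VM flag as a string, or "<unavailable>"
    // if the bean is missing or the flag does not exist in this VM.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "<unavailable>";
            }
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags.
            return "<unavailable>";
        }
    }

    public static void main(String[] args) {
        String[] names = {"FreqInlineSize", "InlineSmallCode", "MaxBCEAEstimateSize"};
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```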
>>>>> Currently, I'm starting to think code needs to be written specifically
>>>>> with EA/SR in mind to be more-or-less stably scalarized. I.e. you can't
>>>>> get it for free (or it will be unstable).
>>>>>
>>>> I'm not sure this is practical, to be honest, at least for a big enough
>>>> application.  I've long considered EA (and scalar replacement) as a bonus
>>>> optimization, and never to rely on it if the allocations would hurt
>>>> otherwise.  I'm just a bit surprised *just* how unstable it appears to be,
>>>> in the "simplest" of cases.
>>>>
>>>> I think code can be written to increase likelihood of scalar
>>>> replacement, but I just can't see how it can be made stable to the point
>>>> where you can rely/depend on it for performance.
>>>>
>>>>>
>>>>> ----
>>>>> Ruslan
>>>>>
>>>>>
>>>>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tuesday, September 13, 2016, Cheremin Ruslan <cheremin at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > I'm seeing some code that iterates over a ConcurrentHashMap's
>>>>>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they
>>>>>>> don't escape
>>>>>>>
>>>>>>>
>>>>>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but
>>>>>>> don't affect scalar replacement. With bcEscapeAnalyzer you can get (a
>>>>>>> sort of) inter-procedural EA, but this only allows more allocations
>>>>>>> to be identified as ArgEscape instead of GlobalEscape. You can't get
>>>>>>> more NoEscape without real inlining. ArgEscape (afaik) is used only
>>>>>>> for synchronization removal in HotSpot, not for scalar replacement.
>>>>>>>
>>>>>>> Am I incorrect?
>>>>>>
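To make the three escape states in the question concrete, here is a small sketch. The classifications in the comments follow the description given in this thread and are expectations, not output from a real compiler log; the class, field and method names are hypothetical.

```java
final class EscapeStates {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Anything stored here is reachable from other threads and methods.
    static Point lastSeen;

    // The Point is only read inside this method. With the constructor
    // inlined, it would be expected to be NoEscape and thus a candidate
    // for scalar replacement.
    static int localOnly(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // The Point is passed to a callee. If sum() were NOT inlined (imagine
    // a body too large for the inlining thresholds), bytecode-level
    // analysis of the callee could at best prove ArgEscape here.
    static int passedToCallee(int a, int b) {
        Point p = new Point(a, b);
        return sum(p);
    }

    static int sum(Point p) {
        return p.x + p.y;
    }

    // The Point is stored into a static field, so it would be
    // GlobalEscape regardless of inlining.
    static int published(int a, int b) {
        Point p = new Point(a, b);
        lastSeen = p;
        return p.x + p.y;
    }
}
```

In this toy version sum() is small enough that it would normally be inlined, which turns the second case into the first; that dependence on inlining is the point of the question above.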
>>>>>> That's my understanding as well (and matches what I'm seeing in some
>>>>>> synthetic test harnesses).
>>>>>>
>>>>>> I'm generally seeing a lot of variability in scalar replacement in
>>>>>> particular, all driven by profile data.  HashMap<Integer, ...>::get(int)
>>>>>> sometimes works at eliminating the box and sometimes doesn't - the
>>>>>> difference appears to be whether Integer::equals is inlined or not, which
>>>>>> in turn depends on whether the lookup finds something or not and whether
>>>>>> the number of successful lookups reaches compilation threshold. It's pretty
>>>>>> brittle, sadly, and more importantly, unstable.
>>>>>>
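A minimal sketch of the boxed lookup described above, assuming a plain HashMap<Integer, String>; the class and method names are hypothetical. The int argument is autoboxed through Integer.valueOf at the call to get(), and that box is what scalar replacement may or may not remove.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxedLookup {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        for (int i = 0; i < 1024; i++) {
            NAMES.put(i, "name-" + i);
        }
    }

    // The int key is boxed to an Integer before HashMap.get is called.
    // Integer.valueOf caches small values (at least -128..127), so only
    // keys outside the cache can allocate a fresh box here.
    static String lookup(int key) {
        return NAMES.get(key);
    }

    public static void main(String[] args) {
        int hits = 0;
        // Mix of successful and failed lookups; as noted above, the ratio
        // influences the profile and therefore the inlining decisions.
        for (int i = 0; i < 2_000_000; i++) {
            if (lookup(i % 2048) != null) {
                hits++;
            }
        }
        System.out.println(hits);
    }
}
```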
>>>>>>
>>>>>>
>>>>>>> ----
>>>>>>> Ruslan
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from my phone
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>