MaxBCEAEstimateSize and inlining clarification

Tue Sep 13 19:44:05 UTC 2016

On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:

> >how it can be made stable to the point where you can rely/depend on it
> for performance.
>
> Well, the same can be said about any JIT optimization (maybe it is time to
> rename "dynamic runtime" to "stochastic runtime"?). Personally, I see SR as
> having the same order of stability as inlining. Actually, apart from a few
> SR-specific issues (like merge points), EA/SR mostly follow inlining:
> if you have enough scope inlined, you'll have, say, an 80% chance of SR. From
> my perspective, it is inlining that is so surprisingly unstable.
>
Yeah, I'd agree.  The difference, in my mind, is that failing to inline a
function may not have as drastic performance implications as failing to
eliminate temporaries.
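To make the dependency on inlining concrete, here is a minimal, hypothetical sketch (`Vec2` and `InlineThenEliminate` are made-up names). The temporary returned by `plus()` can only be scalar-replaced if C2 inlines `plus()` into the caller. If it doesn't, the allocation happens inside `plus()` and is handed back to the caller. One way to compare the two cases is to run the same benchmark with and without `-XX:CompileCommand=dontinline,Vec2::plus` and look at the allocation rate (for example with JMH's `-prof gc`). Treat the outcome as profile-dependent, not guaranteed.

```java
// Hypothetical small value type used only for illustration.
final class Vec2 {
    final double x;
    final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Returns a new temporary; a scalar-replacement candidate only if inlined into the caller.
    Vec2 plus(Vec2 o) {
        return new Vec2(x + o.x, y + o.y);
    }

    double dot(Vec2 o) {
        return x * o.x + y * o.y;
    }
}

class InlineThenEliminate {
    // All three Vec2 objects are local temporaries. If plus() and dot() are inlined,
    // none of them escapes this method, so C2 may replace them with plain doubles.
    // If plus() is not inlined, its result is allocated inside plus() and returned,
    // and the two objects passed to it are at best ArgEscape, so none can be eliminated.
    static double lengthSquaredOfSum(double ax, double ay, double bx, double by) {
        Vec2 s = new Vec2(ax, ay).plus(new Vec2(bx, by));
        return s.dot(s);
    }

    public static void main(String[] args) {
        double acc = 0;
        // Warm-up loop so the method gets hot enough to be compiled by C2.
        for (int i = 0; i < 1_000_000; i++) {
            acc += lengthSquaredOfSum(i, 1, 2, 3);
        }
        System.out.println(acc);
    }
}
```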

>
> BTW: have you considered sharing your experience with EA/SR pitfalls? Even
> if "increase likelihood" is the best option available, there is still
> very little information about it on the net.
>
I'm kind of doing that via the few emails on this list :).  I think you
pretty much covered the biggest (apparent) flake in the equation:
inlining, which can fail for all sorts of reasons.  Beyond that,
there's the control-flow-insensitive aspect of the EA, which is
tangentially related to inlining (or lack thereof).
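A minimal sketch of what "control-flow insensitive" means in practice, using a hypothetical `lastDebug` field as the escape sink. C2's EA assigns one escape state per allocation, not per path, so a store on a rare branch can make the object count as escaping on the hot path too. One caveat: if the branch was never taken while profiling, C2 may compile it as an uncommon trap. The store then disappears from the compiled code, and the problem stays hidden until the branch is actually taken. The second method shows a common workaround: the rare path allocates its own copy, so the hot-path object never escapes.

```java
// Hypothetical immutable point used only for illustration.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class FlowInsensitiveEscape {
    // Hypothetical global sink, e.g. state kept for debugging.
    static Point lastDebug;

    // p escapes only when debug is true, but if the store survives compilation,
    // EA treats p as escaping on every path, including the common one.
    static int escapesOnRarePath(int x, int y, boolean debug) {
        Point p = new Point(x, y);
        if (debug) {
            lastDebug = p;
        }
        return p.x + p.y;
    }

    // Workaround: the rare path publishes a copy, so p itself never escapes
    // and remains a scalar-replacement candidate.
    static int escapesOnlyACopy(int x, int y, boolean debug) {
        Point p = new Point(x, y);
        if (debug) {
            lastDebug = new Point(p.x, p.y);
        }
        return p.x + p.y;
    }
}
```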

There was also another thread a few months back where I asked why a
small local array allocation wasn't scalarized, and the answer was the
ordering between the loop unrolling and EA passes (I can dig up that thread if
you're interested).  The bizarre thing was that the loop was folded
into a constant and the compiled method returned a constant value, but
the array allocation was left behind (even though it wasn't needed).
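As an illustration, here is a hypothetical method of the kind described: the computation folds to a constant, yet the array may survive. As I understand C2, it only scalar-replaces small arrays (bounded by `EliminateAllocationArraySizeLimit`) whose elements are all accessed at constant indices. Because EA runs before the loops are unrolled, the loop version may keep its allocation, while the constant-index version is a better candidate. This sketches the pattern and is not a claim about any particular JDK build.

```java
class SmallArrayScalarization {
    // Element accesses use the loop variable, so when EA runs (before unrolling)
    // the indices are not constants and the array may be kept, even if the
    // whole method later folds to "return 6".
    static int sumLoop() {
        int[] a = new int[4];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Same computation, but every access uses a constant index, which is the
    // shape C2's array scalar replacement is generally able to handle.
    static int sumConstantIndices() {
        int[] a = new int[4];
        a[0] = 0;
        a[1] = 1;
        a[2] = 2;
        a[3] = 3;
        return a[0] + a[1] + a[2] + a[3];
    }
}
```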

I agree that there isn't much information about EA in HotSpot (there's a
lot of hand-waving and inaccuracy online).  In particular, it'd be nice if
the performance wiki had a section on making user code play well with EA
(just as it has guidance on some other JIT aspects).

>
> ----
> Ruslan
>
>
>
> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>>
>>
>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin <cheremin at gmail.com>
>> wrote:
>>
>>> >That's my understanding as well (and matches what I'm seeing in some
>>> synthetic test harnesses).
>>>
>>> OK, I just wanted to clear it up, because this is not the first time I've
>>> seen BCEA... mentioned in the context of scalar replacement, and I started
>>> to doubt my eyes :)
>>>
>>> >It's pretty brittle, sadly, and more importantly, unstable.
>>>
>>> Running similar experiments, I see the same thing. E.g., a
>>> HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of
>>> cases, but scalarization breaks with a slightly changed key generation
>>> scheme: the hash code distribution becomes worse, HashMap buckets start
>>> converting themselves to tree bins, and the tree bin code is a much
>>> harder task for EA.
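A minimal sketch of the lookup pattern described above, with a hypothetical `TupleKey`. Every `get` creates a fresh key object, which is exactly the allocation one hopes EA removes. In OpenJDK 8's `HashMap`, if `hashCode()` is poor enough that keys pile into one bucket, that bucket is converted to a tree bin when it already holds `TREEIFY_THRESHOLD` (8) entries and another is added, provided the table has at least `MIN_TREEIFY_CAPACITY` (64) slots. Lookups then go through the much larger `TreeNode` code, which is less likely to be inlined in full.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable two-int key, created fresh for every lookup.
final class TupleKey {
    final int a;
    final int b;

    TupleKey(int a, int b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TupleKey)) return false;
        TupleKey k = (TupleKey) o;
        return a == k.a && b == k.b;
    }

    @Override
    public int hashCode() {
        // A well-spread hash keeps buckets as short linked lists; a poor one
        // (e.g. one that yields only a few distinct values) would eventually treeify them.
        return 31 * a + b;
    }
}

class TupleLookup {
    static Map<TupleKey, Integer> build(int n) {
        Map<TupleKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new TupleKey(i, i + 1), i);
        }
        return map;
    }

    // The TupleKey created here is only passed to get(), so it is a
    // scalar-replacement candidate if get() and the hashCode()/equals()
    // calls it makes are all inlined into this method.
    static long sumHits(Map<TupleKey, Integer> map, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Integer v = map.get(new TupleKey(i, i + 1));
            if (v != null) {
                sum += v;
            }
        }
        return sum;
    }
}
```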
>>>
>>> Another can of worms is the mismatch between different inlining
>>> heuristics. E.g., the FreqInlineSize and InlineSmallCode thresholds may
>>> give different decisions for the same piece of code, and the inlining
>>> decision actually taken depends on whether the method was already compiled,
>>> which in turn depends on the finest details of initialization order and
>>> execution profile. These scenarios became rarer in 1.8 with the increased
>>> InlineSmallCode, but I'm not sure they are gone...
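When chasing these heuristics, it helps to know which thresholds are in effect for a given JVM. A minimal sketch, assuming a HotSpot-based JDK that provides `com.sun.management.HotSpotDiagnosticMXBean`. Flags that don't exist in the running VM (for example, C2 flags on a VM built without C2) are reported as unavailable instead of causing a failure. The inlining decisions themselves can be observed with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`, and `-XX:+PrintFlagsFinal` prints every flag value at startup.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class InliningFlags {
    // HotSpot flags mentioned in this thread; availability and defaults vary by JDK build.
    static final String[] FLAGS = {
        "MaxInlineSize", "FreqInlineSize", "InlineSmallCode",
        "MaxBCEAEstimateSize", "DoEscapeAnalysis", "EliminateAllocations"
    };

    // Returns the flag's current value, or a marker if this VM doesn't expose it.
    static String valueOf(HotSpotDiagnosticMXBean bean, String name) {
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "<not available in this VM>";
        }
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + valueOf(bean, flag));
        }
    }
}
```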
>>>
>>> Currently, I'm starting to think code needs to be written specifically
>>> with EA/SR in mind to be more-or-less stably scalarized. I.e., you can't
>>> get it for free (or it will be unstable).
>>>
>> I'm not sure this is practical, to be honest, at least for a big enough
>> application.  I've long considered EA (and scalar replacement) a bonus
>> optimization, never to be relied on if the allocations would otherwise
>> hurt.  I'm just a bit surprised by *just* how unstable it appears to be,
>> in the "simplest" of cases.
>>
>> I think code can be written to increase likelihood of scalar replacement,
>> but I just can't see how it can be made stable to the point where you can
>> rely/depend on it for performance.
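One way to "increase likelihood" in the strongest sense is to not need scalar replacement on the hot path at all. A minimal sketch with hypothetical names: a lookup key that is mutable only for probing and is reused across `get` calls, while the keys actually stored in the map are fresh instances that are never mutated afterwards. The trade-offs are real. The shared probe makes the wrapper single-threaded, and correctness depends on never inserting the probe itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key with mutable fields; instances stored in the map must never be mutated.
final class PairKey {
    private int a;
    private int b;

    PairKey set(int a, int b) {
        this.a = a;
        this.b = b;
        return this;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairKey)) return false;
        PairKey k = (PairKey) o;
        return a == k.a && b == k.b;
    }

    @Override
    public int hashCode() {
        return 31 * a + b;
    }
}

// Not thread-safe: the shared probe is overwritten on every lookup.
class PairIndex<V> {
    private final Map<PairKey, V> map = new HashMap<>();
    private final PairKey probe = new PairKey(); // reused for lookups, never inserted

    void put(int a, int b, V value) {
        map.put(new PairKey().set(a, b), value); // stored keys are fresh and then left alone
    }

    V get(int a, int b) {
        return map.get(probe.set(a, b)); // no per-lookup allocation, regardless of EA
    }
}
```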
>>
>>>
>>> ----
>>> Ruslan
>>>
>>>
>>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>>>
>>>>
>>>>
>>>> On Tuesday, September 13, 2016, Cheremin Ruslan <cheremin at gmail.com>
>>>> wrote:
>>>>
>>>>> > I'm seeing some code that iterates over a ConcurrentHashMap's
>>>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they
>>>>> don't escape
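For the `ConcurrentHashMap` case, here is a sketch of an alternative that sidesteps the question. In OpenJDK 8, the `entrySet()` iterator creates a new `MapEntry` on every `next()`. By contrast, `forEach(BiConsumer)` walks the internal nodes and passes key and value directly, so it allocates only a small constant amount per call (the traverser and the lambda). The method names below are hypothetical, and whether this fits the code being discussed is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ChmIteration {
    // Each next() on the entry-set iterator returns a fresh MapEntry (OpenJDK 8);
    // avoiding that garbage relies on escape analysis and scalar replacement.
    static long sumViaEntrySet(ConcurrentHashMap<Integer, Long> map) {
        long sum = 0;
        for (Map.Entry<Integer, Long> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    // forEach(BiConsumer) hands key and value straight to the callback,
    // so no per-entry wrapper object is created.
    static long sumViaForEach(ConcurrentHashMap<Integer, Long> map) {
        long[] sum = {0};
        map.forEach((k, v) -> sum[0] += v);
        return sum[0];
    }
}
```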
>>>>>
>>>>>
>>>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but
>>>>> don't affect scalar replacement. With bcEscapeAnalyzer you can get (sort
>>>>> of) inter-procedural EA, but this only allows more allocations to be
>>>>> identified as ArgEscape instead of GlobalEscape. You can't get more
>>>>> NoEscape without real inlining. ArgEscape (afaik) is used only for
>>>>> synchronization removal in HotSpot, not for scalar replacement.
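A small, hypothetical illustration of the ArgEscape/NoEscape distinction discussed here. If `total()` is not inlined (this can be forced with `-XX:CompileCommand=dontinline,ArgEscapeExample::total`), `c` is passed to a real call. Bytecode escape analysis of `total()`, subject to `MaxBCEAEstimateSize`, may still classify `c` as ArgEscape. According to this thread's understanding, that is enough for lock elision on its `synchronized` methods but not for scalar replacement. If `total()` is inlined, `c` can become NoEscape and a scalar-replacement candidate.

```java
// Hypothetical counter with synchronized methods, used only for illustration.
final class Counter {
    private int n;

    synchronized void add(int d) {
        n += d;
    }

    synchronized int get() {
        return n;
    }
}

class ArgEscapeExample {
    // Reads c without storing it anywhere, so a bytecode-level analysis
    // can conclude that its argument does not escape globally.
    static int total(Counter c) {
        return c.get();
    }

    static int run(int a, int b) {
        Counter c = new Counter(); // local, never stored in a field or array
        c.add(a);
        c.add(b);
        // Not inlined: c is at best ArgEscape (locks may be elided, allocation stays).
        // Inlined: c can be NoEscape and eligible for scalar replacement.
        return total(c);
    }
}
```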
>>>>>
>>>>> Am I incorrect?
>>>>
>>>> That's my understanding as well (and matches what I'm seeing in some
>>>> synthetic test harnesses).
>>>>
>>>> I'm generally seeing a lot of variability in scalar replacement in
>>>> particular, all driven by profile data.  HashMap<Integer, ...>::get(int)
>>>> sometimes succeeds at eliminating the box and sometimes doesn't; the
>>>> difference appears to be whether Integer::equals is inlined, which in
>>>> turn depends on whether the lookup finds something and whether the
>>>> number of successful lookups reaches the compilation threshold. It's pretty
>>>> brittle, sadly, and more importantly, unstable.
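A minimal sketch of the `HashMap<Integer, ...>` case, with hypothetical names. Calling `map.get(k)` with an `int` autoboxes through `Integer.valueOf`, which by default returns cached instances only for -128..127, so larger keys allocate a fresh `Integer`. Whether that box is eliminated depends on C2 seeing the whole lookup. `Integer.equals` is only reached when a stored key has the same hash, so a workload that mostly misses may never get it inlined.

```java
import java.util.Map;

class BoxedKeyLookup {
    // map.get(k) boxes k via Integer.valueOf(k). Outside the default Integer cache
    // (-128..127) each call allocates a new Integer that C2 has to eliminate.
    static int countHits(Map<Integer, String> map, int[] keys) {
        int hits = 0;
        for (int k : keys) {
            if (map.get(k) != null) {
                hits++;
            }
        }
        return hits;
    }
}
```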
>>>>
>>>>
>>>>
>>>>> ----
>>>>> Ruslan
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>
>>>
>>
>