MaxBCEAEstimateSize and inlining clarification
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Sep 13 20:15:06 UTC 2016
If the allocation is done locally in a loop it can be SR'ed (but it is not guaranteed):
for (int i = 0; i < n; i++) {
    Foo f = new Foo();
}
"Currently" we can't SR it if there is a merge:
Foo f = new Foo();
for (int i = 0; i < n; i++) {
    f = new Foo();
}
x = f.x;
Also, we can't SR an array if it has indexed accesses, because we can't map the loads/stores to a concrete element:
int[] a = new int[3];
for (int i = 0; i < 3; i++) {
    x = a[i];
}
If elements are accessed only with constant indexes (e.g. when an array is used to pass or return several values), or if the loop is fully unrolled, we can SR it:
x0 = a[0];
x1 = a[1];
x2 = a[2];
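A rough way to check whether a particular allocation really got eliminated is to measure per-thread allocated bytes around the hot code. The sketch below is only illustrative (the class and method names are made up) and assumes a HotSpot JVM where com.sun.management.ThreadMXBean is available:

import java.lang.management.ManagementFactory;

public class AllocCheck {
    static final com.sun.management.ThreadMXBean TMX =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Loop-local allocation: a typical scalar replacement candidate.
    static long sumLoopLocal(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            s += p.x + p.y;
        }
        return s;
    }

    public static void main(String[] args) {
        long tid = Thread.currentThread().getId();
        long sink = 0;
        // Warm up so the method is C2-compiled before we measure.
        for (int i = 0; i < 50_000; i++) {
            sink += sumLoopLocal(100);
        }
        long before = TMX.getThreadAllocatedBytes(tid);
        sink += sumLoopLocal(1_000_000);
        long after = TMX.getThreadAllocatedBytes(tid);
        // If SR kicked in, the reported delta should be close to zero.
        System.out.println("allocated bytes: " + (after - before) + " (sink=" + sink + ")");
    }
}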
Regards,
Vladimir
On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>>There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I can
>> dig up that thread if you're interested).
>
> It would be very nice, please -- I've tried to google it myself (since you already mentioned it in the thread) but wasn't able to guess the right keywords :)
>
>
> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:
>
> >how it can be made stable to the point where you can rely/depend on it for performance.
>
> Well, the same can be said about any JIT optimization (maybe it is time to rename the dynamic runtime to the stochastic runtime?). Personally, I see SR as having the same order of stability as inlining.
> Actually, apart from a few SR-specific issues (like merge points), EA/SR mostly follows inlining: if you have enough of the scope inlined you'll have, say, an 80% chance of SR. From my perspective it
> is inlining that is so surprisingly unstable.
>
> Yeah, I'd agree. The difference, in my mind, is that failing to inline a function may not have as drastic performance implications as failing to eliminate temporaries.
>
>
> BTW: have you considered sharing your experience with EA/SR pitfalls? Even if "increasing the likelihood" is the best option available, there is still very little information about it on the net.
>
> I'm kind of doing that via the few emails on this list :). I think you pretty much covered the biggest (apparent) source of flakiness in the equation - inlining, which can fail for all sorts of different
> reasons. Beyond that, there's the control-flow-insensitive aspect of EA, which is tangentially related to inlining (or lack thereof).
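> The classic shape being something like (just a sketch, Foo standing in for any small value-like class):
>
>     static final class Foo { final int x; Foo(int x) { this.x = x; } }
>
>     static int merged(boolean cond) {
>         Foo f = cond ? new Foo(1) : new Foo(2);   // two allocations merge at a phi
>         return f.x;                               // neither escapes, yet SR gives up
>     }
>
> where the control flow merge alone is enough to defeat scalar replacement.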
>
> There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was the ordering between loop unrolling and EA passes (I
> can dig up that thread if you're interested). The bizarre thing there was that the loop was folded into a constant, and the compiled method was returning a constant value, but the array
> allocation was left behind (although it wasn't needed).
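> From memory, the shape was roughly this (illustrative, not the exact test):
>
>     static int sum() {
>         int[] a = {1, 2, 3, 4};
>         int s = 0;
>         for (int i = 0; i < a.length; i++) {
>             s += a[i];
>         }
>         return s;   // compiled down to "return 10"
>     }
>
> i.e. the result was constant-folded, but the now-dead int[4] allocation stayed in the compiled code.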
>
> I agree that there isn't much information about EA in Hotspot (there's a lot of handwaving and inaccuracy online). In particular, it'd be nice if the performance wiki had a section on making
> user code play well with EA (just as it currently has guidance on some other JIT aspects).
>
>
> ----
> Ruslan
>
>
>
> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:
>
> >That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).
>
> OK, I just wanted to clear this up, because it is not the first time I've seen BCEA... mentioned in the context of scalar replacement, and I was starting to doubt my eyes :)
>
> >It's pretty brittle, sadly, and more importantly, unstable.
>
> Making similar experiments, I see the same. E.g. a HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of cases, but scalarization breaks once the key
> generation scheme is slightly changed -- because the hash code distribution becomes worse, HashMap buckets start to convert themselves to TreeBins, and the TreeBin code is a much harder task for EA.
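> For reference, the key I'm talking about is roughly this shape (a sketch; the real class differs in details):
>
>     final class TupleKey {
>         final int a, b;
>         TupleKey(int a, int b) { this.a = a; this.b = b; }
>         @Override public int hashCode() { return 31 * a + b; }
>         @Override public boolean equals(Object o) {
>             return o instanceof TupleKey
>                     && ((TupleKey) o).a == a
>                     && ((TupleKey) o).b == b;
>         }
>     }
>
> and the hope is that map.get(new TupleKey(a, b)) gets the temporary key scalar replaced once HashMap.get() and TupleKey.equals()/hashCode() are all inlined.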
>
> Another can of worms is the mismatch between different inlining heuristics. E.g. the FreqInlineSize and InlineSmallCode thresholds may give different decisions for the same piece of code, and
> the inlining decision taken depends on whether the method was already compiled or not -- which depends on the finest details of initialization order and execution profile. These scenarios have become rarer in
> 1.8 with InlineSmallCode increased, but I'm not sure they are gone...
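> (A way to watch these decisions, assuming a reasonably recent HotSpot: run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining and, if needed, vary
> -XX:FreqInlineSize=<n> / -XX:InlineSmallCode=<n>; the reason each call site was or wasn't inlined shows up in that output.)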
>
> Currently, I'm starting to think that code needs to be written specifically with EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get it for free (or it will be unstable).
>
> I'm not sure this is practical, to be honest, at least for a big enough application. I've long considered EA (and scalar replacement) a bonus optimization, never to be relied on if
> the allocations would hurt otherwise. I'm just a bit surprised by *just* how unstable it appears to be, in the "simplest" of cases.
>
> I think code can be written to increase the likelihood of scalar replacement, but I just can't see how it can be made stable to the point where you can rely/depend on it for performance.
>
>
> ----
> Ruslan
>
>
> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tuesday, September 13, 2016, Cheremin Ruslan <cheremin at gmail.com> wrote:
>
> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry objects even though they don't escape
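> (I assume this is the usual for (Map.Entry<K, V> e : chm.entrySet()) { ... } loop, where each CHM$MapEntry is created just for one step of the iteration and never escapes.)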
>
>
> I'm a bit confused: I was sure the BCEA-style params do affect EA, but don't affect scalar replacement. With bcEscapeAnalyser you can get (sort of) inter-procedural EA, but this
> only allows more allocations to be identified as ArgEscape instead of GlobalEscape. You can't get more NoEscape without real inlining. And ArgEscape (afaik) is used only
> for synchronization removal in HotSpot, not for scalar replacement.
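> Concretely, the distinction I mean is something like this (just a sketch; the class and method names are made up):
>
>     static final class Box { final int v; Box(int v) { this.v = v; } }
>
>     // Stays local: NoEscape once everything inlines, so a scalar replacement candidate.
>     static int local(int x) {
>         Box b = new Box(x);
>         return b.v + 1;
>     }
>
>     // Handed to a callee: if consume() is not inlined but BCEA can see from its
>     // bytecode that the argument does not escape further, this allocation is at
>     // best ArgEscape, which helps lock elision but not scalar replacement.
>     static int passedDown(int x) {
>         Box b = new Box(x);
>         return consume(b);
>     }
>
>     static int consume(Box b) { return b.v + 1; }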
>
> Am I incorrect?
>
> That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).
>
> I'm generally seeing a lot of variability in scalar replacement in particular, all driven by profile data. HashMap<Integer, ...>::get(int) sometimes succeeds in eliminating the box
> and sometimes doesn't - the difference appears to be whether Integer::equals is inlined or not, which in turn depends on whether the lookup finds something or not and whether the
> number of successful lookups reaches the compilation threshold. It's pretty brittle, sadly, and more importantly, unstable.
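> The shape I'm testing is roughly this (a sketch, not the exact harness):
>
>     static long sumLookups(java.util.HashMap<Integer, Long> map, int[] keys) {
>         long sum = 0;
>         for (int k : keys) {
>             Long v = map.get(k);   // the autoboxed key is the temporary we'd like eliminated
>             if (v != null) {
>                 sum += v;
>             }
>         }
>         return sum;
>     }
>
> and whether the box goes away seems to track whether Integer::equals ends up inlined into the compiled get() path.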
>
>
>
> ----
> Ruslan
>
>
>
> --
> Sent from my phone
>
>
>
>
>
>