MaxBCEAEstimateSize and inlining clarification
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Sep 13 20:15:06 UTC 2016
If the allocation is done locally in a loop it can be SR'ed (but it is not guaranteed):
for (int i = 0; i < n; i++) {
    Foo f = new Foo();
}
"Currently" we can't SR it if there is a merge:
Foo f = new Foo();
for (int i = 0; i < n; i++) {
    f = new Foo();
}
x = f.x;
Also, we can't SR an array if it has indexed accesses, because we can't map the loads/stores to a concrete element:
int[] a = new int[3];
for (int i = 0; i < 3; i++) {
    x = a[i];
}
If elements are accessed only with constant indexes (e.g. when an array is used to pass or return several values), or if the loop is fully unrolled, we can SR it:
x0 = a[0];
x1 = a[1];
x2 = a[2];
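A rough way to check whether a particular allocation really got eliminated is to measure per-thread allocated bytes around the hot code. The sketch below is only illustrative (the class and method names are made up) and assumes a HotSpot JVM where com.sun.management.ThreadMXBean is available:

import java.lang.management.ManagementFactory;

public class AllocCheck {
    static final com.sun.management.ThreadMXBean TMX =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Loop-local allocation: a typical scalar replacement candidate.
    static long sumLoopLocal(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            s += p.x + p.y;
        }
        return s;
    }

    public static void main(String[] args) {
        long tid = Thread.currentThread().getId();
        long sink = 0;
        // Warm up so the method is C2-compiled before we measure.
        for (int i = 0; i < 50_000; i++) {
            sink += sumLoopLocal(100);
        }
        long before = TMX.getThreadAllocatedBytes(tid);
        sink += sumLoopLocal(1_000_000);
        long after = TMX.getThreadAllocatedBytes(tid);
        // If SR kicked in, the reported delta should be close to zero.
        System.out.println("allocated bytes: " + (after - before) + " (sink=" + sink + ")");
    }
}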
Regards,
Vladimir
On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>>There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I can
>> dig up that thread if you're interested).
>
> It would be very nice, please -- I've tried to google it myself (since you already mentioned it in the thread) but wasn't able to guess the right keywords :)
>
>
> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:
>
> >how it can be made stable to the point where you can rely/depend on it for performance.
>
> Well, the same can be said about any JIT optimization (maybe it is time to rename the dynamic runtime to the stochastic runtime?). Personally, I see SR as having the same order of stability as inlining.
> Actually, apart from a few SR-specific issues (like merge points), EA/SR mostly follows inlining: if you have enough of the scope inlined you'll have, say, an 80% chance of SR. From my perspective it
> is inlining that is so surprisingly unstable.
>
> Yeah, I'd agree. The difference, in my mind, is that failing to inline a function may not have as drastic performance implications as failing to eliminate temporaries.
>
>
> BTW: have you considered sharing your experience with EA/SR pitfalls? Even if "increasing the likelihood" is the best option available, there is still very little information about it on the net.
>
> I'm kind of doing that via the few emails on this list :). I think you pretty much covered the biggest (apparent) source of flakiness in the equation - inlining, which can fail for all sorts of different
> reasons. Beyond that, there's the control-flow-insensitive aspect of EA, which is tangentially related to inlining (or lack thereof).
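> The classic shape being something like (just a sketch, Foo standing in for any small value-like class):
>
>     static final class Foo { final int x; Foo(int x) { this.x = x; } }
>
>     static int merged(boolean cond) {
>         Foo f = cond ? new Foo(1) : new Foo(2);   // two allocations merge at a phi
>         return f.x;                               // neither escapes, yet SR gives up
>     }
>
> where the control flow merge alone is enough to defeat scalar replacement.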
>
> There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was the ordering between loop unrolling and EA passes (I
> can dig up that thread if you're interested). The bizarre thing there was that the loop was folded into a constant, and the compiled method was returning a constant value, but the array
> allocation was left behind (although it wasn't needed).
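> From memory, the shape was roughly this (illustrative, not the exact test):
>
>     static int sum() {
>         int[] a = {1, 2, 3, 4};
>         int s = 0;
>         for (int i = 0; i < a.length; i++) {
>             s += a[i];
>         }
>         return s;   // compiled down to "return 10"
>     }
>
> i.e. the result was constant-folded, but the now-dead int[4] allocation stayed in the compiled code.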
>
> I agree that there isn't much information about EA in Hotspot (there's a lot of handwaving and inaccuracy online). In particular, it'd be nice if the performance wiki had a section on making
> user code play well with EA (just as it currently has guidance on some other JIT aspects).
>
>
> ----
> Ruslan
>
>
>
> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin <cheremin at gmail.com> wrote:
>
> >That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).
>
> OK, I just wanted to clear this up, because it is not the first time I've seen BCEA... mentioned in the context of scalar replacement, and I was starting to doubt my eyes :)
>
> >It's pretty brittle, sadly, and more importantly, unstable.
>
> Making similar experiments, I see the same. E.g. a HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of cases, but scalarization breaks once the key
> generation scheme is slightly changed -- because the hash code distribution becomes worse, HashMap buckets start to convert themselves to TreeBins, and the TreeBin code is a much harder task for EA.
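> For reference, the key I'm talking about is roughly this shape (a sketch; the real class differs in details):
>
>     final class TupleKey {
>         final int a, b;
>         TupleKey(int a, int b) { this.a = a; this.b = b; }
>         @Override public int hashCode() { return 31 * a + b; }
>         @Override public boolean equals(Object o) {
>             return o instanceof TupleKey
>                     && ((TupleKey) o).a == a
>                     && ((TupleKey) o).b == b;
>         }
>     }
>
> and the hope is that map.get(new TupleKey(a, b)) gets the temporary key scalar replaced once HashMap.get() and TupleKey.equals()/hashCode() are all inlined.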
>
> Another can of worms is the mismatch between different inlining heuristics. E.g. the FreqInlineSize and InlineSmallCode thresholds may give different decisions for the same piece of code, and
> the inlining decision taken depends on whether the method was already compiled or not -- which depends on the finest details of initialization order and execution profile. These scenarios have become rarer in
> 1.8 with InlineSmallCode increased, but I'm not sure they are gone...
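> (A way to watch these decisions, assuming a reasonably recent HotSpot: run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining and, if needed, vary
> -XX:FreqInlineSize=<n> / -XX:InlineSmallCode=<n>; the reason each call site was or wasn't inlined shows up in that output.)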
>
> Currently, I'm starting to think that code needs to be written specifically with EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get it for free (or it will be unstable).
>
> I'm not sure this is practical, to be honest, at least for a big enough application. I've long considered EA (and scalar replacement) a bonus optimization, never to be relied on if
> the allocations would hurt otherwise. I'm just a bit surprised by *just* how unstable it appears to be, in the "simplest" of cases.
>
> I think code can be written to increase the likelihood of scalar replacement, but I just can't see how it can be made stable to the point where you can rely/depend on it for performance.
>
>
> ----
> Ruslan
>
>
> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com>:
>
>
>
> On Tuesday, September 13, 2016, Cheremin Ruslan <cheremin at gmail.com> wrote:
>
> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry objects even though they don't escape
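> (I assume this is the usual for (Map.Entry<K, V> e : chm.entrySet()) { ... } loop, where each CHM$MapEntry is created just for one step of the iteration and never escapes.)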
>
>
> I'm a bit confused: I was sure the BCEA-style params do affect EA, but don't affect scalar replacement. With bcEscapeAnalyser you can get (sort of) inter-procedural EA, but this
> only allows more allocations to be identified as ArgEscape instead of GlobalEscape. You can't get more NoEscape without real inlining. And ArgEscape (afaik) is used only
> for synchronization removal in HotSpot, not for scalar replacement.
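> Concretely, the distinction I mean is something like this (just a sketch; the class and method names are made up):
>
>     static final class Box { final int v; Box(int v) { this.v = v; } }
>
>     // Stays local: NoEscape once everything inlines, so a scalar replacement candidate.
>     static int local(int x) {
>         Box b = new Box(x);
>         return b.v + 1;
>     }
>
>     // Handed to a callee: if consume() is not inlined but BCEA can see from its
>     // bytecode that the argument does not escape further, this allocation is at
>     // best ArgEscape, which helps lock elision but not scalar replacement.
>     static int passedDown(int x) {
>         Box b = new Box(x);
>         return consume(b);
>     }
>
>     static int consume(Box b) { return b.v + 1; }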
>
> Am I incorrect?
>
> That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).
>
> I'm generally seeing a lot of variability in scalar replacement in particular, all driven by profile data. HashMap<Integer, ...>::get(int) sometimes succeeds in eliminating the box
> and sometimes doesn't - the difference appears to be whether Integer::equals is inlined or not, which in turn depends on whether the lookup finds something or not and whether the
> number of successful lookups reaches the compilation threshold. It's pretty brittle, sadly, and more importantly, unstable.
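> The shape I'm testing is roughly this (a sketch, not the exact harness):
>
>     static long sumLookups(java.util.HashMap<Integer, Long> map, int[] keys) {
>         long sum = 0;
>         for (int k : keys) {
>             Long v = map.get(k);   // the autoboxed key is the temporary we'd like eliminated
>             if (v != null) {
>                 sum += v;
>             }
>         }
>         return sum;
>     }
>
> and whether the box goes away seems to track whether Integer::equals ends up inlined into the compiled get() path.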
>
>
>
> ----
> Ruslan
>
>
>
> --
> Sent from my phone
>
>
>
>
>
>