MaxBCEAEstimateSize and inlining clarification
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Wed Sep 14 16:13:50 UTC 2016
> Do OSR compilations run EA? I'm looking at some code (roughly) like this:
>
> while (true) {
> for (Entry<...> e : concurrentHashMap.entrySet()) {
> // e does not escape
> }
> Thread.sleep(...);
> }
>
> I see the enclosing method OSR compiled, but the iterator and entry
> aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the case?
EA is performed for OSR compilations, but keep in mind that the entry
point for an OSR compilation is the back branch of the loop.
The whole JVM state is passed in as an argument, so EA can only detect
that something is local for the duration of a single loop iteration, not
when something temporary is allocated for the whole loop.
This means the iterator object can't be eliminated in an OSR
compilation, and that probably causes the element objects to escape as well.
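
For illustration only, here is a minimal sketch (the names are made up,
not taken from the code above) of one way around this: extract the hot
loop into its own method, so that method is compiled through the standard
entry point instead of via OSR. Whether the iterator and entries are then
actually scalar replaced still depends on the whole
entrySet()/iterator()/next() chain being inlined:

import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;

class DrainLoop {
    static void run(ConcurrentHashMap<String, String> map) throws InterruptedException {
        while (true) {
            drainOnce(map);   // normal compilation of drainOnce(), no OSR state passed in
            Thread.sleep(100);
        }
    }

    private static void drainOnce(ConcurrentHashMap<String, String> map) {
        for (Entry<String, String> e : map.entrySet()) {
            consume(e.getKey(), e.getValue());   // e does not escape this method
        }
    }

    private static void consume(String k, String v) { /* ... */ }
}
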
Best regards,
Vladimir Ivanov
> On Tuesday, September 13, 2016, Vladimir Kozlov
> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>
> If the allocation is done locally in the loop, it can be SR'd (but that is
> not guaranteed):
>
>     for (int i = 0; i < n; i++) {
>         Foo f = new Foo();   // allocated and used only within the iteration
>     }
>
> "Currently" we can't SR it if there is merge:
>
>     Foo f = new Foo();
>     for (int i = 0; i < n; i++) {
>         f = new Foo();       // f merges the pre-loop and in-loop allocations
>     }
>     x = f.x;                 // the merged value is read after the loop
>
> Also, we can't SR an array if it is accessed with a variable index, because
> we can't map the loads/stores to a concrete element:
>
>     int[] a = new int[3];
>     for (int i = 0; i < 3; i++) {
>         x = a[i];            // variable index: the load can't be tied to a concrete element
>     }
>
> If the elements are accessed only with constant indices (e.g. when an array
> is used to pass or return several values), or the loop is fully unrolled, we
> can SR it:
>
> x0 = a[0];
> x1 = a[1];
> x2 = a[2];
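>
> As a minimal sketch of that case (my own example, not from the original
> mail): if minMax() is inlined and the result array is read only at constant
> indices, C2 can scalar replace the allocation:
>
>     static int[] minMax(int x, int y) {
>         return new int[] { Math.min(x, y), Math.max(x, y) };
>     }
>
>     int[] r = minMax(3, 7);   // after inlining, the array does not escape
>     int lo = r[0];            // constant indices map to concrete elements
>     int hi = r[1];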
>
> Regards,
> Vladimir
>
> On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>
> There was also another thread a few months back where I was
> asking why a small local array allocation wasn't scalarized,
> and the answer there was the ordering between the loop-unrolling and
> EA passes (I can dig up that thread if you're interested).
>
> It would be very nice, please -- I've tried to google it
> myself (because you'd already mentioned it in the thread) but
> wasn't able to guess the right keywords :)
>
>
> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com
> <mailto:vitalyd at gmail.com>>:
>
>
>
> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin
> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>
> >how it can be made stable to the point where you can
> rely/depend on it for performance.
>
> Well, the same can be said about any JIT optimization
> (maybe it is time to rename the dynamic runtime to a stochastic
> runtime?). Personally, I see SR as having about the same order of
> stability as inlining.
> Actually, apart from a few SR-specific issues (like merge
> points), EA/SR mostly follows inlining: if you have enough
> scope inlined you'll have, say, an 80% chance of SR. From my
> perspective it is inlining that is so surprisingly unstable.
>
> Yeah, I'd agree. The difference, in my mind, is that failing to
> inline a function may not have as drastic performance
> implications as failing to eliminate temporaries.
>
>
> BTW: have you considered sharing your experience with
> EA/SR pitfalls? Even if "increasing the likelihood" is the best option
> available, there is still very little information about it on
> the net.
>
> I'm kind of doing that via the few emails on this list :).
> I think you pretty much covered the biggest (apparent) source of
> flakiness in the equation - inlining, which can fail for all sorts of
> different reasons. Beyond that, there's the control-flow-insensitive
> aspect of the EA, which is tangentially related to inlining (or
> the lack thereof).
>
> There was also another thread a few months back where I was
> asking why a small local array allocation wasn't scalarized, and
> the answer there was the ordering between the loop-unrolling and EA
> passes (I can dig up that thread if you're interested). The bizarre
> thing there was that the loop computation was folded into a constant
> and the compiled method was returning a constant value, yet the array
> allocation was left behind (although it wasn't needed).
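>
> Roughly this kind of shape (reconstructed from memory as a sketch, not
> the exact code from that thread):
>
>     int sum() {
>         int[] a = {1, 2, 3};
>         int s = 0;
>         for (int i = 0; i < a.length; i++) {
>             s += a[i];
>         }
>         return s;   // folded to a constant return, yet the int[] allocation stayed
>     }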
>
> I agree that there isn't much information about EA in
> HotSpot (there is a lot of handwaving and many inaccuracies online).
> In particular, it'd be nice if the performance wiki had a
> section on making user code play well with EA (just like it
> currently has guidance on some other JIT aspects).
>
>
> ----
> Ruslan
>
>
>
> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich
> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>
>
>
> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin
> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>
> >That's my understanding as well (and matches
> what I'm seeing in some synthetic test harnesses).
>
> Ok, I just wanted to clear it up, because it is
> not the first time I've seen BCEA... mentioned in the context of scalar
> replacement, and I was starting to doubt my eyes :)
>
> >It's pretty brittle, sadly, and more
> importantly, unstable.
>
> Making similar experiments, I see the same. E.g. a
> HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of
> cases, but scalarization breaks with a slightly changed key-generation
> scheme -- because the hash code distribution becomes worse, the HashMap
> buckets start to convert themselves to TreeBins, and the TreeBin code is
> a much harder task for EA.
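>
> For reference, a minimal sketch of that call shape (the TupleKey
> definition here is my guess, not the original benchmark): the short-lived
> key can be scalar replaced only while HashMap.get(), TupleKey.hashCode()
> and TupleKey.equals() all stay inlined, which is exactly what the TreeBin
> path tends to break:
>
>     final class TupleKey {
>         final int a, b;
>         TupleKey(int a, int b) { this.a = a; this.b = b; }
>         @Override public int hashCode() { return 31 * a + b; }
>         @Override public boolean equals(Object o) {
>             return o instanceof TupleKey
>                 && ((TupleKey) o).a == a && ((TupleKey) o).b == b;
>         }
>     }
>
>     String lookup(HashMap<TupleKey, String> map, int a, int b) {
>         // the key object can be scalar replaced only if the whole get() path is inlined
>         return map.get(new TupleKey(a, b));
>     }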
>
> Another can of worms is the mismatch between different inlining
> heuristics. E.g. the FreqInlineSize and InlineSmallCode thresholds may
> give different decisions for the same piece of code, and the inlining
> decision taken depends on whether the method was already compiled or
> not -- which depends on the thinnest details of initialization order
> and execution profile. These scenarios have become rarer in 1.8 with
> InlineSmallCode increased, but I'm not sure they are gone...
>
> Currently, I'm starting to think code needs to
> be written specifically with EA/SR in mind to be more or less
> stably scalarized. I.e. you can't get it for free (or it will be
> unstable).
>
> I'm not sure this is practical, to be honest, at
> least for a big enough application. I've long considered EA
> (and scalar replacement) a bonus optimization, and have never
> relied on it where the allocations would hurt otherwise. I'm just a
> bit surprised by *just* how unstable it appears to be, in the
> "simplest" of cases.
>
> I think code can be written to increase the likelihood
> of scalar replacement, but I just can't see how it can be made
> stable to the point where you can rely/depend on it for performance.
>
>
> ----
> Ruslan
>
>
> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich
> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>
>
>
> On Tuesday, September 13, 2016, Cheremin
> Ruslan <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>
> > I'm seeing some code that iterates
> over a ConcurrentHashMap's entrySet that allocates tens of GB of
> CHM$MapEntry objects even though they don't escape
>
>
> I'm a bit confused: I was sure BCEA-style params do affect EA, but
> don't affect scalar replacement. With BCEscapeAnalyzer you can get (a
> sort of) inter-procedural EA, but this only allows more allocations to
> be identified as ArgEscape instead of GlobalEscape. You can't get more
> NoEscape without real inlining. ArgEscape (afaik) is used only for
> synchronization removal in HotSpot, not for scalar replacement.
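>
> To make the distinction concrete, a small sketch (my own example; Point
> is a plain class with x/y fields, and sink() is assumed not to be
> inlined):
>
>     void argEscape() {
>         Point p = new Point(1, 2);
>         // p is passed to a non-inlined callee: ArgEscape -- lock elision
>         // may still apply, but no scalar replacement
>         sink(p);
>     }
>
>     void noEscape() {
>         Point p = new Point(1, 2);
>         // p stays purely local once everything is inlined: NoEscape,
>         // eligible for scalar replacement
>         int s = p.x + p.y;
>     }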
>
> Am I incorrect?
>
> That's my understanding as well (and matches
> what I'm seeing in some synthetic test harnesses).
>
> I'm generally seeing a lot of variability in scalar replacement in
> particular, all driven by profile data. HashMap<Integer, ...>::get(int)
> sometimes manages to eliminate the box and sometimes doesn't - the
> difference appears to be whether Integer::equals is inlined or not,
> which in turn depends on whether the lookup finds something or not and
> whether the number of successful lookups reaches the compilation
> threshold. It's pretty brittle, sadly, and more importantly, unstable.
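>
> For reference, a minimal sketch of that call shape (not the actual test
> harness; map is a java.util.HashMap field):
>
>     HashMap<Integer, String> map = new HashMap<>();
>
>     String lookup(int key) {
>         // 'key' is autoboxed here; the Integer box can only be scalar
>         // replaced if the whole get() path -- including Integer.equals()
>         // on the hit path -- ends up inlined
>         return map.get(key);
>     }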
>
>
>
> ----
> Ruslan
>
>
>
> --
> Sent from my phone
>
> --
> Sent from my phone