MaxBCEAEstimateSize and inlining clarification
Vitaly Davidovich
vitalyd at gmail.com
Wed Sep 14 16:18:19 UTC 2016
On Wed, Sep 14, 2016 at 12:13 PM, Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:
> Do OSR compilations run EA? I'm looking at some code (roughly) like this:
>>
>> while (true) {
>>     for (Entry<...> e : concurrentHashMap.entrySet()) {
>>         // e does not escape
>>     }
>>     Thread.sleep(...);
>> }
>>
>> I see the enclosing method OSR compiled, but the iterator and entry
>> aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the
>> case?
>>
>
> EA is performed for OSR compilations, but keep in mind that the entry
> point for OSR compilation is the back branch in the loop.
>
> The whole JVM state is passed as the argument, so EA can only detect that
> something is local for the duration of a single loop iteration, not when
> something temporary is allocated for the whole loop.
>
> It means that the iterator object can't be eliminated in OSR compilation.
> Probably, it causes the element object to escape as well.
>
Darn! Ok, thanks Vladimir - that would explain what I'm seeing. So
basically I need to find a way to avoid OSR compiles for cases like this.
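
For the record, the shape of the workaround I have in mind (a rough
sketch, names are made up, not the actual code): move the map walk out of
the infinite loop into its own method, so the hot work gets a regular
(non-OSR) compilation where EA sees the iterator allocation from method
entry:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Poller {
    private final ConcurrentHashMap<String, Runnable> tasks = new ConcurrentHashMap<>();

    void pollLoop() throws InterruptedException {
        while (true) {
            drain();            // drain() compiles as a normal method once hot,
            Thread.sleep(100);  // so only pollLoop() itself is OSR compiled
        }
    }

    private void drain() {
        // the entry and the iterator stay local to drain(), so they are
        // candidates for scalar replacement in its standard compilation
        for (Map.Entry<String, Runnable> e : tasks.entrySet()) {
            e.getValue().run();
        }
    }
}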
>
> Best regards,
> Vladimir Ivanov
>
> On Tuesday, September 13, 2016, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>
>> If the allocation is done locally in a loop, it could be SR'd (but this
>> is not guaranteed):
>>
>> for (...) {
>>     Foo f = new Foo();
>> }
>>
>> "Currently" we can't SR it if there is merge:
>>
>> Foo f = new Foo();
>> for () {
>> f = new Foo();
>> }
>> x = f.x;
>>
>> Also we can't SR an array if it has indexed accesses, because we can't
>> map the loads/stores to a concrete element:
>>
>> int[] a = new int[3];
>> for (i) {
>>     x = a[i];
>> }
>>
>> If the elements are accessed without an index (e.g. using an array to
>> pass or return several values), or the loop is fully unrolled, we can
>> SR it:
>>
>> x0 = a[0];
>> x1 = a[1];
>> x2 = a[2];
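>>
>> A sketch of one way around the merge case (just the general idea, it is
>> also not guaranteed): carry the scalar value across the loop instead of
>> the object, so each Foo stays local to one iteration:
>>
>> int x = new Foo().x;
>> for (...) {
>>     x = new Foo().x;  // each allocation is consumed within its own
>> }                     // iteration; no Foo reference crosses a merge point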
>>
>> Regards,
>> Vladimir
>>
>> On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>>
>> There was also another thread a few months back where I was
>> asking why a small local array allocation wasn't scalarized,
>> and the answer there was ordering between loop unrolling and
>> EA passes (I can dig up that thread if you're interested).
>>
>> It would be very nice, please -- I've tried to google it myself
>> (because you've noted it already in the thread) but wasn't able
>> to guess the right keywords :)
>>
>>
>> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com
>> <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin
>> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> >how it can be made stable to the point where you can
>> rely/depend on it for performance.
>>
>> Well, the same can be said about any JIT optimization
>> (maybe it is time to rename the dynamic runtime to a stochastic
>> runtime?). Personally, I see SR as being on the same order of stability
>> as inlining.
>> Actually, apart from a few SR-specific issues (like the one with
>> merge points), EA/SR mostly follows inlining: if you have enough
>> scope inlined you'll have, say, an 80% chance of SR. From my
>> perspective, it is inlining which is so surprisingly unstable.
>>
>> Yeah, I'd agree. The difference, in my mind, is that failing to
>> inline a function may not have as drastic performance
>> implications as failing to eliminate temporaries.
>>
>>
>> BTW: have you considered sharing your experience with
>> EA/SR pitfalls? Even if "increasing the likelihood" is the best option
>> available -- there is still very little information about it on
>> the net.
>>
>> I'm kind of doing that via the few emails on this list :).
>> I think you pretty much covered the biggest (apparent) source of
>> flakiness in the equation - inlining, which can fail for all sorts of
>> different reasons. Beyond that, there's the control-flow-insensitive
>> aspect of EA, which is tangentially related to inlining (or
>> lack thereof).
>>
>> There was also another thread a few months back where I was
>> asking why a small local array allocation wasn't scalarized, and
>> the answer there was ordering between loop unrolling and EA
>> passes (I
>> can dig up that thread if you're interested). The bizarre
>> thing there was the loop operation was folded into a constant,
>> and the compiled method was returning a constant value, but the
>> array
>> allocation was left behind (although it wasn't needed).
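>>
>> (Roughly the shape of that test, reconstructed from memory rather than
>> copied from the old thread:)
>>
>> static int sum3() {
>>     int[] a = {1, 2, 3};                  // small local array, never escapes
>>     int s = 0;
>>     for (int i = 0; i < a.length; i++) {  // indexed access; the unrolling
>>         s += a[i];                        // happens after the EA pass
>>     }
>>     return s;                             // compiled code returns the constant
>> }                                         // 6, yet the int[3] allocation stayed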
>>
>> I agree that there isn't much information about EA in
>> Hotspot (there's a lot of handwaving and inaccuracies online).
>> In particular, it'd be nice if the performance wiki had a
>> section on making
>> user code play well with EA (just like it has guidance on
>> some other JIT aspects currently).
>>
>>
>> ----
>> Ruslan
>>
>>
>>
>> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich
>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin
>> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> >That's my understanding as well (and matches
>> what I'm seeing in some synthetic test harnesses).
>>
>> Ok, I just wanted to clear it up, because it is
>> not the first time I've seen BCEA... noted in the context of scalar
>> replacement, and I was starting to doubt my eyes :)
>>
>> >It's pretty brittle, sadly, and more
>> importantly, unstable.
>>
>> Making similar experiments, I see the same. E.g. a
>> HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of
>> cases, but scalarization breaks with a slightly changed key
>> generation scheme -- because the hashcode
>> distribution becomes worse, HashMap buckets start to convert
>> themselves to TreeBins, and the TreeBin code is a much harder task for
>> EA.
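>>
>> (A simplified sketch of the lookup pattern, with a made-up TupleKey and
>> a String value type, to show what I mean:)
>>
>> final class TupleKey {
>>     final int a, b;
>>     TupleKey(int a, int b) { this.a = a; this.b = b; }
>>     @Override public int hashCode() { return 31 * a + b; }
>>     @Override public boolean equals(Object o) {
>>         return o instanceof TupleKey
>>             && ((TupleKey) o).a == a && ((TupleKey) o).b == b;
>>     }
>> }
>>
>> static String lookup(HashMap<TupleKey, String> map, int a, int b) {
>>     // while the bucket is a plain linked list and the whole lookup
>>     // (including equals) inlines, the temporary key is usually scalar
>>     // replaced; once the bucket is treeified, the TreeNode path is too
>>     // much for EA and the TupleKey allocation survives
>>     return map.get(new TupleKey(a, b));
>> }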
>>
>> Another can of worms is the mismatch between
>> different inlining heuristics. E.g. the FreqInlineSize and
>> InlineSmallCode thresholds may give different decisions for the
>> same piece of code, and
>> the inlining decision taken depends on whether the method was
>> already compiled or not -- which depends on the finest details of
>> initialization order and execution profile. These scenarios
>> became rarer in
>> 1.8 with InlineSmallCode increased, but I'm not
>> sure they are gone...
>>
>> Currently, I'm starting to think code needs to
>> be written specifically with EA/SR in mind to be more-or-less
>> stably scalarized. I.e. you can't get it for free (or it will be
>> unstable).
>>
>> I'm not sure this is practical, to be honest, at
>> least for a big enough application. I've long considered EA
>> (and scalar replacement) a bonus optimization, and never something
>> to rely on if the allocations would hurt otherwise. I'm just a
>> bit surprised by *just* how unstable it appears to be, in the
>> "simplest" of cases.
>>
>> I think code can be written to increase the likelihood
>> of scalar replacement, but I just can't see how it can be made
>> stable to the point where you can rely/depend on it for
>> performance.
>>
>>
>> ----
>> Ruslan
>>
>>
>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich
>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tuesday, September 13, 2016, Cheremin
>> Ruslan <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> > I'm seeing some code that iterates
>> over a ConcurrentHashMap's entrySet that allocates tens of GB of
>> CHM$MapEntry objects even though they don't escape
>>
>>
>> I'm a bit confused: I was sure
>> BCEA-style params do affect EA, but don't affect scalar
>> replacement. With the BCEscapeAnalyzer you can get (a sort of)
>> inter-procedural EA, but this
>> only allows you to have more allocations
>> identified as ArgEscape instead of GlobalEscape. But you can't
>> get more NoEscape without real inlining. And ArgEscape (afaik) is
>> used only
>> for synchronization removals in HotSpot,
>> not for scalar replacements.
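>>
>> (A rough illustration of the distinction, with a made-up
>> Item.appendTo:)
>>
>> String render(Item item) {              // Item/appendTo are made up
>>     StringBuffer sb = new StringBuffer();
>>     item.appendTo(sb);    // not inlined: sb is ArgEscape in render();
>>                           // BCEA on appendTo can prove it doesn't escape
>>                           // globally, which is enough to elide sb's locks,
>>     return sb.toString(); // but not enough to scalar replace sb itself
>> }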
>>
>> Am I incorrect?
>>
>> That's my understanding as well (and matches
>> what I'm seeing in some synthetic test harnesses).
>>
>> I'm generally seeing a lot of variability in
>> scalar replacement in particular, all driven by profile data.
>> HashMap<Integer, ...>::get(int) sometimes works at eliminating
>> the box and sometimes doesn't - the difference
>> appears to be whether Integer::equals is inlined or not, which
>> in turn depends on whether the lookup finds something or not, and
>> whether the number of successful lookups reaches the
>> compilation threshold. It's pretty brittle, sadly, and more
>> importantly, unstable.
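>>
>> (The shape of the test, simplified from my harness, names made up:)
>>
>> static String find(HashMap<Integer, String> map, int id) {
>>     // map.get(id) autoboxes id via Integer.valueOf; when get() and
>>     // Integer.equals both get inlined, the box can be scalar replaced,
>>     // otherwise the Integer allocation stays behind
>>     return map.get(id);
>> }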
>>
>>
>>
>> ----
>> Ruslan
>>
>>
>>
>> --
>> Sent from my phone
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sent from my phone
>>
>