MaxBCEAEstimateSize and inlining clarification
Vitaly Davidovich
vitalyd at gmail.com
Wed Sep 14 16:18:19 UTC 2016
On Wed, Sep 14, 2016 at 12:13 PM, Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:
> Do OSR compilations run EA? I'm looking at some code (roughly) like this:
>>
>> while (true) {
>>     for (Entry<...> e : concurrentHashMap.entrySet()) {
>>         // e does not escape
>>     }
>>     Thread.sleep(...);
>> }
>>
>> I see the enclosing method OSR compiled, but the iterator and entry
>> aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the
>> case?
>>
>
> EA is performed for OSR compilations, but keep in mind that the entry
> point for OSR compilation is the back branch in the loop.
>
> The whole JVM state is passed as the argument, so EA can only detect that
> something is local for the duration of a single loop iteration, not when
> something temporary is allocated for the whole loop.
>
> It means that the iterator object can't be eliminated in OSR compilation.
> Probably, it causes the element object to escape as well.
>
Darn! Ok, thanks Vladimir - that would explain what I'm seeing. So
basically I need to find a way to avoid OSR compiles for cases like this.
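
For the record, the shape of the workaround I have in mind (a rough
sketch, names are made up, not the actual code): move the map walk out of
the infinite loop into its own method, so the hot work gets a regular
(non-OSR) compilation where EA sees the iterator allocation from method
entry:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Poller {
    private final ConcurrentHashMap<String, Runnable> tasks = new ConcurrentHashMap<>();

    void pollLoop() throws InterruptedException {
        while (true) {
            drain();            // drain() compiles as a normal method once hot,
            Thread.sleep(100);  // so only pollLoop() itself is OSR compiled
        }
    }

    private void drain() {
        // the entry and the iterator stay local to drain(), so they are
        // candidates for scalar replacement in its standard compilation
        for (Map.Entry<String, Runnable> e : tasks.entrySet()) {
            e.getValue().run();
        }
    }
}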
>
> Best regards,
> Vladimir Ivanov
>
> On Tuesday, September 13, 2016, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>
>> If the allocation is done locally in a loop, it could be SR'd (but this
>> is not guaranteed):
>>
>> for (...) {
>>     Foo f = new Foo();
>> }
>>
>> "Currently" we can't SR it if there is merge:
>>
>> Foo f = new Foo();
>> for () {
>> f = new Foo();
>> }
>> x = f.x;
>>
>> Also we can't SR an array if it has indexed accesses, because we can't
>> map the loads/stores to a concrete element:
>>
>> int[] a = new int[3];
>> for (i) {
>>     x = a[i];
>> }
>>
>> If the elements are accessed without an index (e.g. using an array to
>> pass or return several values), or the loop is fully unrolled, we can
>> SR it:
>>
>> x0 = a[0];
>> x1 = a[1];
>> x2 = a[2];
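>>
>> A sketch of one way around the merge case (just the general idea, it is
>> also not guaranteed): carry the scalar value across the loop instead of
>> the object, so each Foo stays local to one iteration:
>>
>> int x = new Foo().x;
>> for (...) {
>>     x = new Foo().x;  // each allocation is consumed within its own
>> }                     // iteration; no Foo reference crosses a merge point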
>>
>> Regards,
>> Vladimir
>>
>> On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>>
>> There was also another thread a few months back where I was
>> asking why a small local array allocation wasn't scalarized,
>> and the answer there was ordering between loop unrolling and
>> EA passes (I can dig up that thread if you're interested).
>>
>> It would be very nice, please -- I've tried to google it myself
>> (because you've noted it already in the thread) but wasn't able
>> to guess the right keywords :)
>>
>>
>> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich <vitalyd at gmail.com
>> <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin
>> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> >how it can be made stable to the point where you can
>> rely/depend on it for performance.
>>
>> Well, the same can be said about any JIT optimization
>> (maybe it is time to rename the dynamic runtime to a stochastic
>> runtime?). Personally, I see SR as being on the same order of stability
>> as inlining.
>> Actually, apart from a few SR-specific issues (like the one with
>> merge points), EA/SR mostly follows inlining: if you have enough
>> scope inlined you'll have, say, an 80% chance of SR. From my
>> perspective, it is inlining which is so surprisingly unstable.
>>
>> Yeah, I'd agree. The difference, in my mind, is that failing to
>> inline a function may not have as drastic performance
>> implications as failing to eliminate temporaries.
>>
>>
>> BTW: have you considered sharing your experience with
>> EA/SR pitfalls? Even if "increasing the likelihood" is the best option
>> available -- there is still very little information about it on
>> the net.
>>
>> I'm kind of doing that via the few emails on this list :).
>> I think you pretty much covered the biggest (apparent) source of
>> flakiness in the equation - inlining, which can fail for all sorts of
>> different reasons. Beyond that, there's the control-flow-insensitive
>> aspect of EA, which is tangentially related to inlining (or
>> lack thereof).
>>
>> There was also another thread a few months back where I was
>> asking why a small local array allocation wasn't scalarized, and
>> the answer there was ordering between loop unrolling and EA
>> passes (I
>> can dig up that thread if you're interested). The bizarre
>> thing there was the loop operation was folded into a constant,
>> and the compiled method was returning a constant value, but the
>> array
>> allocation was left behind (although it wasn't needed).
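>>
>> (Roughly the shape of that test, reconstructed from memory rather than
>> copied from the old thread:)
>>
>> static int sum3() {
>>     int[] a = {1, 2, 3};                  // small local array, never escapes
>>     int s = 0;
>>     for (int i = 0; i < a.length; i++) {  // indexed access; the unrolling
>>         s += a[i];                        // happens after the EA pass
>>     }
>>     return s;                             // compiled code returns the constant
>> }                                         // 6, yet the int[3] allocation stayed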
>>
>> I agree that there isn't much information about EA in
>> Hotspot (there's a lot of handwaving and inaccuracies online).
>> In particular, it'd be nice if the performance wiki had a
>> section on making
>> user code play well with EA (just like it has guidance on
>> some other JIT aspects currently).
>>
>>
>> ----
>> Ruslan
>>
>>
>>
>> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich
>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin
>> <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> >That's my understanding as well (and matches
>> what I'm seeing in some synthetic test harnesses).
>>
>> Ok, I just wanted to clear it up, because it is
>> not the first time I've seen BCEA... noted in the context of scalar
>> replacement, and I was starting to doubt my eyes :)
>>
>> >It's pretty brittle, sadly, and more
>> importantly, unstable.
>>
>> Making similar experiments, I see the same. E.g. a
>> HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of
>> cases, but scalarization breaks with a slightly changed key
>> generation scheme -- because the hashcode
>> distribution becomes worse, HashMap buckets start to convert
>> themselves to TreeBins, and the TreeBin code is a much harder task for
>> EA.
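>>
>> (A simplified sketch of the lookup pattern, with a made-up TupleKey and
>> a String value type, to show what I mean:)
>>
>> final class TupleKey {
>>     final int a, b;
>>     TupleKey(int a, int b) { this.a = a; this.b = b; }
>>     @Override public int hashCode() { return 31 * a + b; }
>>     @Override public boolean equals(Object o) {
>>         return o instanceof TupleKey
>>             && ((TupleKey) o).a == a && ((TupleKey) o).b == b;
>>     }
>> }
>>
>> static String lookup(HashMap<TupleKey, String> map, int a, int b) {
>>     // while the bucket is a plain linked list and the whole lookup
>>     // (including equals) inlines, the temporary key is usually scalar
>>     // replaced; once the bucket is treeified, the TreeNode path is too
>>     // much for EA and the TupleKey allocation survives
>>     return map.get(new TupleKey(a, b));
>> }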
>>
>> Another can of worms is the mismatch between
>> different inlining heuristics. E.g. the FreqInlineSize and
>> InlineSmallCode thresholds may give different decisions for the
>> same piece of code, and
>> the inlining decision taken depends on whether the method was
>> already compiled or not -- which depends on the finest details of
>> initialization order and execution profile. These scenarios
>> became rarer in
>> 1.8 with InlineSmallCode increased, but I'm not
>> sure they are gone...
>>
>> Currently, I'm starting to think code needs to
>> be written specifically with EA/SR in mind to be more-or-less
>> stably scalarized. I.e. you can't get it for free (or it will be
>> unstable).
>>
>> I'm not sure this is practical, to be honest, at
>> least for a big enough application. I've long considered EA
>> (and scalar replacement) a bonus optimization, and never something
>> to rely on if the allocations would hurt otherwise. I'm just a
>> bit surprised by *just* how unstable it appears to be, in the
>> "simplest" of cases.
>>
>> I think code can be written to increase the likelihood
>> of scalar replacement, but I just can't see how it can be made
>> stable to the point where you can rely/depend on it for
>> performance.
>>
>>
>> ----
>> Ruslan
>>
>>
>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich
>> <vitalyd at gmail.com <mailto:vitalyd at gmail.com>>:
>>
>>
>>
>> On Tuesday, September 13, 2016, Cheremin
>> Ruslan <cheremin at gmail.com <mailto:cheremin at gmail.com>> wrote:
>>
>> > I'm seeing some code that iterates
>> over a ConcurrentHashMap's entrySet that allocates tens of GB of
>> CHM$MapEntry objects even though they don't escape
>>
>>
>> I'm a bit confused: I was sure
>> BCEA-style params do affect EA, but don't affect scalar
>> replacement. With the BCEscapeAnalyzer you can get (a sort of)
>> inter-procedural EA, but this
>> only allows you to have more allocations
>> identified as ArgEscape instead of GlobalEscape. But you can't
>> get more NoEscape without real inlining. And ArgEscape (afaik) is
>> used only
>> for synchronization removals in HotSpot,
>> not for scalar replacements.
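>>
>> (A rough illustration of the distinction, with a made-up
>> Item.appendTo:)
>>
>> String render(Item item) {              // Item/appendTo are made up
>>     StringBuffer sb = new StringBuffer();
>>     item.appendTo(sb);    // not inlined: sb is ArgEscape in render();
>>                           // BCEA on appendTo can prove it doesn't escape
>>                           // globally, which is enough to elide sb's locks,
>>     return sb.toString(); // but not enough to scalar replace sb itself
>> }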
>>
>> Am I incorrect?
>>
>> That's my understanding as well (and matches
>> what I'm seeing in some synthetic test harnesses).
>>
>> I'm generally seeing a lot of variability in
>> scalar replacement in particular, all driven by profile data.
>> HashMap<Integer, ...>::get(int) sometimes works at eliminating
>> the box and sometimes doesn't - the difference
>> appears to be whether Integer::equals is inlined or not, which
>> in turn depends on whether the lookup finds something or not, and
>> whether the number of successful lookups reaches the
>> compilation threshold. It's pretty brittle, sadly, and more
>> importantly, unstable.
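>>
>> (The shape of the test, simplified from my harness, names made up:)
>>
>> static String find(HashMap<Integer, String> map, int id) {
>>     // map.get(id) autoboxes id via Integer.valueOf; when get() and
>>     // Integer.equals both get inlined, the box can be scalar replaced,
>>     // otherwise the Integer allocation stays behind
>>     return map.get(id);
>> }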
>>
>>
>>
>> ----
>> Ruslan
>>
>>
>>
>> --
>> Sent from my phone
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sent from my phone
>>
>