The cost of nmethod entry barriers [was: RFR: 8269476: Skip nmethod entry barrier if there is no oops in the jit code [v4]]

Thu Jul 1 20:56:31 UTC 2021

Nice work; thanks.

So for javac (which BTW has a lot of polymorphism in it)
the dynamic proportion of retired nmethod entry
instructions is 1.38%.  Unless those instructions are
unusually slow ones, that percentage is an upper limit
to their wall clock contribution.  If we can get rid of
2 out of 5 of them, the estimate would be 0.83%, with
a savings of 0.55% (upper bound).
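
For anyone who wants to redo the arithmetic, here is a small sketch in
Python. The counts are the ones from Andrew's perf run quoted below; the
function and variable names are mine, purely for illustration. It assumes,
as above, that a barrier instruction costs no more than an average retired
instruction, so the instruction share is an upper bound on wall clock
share. With these inputs the 5-instruction share comes out at about 1.37%,
in line with the 1.38% figure used in this thread.

```python
# Counts taken from the perf run quoted later in this message.
BARRIERS_EXECUTED = 657_152_838      # nmethod entry barriers executed
TOTAL_INSNS = 239_468_489_262        # instructions:u for the whole javac run


def barrier_share(insns_per_barrier: int,
                  barriers: int = BARRIERS_EXECUTED,
                  total_insns: int = TOTAL_INSNS) -> float:
    """Fraction of all retired instructions spent in nmethod entry barriers."""
    return insns_per_barrier * barriers / total_insns


current = barrier_share(5)   # barrier as it is today: 5 instructions
trimmed = barrier_share(3)   # hypothetical barrier with 2 of the 5 removed
saving = current - trimmed   # upper bound on what the trimming could buy

print(f"current share: {current:.2%}")
print(f"trimmed share: {trimmed:.2%}")
print(f"saving:        {saving:.2%}")
```

The share is linear in the barrier length, so dropping 2 of 5 instructions
saves 2/5 of whatever the current share is, no matter what the exact
counts are on a given run.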

That is indeed a tempting target.   Just as performance
can degrade by the “death of a thousand cuts” as you
describe, it improves, often, by doggedly adding half
percent improvements over and over.

And (as we all know) if you need a “success of a thousand
tweaks”, you have to pick and choose which tweaks you
attempt.  One tweak that promises good effects but has
high costs in complexity and maintainability (like, perhaps,
the one we are talking about here) can have the economic
effect (a la Bastiat “that which is not seen”) of quashing
five other tweaks that might require an equal amount
of maintenance but provide better combined benefit.
Also, an overly complex tweak can have the effect of
muddying the code so that later tweaks (like Erik’s
Swiss army knife tactics) become more costly
themselves or even impossible.

BTW, that 1.38% is a *lower* limit for nmethod overhead
proper, because it doesn’t count the effects of lost
inlining, which are surely at least as large.  The combined
effects of lost inlining (whatever they are) are IMO the root
reason why we keep piling on more and more inlining tactics,
trying to make nmethods large and calls infrequent.

HTH!

> On Jul 1, 2021, at 6:49 AM, Andrew Haley <aph at redhat.com> wrote:
> 
> On 7/1/21 11:00 AM, Andrew Haley wrote:
> 
>> I am going to take a little time to quantify, hopefully in percentage
>> terms, the cost of nmethod barriers, as much for my own education as
>> anything else. I'll get back to y'all.
> 
> I have numbers.
> 
> Running javac (@java.base) on AArch64 we execute on average 657152838
> nmethod barriers. That number varies by as much as 2.5%, to be
> expected given that compiler threads are racing with Java threads.
> 
> perf stats are typically:
> 
>      77232.039076      task-clock:u (msec)       #    3.284 CPUs utilized
>            933972      page-faults:u             #    0.012 M/sec
>      167237276242      cycles:u                  #    2.165 GHz
>      239468489262      instructions:u            #    1.43  insn per cycle
>        1762368611      branch-misses:u
> 
>      23.520042515 seconds time elapsed
> 
> An nmethod barrier is 5 instructions, so the proportion of instructions
> executed by nmethod barriers is
> 
>   (657152838.0*5)/239468489262 = 1.38%
> 
> That's quite a lot. I would have expected it to be noticeable. It
> might well be that the barrier instructions commonly are (fully or
> partially) speculated in parallel on a big out-of-order machine so
> don't show up on a wall clock, but they will show up in perf
> stats. Also, that 1% is about the level of run-to-run variance even on
> a quiet server, so it'd take many runs averaged out to see it. But the
> effect is real; unless I have messed up my measurements or my
> thinking, which happens.
> 
> -- 
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>