Fwd: Tiered compilation and virtual call heuristics

Thu Jul 30 02:08:56 UTC 2015

Hi Carsten,

The main issue here is that without Tiered the interpreter starts collecting 
profiling information only after 3300 invocations 
(InterpreterProfilePercentage). As a result, data from the first invocations 
is not recorded.
On the other hand, with Tiered, C1 compilation (with profiling code) is 
triggered after 100 invocations. So you get a lot more data, as you 
observed.
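
As a rough illustration of where the 3300 figure comes from: the sketch 
below assumes the non-tiered server defaults CompileThreshold=10000 and 
InterpreterProfilePercentage=33 (check your build with 
-XX:+PrintFlagsFinal). The class and method names are made up for this 
example; the 100 is simply the figure quoted above.

```java
public class ProfilingStart {
    // Invocation count at which the interpreter starts profiling,
    // modeled as a percentage of the compile threshold (a simplification).
    static long interpreterProfilingStart(long compileThreshold,
                                          long interpreterProfilePercentage) {
        return compileThreshold * interpreterProfilePercentage / 100;
    }

    public static void main(String[] args) {
        // Assumed defaults: CompileThreshold=10000, InterpreterProfilePercentage=33.
        long nonTiered = interpreterProfilingStart(10_000, 33);
        long tieredC1 = 100; // figure quoted in this mail for tiered C1
        System.out.println("non-tiered: profiling starts after " + nonTiered + " invocations");
        System.out.println("tiered: C1 profiling code after about " + tieredC1 + " invocations");
    }
}
```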

If you can sacrifice some startup performance, you can try to use 
CompileThresholdScaling to raise the compilation thresholds and delay 
compilation.

Or you can try to increase Tier3InvocationThreshold and 
Tier3CompileThreshold to delay only C1 compilation.

Here is the formula from simpleThresholdPolicy.inline.hpp (i is the 
invocation count, b is the backedge count):

     return (i >= Tier3InvocationThreshold * scale) ||
            (i >= Tier3MinInvocationThreshold * scale &&
             i + b >= Tier3CompileThreshold * scale);
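
To get a feel for how the thresholds interact, here is the same predicate 
transcribed into a small Java sketch. The constants are what I believe are 
the JDK 8 defaults (200, 100, 2000); treat them as assumptions and verify 
with -XX:+PrintFlagsFinal. The class name is made up, and scale is treated 
as a plain multiplier here, which is a simplification of what the policy 
does.

```java
public class Tier3Predicate {
    // Assumed JDK 8 defaults, not read from a running VM.
    static final int TIER3_INVOCATION_THRESHOLD = 200;
    static final int TIER3_MIN_INVOCATION_THRESHOLD = 100;
    static final int TIER3_COMPILE_THRESHOLD = 2000;

    // i = invocation count, b = backedge count, scale = threshold multiplier.
    static boolean callPredicate(int i, int b, double scale) {
        return (i >= TIER3_INVOCATION_THRESHOLD * scale)
            || (i >= TIER3_MIN_INVOCATION_THRESHOLD * scale
                && i + b >= TIER3_COMPILE_THRESHOLD * scale);
    }

    public static void main(String[] args) {
        System.out.println(callPredicate(150, 0, 1.0));    // below both conditions
        System.out.println(callPredicate(200, 0, 1.0));    // invocation threshold reached
        System.out.println(callPredicate(100, 1900, 1.0)); // loop-heavy method
        System.out.println(callPredicate(200, 0, 2.0));    // doubled thresholds delay C1
    }
}
```

With a scale of 2.0 the method in the last call would need 400 invocations 
(or 200 invocations plus enough loop iterations) before the predicate holds.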

But if you have a truly "flat" profile (all called methods are relatively 
warm), none of this will help.

If some methods are relatively hot, you can work around the problem by 
calling them at the beginning. For example, if count400(0) were called 
first (or second), you would get a record for it in the MDO.
You can then try lowering TypeProfileMajorReceiverPercent to avoid a 
virtual call at least for the one hot method (recorded in the MDO):

   product(intx, TypeProfileMajorReceiverPercent, 90,
           "% of major receiver type to all profiled receivers")

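To make the interaction concrete, here is a toy model of a receiver 
profile at one call site. It is only a sketch and not the HotSpot 
implementation: it assumes two receiver rows (the limit mentioned in your 
mail), counts every other receiver in an overflow counter, and treats a 
recorded type as the major receiver when its share of all profiled calls 
reaches the given percentage. The class, method, and type names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReceiverProfileSketch {
    private final int width;                       // number of receiver rows
    private final Map<String, Long> rows = new LinkedHashMap<>();
    private long overflow;                         // calls whose receiver got no row

    ReceiverProfileSketch(int width) { this.width = width; }

    // Record one call: the first `width` distinct types get a row, the rest overflow.
    void record(String receiverType) {
        Long count = rows.get(receiverType);
        if (count != null) {
            rows.put(receiverType, count + 1);
        } else if (rows.size() < width) {
            rows.put(receiverType, 1L);
        } else {
            overflow++;
        }
    }

    // Returns a recorded type whose share is at least majorPercent, or null.
    String majorReceiver(int majorPercent) {
        long total = overflow;
        for (long c : rows.values()) total += c;
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            if (e.getValue() * 100 >= (long) majorPercent * total) return e.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        // Hot type called first: it gets a row even though warm types follow.
        ReceiverProfileSketch p = new ReceiverProfileSketch(2);
        for (int n = 0; n < 800; n++) p.record("Hot");
        for (int n = 0; n < 100; n++) p.record("WarmA");
        for (int n = 0; n < 100; n++) p.record("WarmB");
        System.out.println(p.majorReceiver(90)); // Hot has 80%, below 90
        System.out.println(p.majorReceiver(75)); // Hot qualifies at 75
    }
}
```

If the two warm types are recorded before the hot one, the hot type never 
gets a row in this model, and no percentage setting can select it. That is 
why the call order matters.
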
Regards,
Vladimir

On 7/22/15 10:37 AM, Carsten Varming wrote:
> Dear Hotspot compiler group,
>
> I have had a few issues with tiered compilation in JDK8 lately and was
> wondering if you have some comments or ideas for the given problem.
>
> Here is my problem as I currently understand it. Feel free to correct
> any misunderstandings I may have. With tiered compilation the heuristics
> for inlining virtual calls seem to degrade quite a bit. I think this is
> due to MethodData objects being created much earlier with tiered than
> without. This causes the tracking of the hottest target methods at a
> virtual call site to go awry, due to the limit (2) on the number of
> MethodData objects that can be associated with a bci in a method. It
> seems like the only virtual call targets tracked are the targets that
> are warm when C1 is invoked.
>
> The program ends up with all call-sites in
> scala.collection.IndexedSeqOptimized.slice using virtual dispatch with
> tiered and bimorphic call sites without tiered. The end result with
> tiered is a tripling of the cpu required to run the program, and
> instruction pointers from the compiled slice method end up in 90% of all
> cpu samples (collected with perf at 4kHz).
>
> The problem is with a small application built in Scala on top of Netty.
> I have written a small sample program (see attached Main.java) to spare
> you the details (and to be able to give you code).
>
> When I run the sample program with tiered then the call to count ends up
> being a virtual call, due to Instance$3.count and Instance4.count being
> warm when C1 kicks in. Without tiered Instance$1.count is the only hot
> method.
>
> I wonder if you guys have seen this problem in the wild or if I just
> happen to be unlucky. Increasing BciProfileWidth should help in my case,
> but it is not a product flag. Do you have any experience regarding cost
> of increasing BciProfileWidth? Do you have any thoughts on throwing out
> MethodData objects for virtual call sites that turn out to be pretty cold?
>
> Thank you in advance for your thoughts,
> Carsten
>