RFR(L): JDK-8032463 VirtualDispatch test timeout with DeoptimizeALot

Wed May 7 21:27:22 UTC 2014

On 5/7/14 12:37 AM, Igor Veresov wrote:
> Vladimir,
>
> Thanks for the review!
> Please find the reply inline.
>
> igor
>
> On May 6, 2014, at 5:05 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> I think we need to let SAP know about this change (counter code in C2) since it may affect their tiered compilation. It use the same Deoptimization::Reason_age.
>>
> Oh, I didn’t know that. I’ll introduce another constant.
>
>> Can you separate C1 trap_request newa code in a separate RFE and push it first?
>>
>
> Do you think this is necessary? It so happens that the DeoptimizeStub was not used before this by any code. So the aging scheme is the only user of it so far.

I am fine to keep it in this fix but I can't review it. Ask Christian to 
look.

>
>> Can you add the field to MDO and not to MethodCounters? We doing this only for methods which were compiled already and have MDO. MethodCounters small size is important.
>
> It was in MDO originally, unfortunately I have to make it work for pure C1 and we usually don’t allocate MDOs in that configuration. Although, we need to only allocate MDOs for methods with aging, a set of which may be would be smallish..  But MDOs are kind of big.. Would that be better?

I forgot about pure C1.
Chris Plummer pointed me, for RTM changes, MethodCounters are allocated 
for all executed methods which may not be compiled. About 30% methods 
are executed.

>
>> Why the change in methodData.hpp and nmethod.cpp? '||' should be '&&' in nmethod::inc_decompile_count(), I think.
>
> You’re right, for pure C1 that won’t work anyways, because there are no MDOs. I’ll remove these for now. I probably need another decompile counter to put a cap on the number of recompiles, see the idea below.
>
>> In globals.hpp flags description indention is off.
>>
>
> Fixed.

You moved it to opposite direction :) . Indention of all comments are 
the same in the file. They are not aligned to a flag's declaration line.

>
>> sweeper.cpp:
>>
>> // The stack-scanning low-cost detection may not see the method was used (which can happen for
>>
>> // previous MinPassesBeforeFlush sweeps. Reset the counter. Stay in the existing

You did not change above comments as I suggested.

>>
>> Why do you need to repeat? For small leaf nmethods (with no safepoint, for example) it could trigger continues recompilation:
>>
>> +             if (time_since_reset > MinPassesBeforeFlush * 2) {
>> +               // It's been long enough, we still haven't seen it on stack.
>> +               // Try to flush it, but enable counters the next time.
>> +               mc->reset_nmethod_age();
>>
>
> That’s a good question. Before this change we would’ve recompiled warm methods continuously (given some code cache pressure), so the number of such cases is probably small. The options here with the code aging are:
> - don’t ever again flush a method that we’ve seen deopt once;
> - allow the flusher machinery to get rid of it if it becomes idle again.
>
> Currently I decided to stick with the latter but also give the method twice the standard time initially to be noticed by the stack-walker. You are right that there is a problem for small methods. Technically there is always one safepoint in the epilog though, right? I’m good with either option. Vladimir, what are your thoughts on this? Albert, if you skimming through this, what do you think?
>
> Here is an idea... We could count the number of decompiles for this particular reason and use it to scale the number of stack-walks we have to do before we flush the method again. For example we can wait MinPassesBeforeFlush * number_of_aging_deopts() stack-walks. Essentially this makes it unflushable in the limit - a combination of the two options above.
> I’ll need to allocate an MDO or add another counter to methodCounters for that though, or pack a small value info method_age itself, at the expense of a slightly bigger code. And there is an MT problem, as usual, with counting the number of deopts.

As we discussed the check will be scaled by number of recompilation.

Thanks,
Vladimir

>
> Here is the updated (with the changes addressed so far) webrev: http://cr.openjdk.java.net/~iveresov/8032463/webrev.01/
>
> Thanks,
> igor
>
>> Thanks,
>> Vladimir
>>
>> On 5/5/14 7:44 PM, Igor Veresov wrote:
>>> The current implementation of the code cache flusher tries to determine the working set of methods by doing periodic stack walks during safepoints. Methods that are seen on the stack are considered to be used by the program and are excluded from the set of flush candidates. Such sampling is a great zero-overhead approach to collect the working set of super-hot methods without instrumentation. However it doesn’t work that good for flat profiles, which the test in the bug report happens to expose (thousands of methods calls sequentially one after another in the loop). The solution I ended up implementing uses conservative on-demand instrumentation to obtain information about the method usage. The algorithms is as follows:
>>>
>>> 1. Initially a method is compiled as usual (no instrumentation, no overhead).
>>> 2. When the sampling (stack-walking) scheme fails to detect activity we flush the method (again, as usual). However before the flush we set the age counter (an int added to MethodCounters) to a value that instructs the compilers to instrument the code.
>>> 3. If we ever request to compile this method again the aging code is inserted, which decrements the counter.
>>> 4. The value of the counter is then used by the sweeper to determine is the method is in fact used.
>>> 5. If the counter reaches zero before the sweeper is able examine and reset the counter we deopt and recompile the method without the counters, basically switching it back to use the sampling scheme again.
>>>
>>> Webrev: http://cr.openjdk.java.net/~iveresov/8032463/webrev.00/
>>>
>>> Testing:
>>> - jprt,
>>> - aurora-perf No performance effects with the default code cache, since we have enough headroom and the flusher is not exercised that much. No statistically significant effects after warmup (for spec benchmarks) in stress mode (-XX:+StressCodeAging) either, that is if all methods are compiled initially with counters deopt and switch to sampling later after 100000 invocations.
>>>
>>> igor
>>>
>