RFR(L): JDK-8032463 VirtualDispatch test timeout with DeoptimizeALot

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri May 9 01:28:43 UTC 2014


Looks good.

Thanks,
Vladimir

On 5/8/14 12:30 PM, Igor Veresov wrote:
> Alright, here is the update:
> http://cr.openjdk.java.net/~iveresov/8032463/webrev.02/
>
> The change is that I added a trap counter in the MDO that is used to
> scale up the number of stack walks necessary for a tenured method to
> revert to the standard flushing policy. This required some changes in
> c1_Runtime.cpp and deoptimization.cpp to handle these deopts and
> allocate MDOs (in C1). The counter increment is guarded by the
> make_not_entrant() call, which guarantees that the increment happens
> only once per method state change.
> I also fixed the comments in sweeper.cpp and the indentation in
> globals.hpp that I forgot to address last time.
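>
> As a rough illustration of that "only once per state change" guard (a
> standalone sketch, not the actual webrev code; NMethodModel, MDOModel
> and inc_tenure_traps() are made-up names):
>
>   #include <atomic>
>
>   // Toy model: the nmethod state transition succeeds for exactly one
>   // caller, so the trap counter is bumped at most once per change.
>   struct NMethodModel {
>     std::atomic<int> state{0};          // 0 = in_use, 1 = not_entrant
>     bool make_not_entrant() {           // true only for the winner
>       int expected = 0;
>       return state.compare_exchange_strong(expected, 1);
>     }
>   };
>
>   struct MDOModel {
>     std::atomic<int> tenure_traps{0};   // counts aging-related deopts
>     void inc_tenure_traps() { tenure_traps.fetch_add(1); }
>   };
>
>   // Called from the deopt path for an age-related trap.
>   void handle_aging_deopt(NMethodModel& nm, MDOModel& mdo) {
>     if (nm.make_not_entrant()) {        // guard the increment
>       mdo.inc_tenure_traps();
>     }
>   }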
>
> Thanks!
> igor
>
>
> On May 7, 2014, at 2:27 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>
>> On 5/7/14 12:37 AM, Igor Veresov wrote:
>>> Vladimir,
>>>
>>> Thanks for the review!
>>> Please find the reply inline.
>>>
>>> igor
>>>
>>> On May 6, 2014, at 5:05 PM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>>> I think we need to let SAP know about this change (counter code in
>>>> C2) since it may affect their tiered compilation. It uses the same
>>>> Deoptimization::Reason_age.
>>>>
>>> Oh, I didn’t know that. I’ll introduce another constant.
>>>
>>>> Can you separate the new C1 trap_request code into a separate RFE
>>>> and push it first?
>>>>
>>>
>>> Do you think this is necessary? It so happens that the DeoptimizeStub
>>> was not used by any code before this change, so the aging scheme is
>>> its only user so far.
>>
>> I am fine with keeping it in this fix, but I can't review it. Ask
>> Christian to take a look.
>>
>>>
>>>> Can you add the field to the MDO and not to MethodCounters? We do
>>>> this only for methods which were already compiled and have an MDO.
>>>> Keeping MethodCounters small is important.
>>>
>>> It was in the MDO originally; unfortunately I have to make it work
>>> for pure C1, and we usually don’t allocate MDOs in that
>>> configuration. Although we would only need to allocate MDOs for
>>> methods with aging, and that set would probably be smallish... But
>>> MDOs are kind of big... Would that be better?
>>
>> I forgot about pure C1.
>> Chris Plummer pointed out to me, for the RTM changes, that
>> MethodCounters are allocated for all executed methods, which may never
>> be compiled. About 30% of methods are executed.
>>
>>>
>>>> Why the change in methodData.hpp and nmethod.cpp? '||' should be
>>>> '&&' in nmethod::inc_decompile_count(), I think.
>>>
>>> You’re right, for pure C1 that won’t work anyway, because there are
>>> no MDOs. I’ll remove these for now. I probably need another decompile
>>> counter to put a cap on the number of recompiles; see the idea below.
>>>
>>>> In globals.hpp the flag description indentation is off.
>>>>
>>>
>>> Fixed.
>>
>> You moved it in the opposite direction :) . The indentation of all
>> comments is the same in the file; they are not aligned with a flag's
>> declaration line.
>>
>>>
>>>> sweeper.cpp:
>>>>
>>>> // The stack-scanning low-cost detection may not see the method was
>>>> used (which can happen for
>>>>
>>>> // previous MinPassesBeforeFlush sweeps. Reset the counter. Stay in
>>>> the existing
>>
>> You did not change the above comments as I suggested.
>>
>>>>
>>>> Why do you need to repeat? For small leaf nmethods (with no
>>>> safepoint, for example) it could trigger continuous recompilation:
>>>>
>>>> +             if (time_since_reset > MinPassesBeforeFlush * 2) {
>>>> +               // It's been long enough, we still haven't seen it
>>>> on stack.
>>>> +               // Try to flush it, but enable counters the next time.
>>>> +               mc->reset_nmethod_age();
>>>>
>>>
>>> That’s a good question. Before this change we would’ve recompiled
>>> warm methods continuously (given some code cache pressure), so the
>>> number of such cases is probably small. The options here with the
>>> code aging are:
>>> - don’t ever again flush a method that we’ve seen deopt once;
>>> - allow the flusher machinery to get rid of it if it becomes idle again.
>>>
>>> Currently I decided to stick with the latter but also give the method
>>> twice the standard time initially to be noticed by the stack-walker.
>>> You are right that there is a problem for small methods. Technically
>>> there is always one safepoint in the epilog though, right? I’m good
>>> with either option. Vladimir, what are your thoughts on this? Albert,
>>> if you’re skimming through this, what do you think?
>>>
>>> Here is an idea... We could count the number of decompiles for this
>>> particular reason and use it to scale the number of stack-walks we
>>> have to do before we flush the method again. For example, we could
>>> wait MinPassesBeforeFlush * number_of_aging_deopts() stack-walks.
>>> Essentially this makes the method unflushable in the limit - a
>>> combination of the two options above.
>>> I’ll need to allocate an MDO or add another counter to methodCounters
>>> for that though, or pack a small value into method_age itself, at the
>>> expense of slightly bigger code. And there is an MT problem, as
>>> usual, with counting the number of deopts.
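>>>
>>> Roughly, the scaled wait could look like this (a standalone sketch
>>> with illustrative names; number_of_aging_deopts is the hypothetical
>>> counter discussed above):
>>>
>>>   // Placeholder value; the real MinPassesBeforeFlush is a VM flag.
>>>   const int MinPassesBeforeFlush = 10;
>>>
>>>   // Each aging deopt multiplies the idle period required before the
>>>   // next flush, so a method that keeps coming back becomes nearly
>>>   // unflushable.
>>>   bool past_flush_threshold(int sweeps_since_seen_on_stack,
>>>                             int number_of_aging_deopts) {
>>>     int scale = number_of_aging_deopts > 0 ? number_of_aging_deopts : 1;
>>>     return sweeps_since_seen_on_stack > MinPassesBeforeFlush * scale;
>>>   }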
>>
>> As we discussed, the check will be scaled by the number of recompilations.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Here is the updated (with the changes addressed so far) webrev:
>>> http://cr.openjdk.java.net/~iveresov/8032463/webrev.01/
>>>
>>> Thanks,
>>> igor
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 5/5/14 7:44 PM, Igor Veresov wrote:
>>>>> The current implementation of the code cache flusher tries to
>>>>> determine the working set of methods by doing periodic stack walks
>>>>> during safepoints. Methods that are seen on the stack are
>>>>> considered to be used by the program and are excluded from the set
>>>>> of flush candidates. Such sampling is a great zero-overhead
>>>>> approach to collect the working set of super-hot methods without
>>>>> instrumentation. However it doesn’t work that good for flat
>>>>> profiles, which the test in the bug report happens to expose
>>>>> (thousands of methods calls sequentially one after another in the
>>>>> loop). The solution I ended up implementing uses conservative
>>>>> on-demand instrumentation to obtain information about the method
>>>>> usage. The algorithms is as follows:
>>>>>
>>>>> 1. Initially a method is compiled as usual (no instrumentation, no
>>>>> overhead).
>>>>> 2. When the sampling (stack-walking) scheme fails to detect
>>>>> activity, we flush the method (again, as usual). However, before the
>>>>> flush we set the age counter (an int added to MethodCounters) to a
>>>>> value that instructs the compilers to instrument the code.
>>>>> 3. If a compilation of this method is ever requested again, the
>>>>> aging code, which decrements the counter, is inserted.
>>>>> 4. The value of the counter is then used by the sweeper to
>>>>> determine whether the method is in fact used.
>>>>> 5. If the counter reaches zero before the sweeper is able to examine
>>>>> and reset it, we deopt and recompile the method without the
>>>>> counters, basically switching it back to the sampling scheme again.
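>>>>>
>>>>> A condensed, standalone model of this life cycle (the names here,
>>>>> such as MethodCountersModel and choose_compile, are illustrative
>>>>> and not the actual webrev code):
>>>>>
>>>>>   // Toy model: plain compile -> flushed once -> recompiled with an
>>>>>   // age counter -> either flushed again (idle) or deopted back to a
>>>>>   // plain compile (counter hit zero before the sweeper examined it).
>>>>>   enum class CompileKind { Plain, WithAgeCounter };
>>>>>
>>>>>   struct MethodCountersModel {
>>>>>     int nmethod_age = -1;                      // < 0: no aging requested
>>>>>     void request_aging(int start) { nmethod_age = start; }     // step 2
>>>>>     bool aging_requested() const { return nmethod_age >= 0; }
>>>>>   };
>>>>>
>>>>>   CompileKind choose_compile(const MethodCountersModel& mc) {   // step 3
>>>>>     return mc.aging_requested() ? CompileKind::WithAgeCounter
>>>>>                                 : CompileKind::Plain;
>>>>>   }
>>>>>
>>>>>   // Emitted aging code decrements the counter; reaching zero triggers
>>>>>   // the deopt that recompiles the method without counters (step 5).
>>>>>   bool counter_hit_zero_on_invocation(MethodCountersModel& mc) {
>>>>>     return --mc.nmethod_age == 0;
>>>>>   }
>>>>>
>>>>>   // The sweeper reads the counter to see whether the method was used
>>>>>   // (step 4) and resets it for the next sampling period.
>>>>>   bool sweeper_saw_activity(MethodCountersModel& mc, int start) {
>>>>>     bool used = mc.nmethod_age < start;
>>>>>     mc.nmethod_age = start;
>>>>>     return used;
>>>>>   }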
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8032463/webrev.00/
>>>>>
>>>>> Testing:
>>>>> - jprt,
>>>>> - aurora-perf. No performance effects with the default code cache,
>>>>> since we have enough headroom and the flusher is not exercised that
>>>>> much. There are no statistically significant effects after warmup
>>>>> (for spec benchmarks) in stress mode (-XX:+StressCodeAging) either,
>>>>> that is, when all methods are initially compiled with counters,
>>>>> deopt, and switch to sampling after 100000 invocations.
>>>>>
>>>>> igor
>

