RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Albert Noll albert.noll at oracle.com
Thu Sep 5 05:48:48 PDT 2013


Hi Vladimir,

thanks again for the detailed review.

I think it is important to have a small NmethodSweepFraction for small 
code cache sizes,
which is provided in the current version. Since we plan to have a 
separate sweeper thread,
I would propose the further investigate the impact of 
NmethodSweepFraction when we implement
the sweeper thread.

As for:

JavaCalls::call_helper()

   if (CompilationPolicy::must_be_compiled(method)) {
     CompileBroker::compile_method(method, InvocationEntryBci,

CompilationPolicy::policy()->initial_compile_level(),
                                   methodHandle(), 0, 
"must_be_compiled", CHECK);
+ } else if (UseCodeCacheFlushing) {
+   nmethod* saved = CodeCache::reanimate_saved_code(method());
+   if (saved != NULL) {
+     method->set_code(method, saved);
+   }
   }

The current places where code is re-animated 
(Method::verified_code_entry() and CompileBroker::compile_method() seem 
effective.

One important note: The version of webrev.01 computes the following 
parameter incorrectly:

const double flushing_fraction = 1 - (pow((ReservedCodeCacheSize / M), -1/3) + CodeCacheFlushingMinimumPercentage / 100);


As a result, flushing fraction was always 0, which means that no memory 
was speculatively disconnected.
This is a strong indicator that the hotness counter renders the 
speculatively disconnect + reanimate obsolete.
I'll work on a new version where this code is removed.

Best,
Albert


On 04.09.2013 22:33, Vladimir Kozlov wrote:
> On 9/4/13 4:07 AM, Albert Noll wrote:
>> Hi,
>>
>> here is an updated version of the patch. In summary, changes to the
>> previous version are:
>>
>> 1) Adapt NmethodSweepFraction based on the code cache size:
>>      Before the patch, NmethodSweepFraction was defined as a constant 
>> value.
>>      I.e., NmethodSweepFraction was 16 for a code cache size of 32m as
>> well as a
>>      code cache size of 256m. This seems unreasonable, since
>> NmethodSweepFraction
>>      was originally introduce to keep the time spent in the sweeper
>> (note that sweeping
>>      was originally done at safepoints) at a reasonable level.
>
> Albert,
>
> General question first. Do we still need to sweep in fractions since 
> we don't do sweeping at safepoint and it stops for safepoints? The 
> only drawback is one Compiler thread is used for sweeping and can't 
> compile when it sweeps. But we will address that later by using 
> special sweeping thread.
>
>>      The patch adapts NmethodSweepFraction to the ReservedCodeCacheSize.
>> I.e.,
>>      each sweep operation roughly  covers 16m of the code cache.
>
> Why you stop at 16?:
>
> +    } else if (ReservedCodeCacheSize > (256 * M)) {
> +      FLAG_SET_DEFAULT(NmethodSweepFraction, 16);
>
> ReservedCodeCacheSize could be 2GB so one sweep have to process 128Mb.
> Next should cover all cases, I think:
>
> FLAG_SET_DEFAULT(NmethodSweepFraction, 1 + ReservedCodeCacheSize / (16 
> * M))
>
>>
>> 2) The parameter NmethodSweepFraction is replaced by a function that
>> computes the
>>      NmethodSweepFraction based on the code cache size. See
>> 'sweeper.cpp' for a detailed
>>      description
>>
>> 3) The initial hotness counter depends on the code cache size. More
>> specifically, the initial
>>      hotness counter is 2*ReservedCodeCacheSize. As a result, a method
>> stays longer in the
>>      code cache if ReservedCodeCacheSize is larger.
>
> I don't like spreading '(ReservedCodeCacheSize / M) * 2' through the 
> code. Use NMethodSweeper::get_hotness_counter_reset_val() in nmethod 
> constuctors.
>
> Also you need to check for min value if ReservedCodeCacheSize < 1Mb.
>
> Also drop 'get_' from name. We usually don't use 'get_' in accessor 
> methods. the same for nmethod::get_hotness_counter().
>
> I already asked you to make hotness_counter_decay member of nmethod so 
> that you don't need to pass it to dec_hotness_counter(). Or remove it 
> at all since it is always 1:
>
> void dec_hotness_counter() { _hotness_counter--;  }
>
> I would prefer all code which modifies _hotness_counter be in nmethod 
> class.
>
>>
>> 4) Newly compiled methods are guaranteed to not be evicted by the
>> sweeper for 10
>>      sweep cycles. This ensures the newly compiled methods are not
>> immediately made
>>      not entrant after compialtion.
>>
>> 5) The hotness counter is reset EVERY TIME active methods are scanned.
>> In the previous version
>>      the hotness counter was only reset after a full sweep cycle of the
>> code cache. Resetting the
>>     hotness counter more frequently provides a better hotness coverage
>> of methods.
>>
>> 6) Methods are flushed in blocks of 1m. The algorithm computes the
>> average hotness of a
>>      a nmethod block and evicts the entire block. This should reduce
>> fragmentation.
>>
>> Please let me know what you think about these changes. Performance
>> results can be found at:
>> https://bugs.openjdk.java.net/browse/JDK-8020151
>> I will continuously provide more results.
>>
>> Here is the new webrev:
>> http://cr.openjdk.java.net/~anoll/8020151/webrev.01/
>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.01/>
>
> Very nice comment in sweeper.hpp. Could you please extend it?
>
> 2) sweep nmethods
>
> Sweeping is currently done by Compiler thread between compilations or 
> at least each 5 sec (NmethodSweepCheckInterval) when CodeCache is full.
>
> The comment is not cleare that nmethods could be marked as not-entrant 
> by different code (deoptimization, dependency invalidation, replace 
> old nmethod) and not just by sweeper (which does that only for flushed 
> nmethods).
>
> Each nmethod's state change happens during separate sweeps. It may 
> take at least 3 sweeps before nmethod's space is freed.
>
> 3) code cache flushing
> ... as a VM operation at safepoint.
>
> Flashing VM operation requires safepoint.
>
> As you pointed current code re-animates nmethods only if we get 
> compilation request. It means that it depends on profiling counters. 
> It could be the case that a method have to be executed in interpreter 
> for some time again to hit compilation threshold.
>
> I think we need to re-animate nmethod immediately during java call:
>
> JavaCalls::call_helper()
>
>   if (CompilationPolicy::must_be_compiled(method)) {
>     CompileBroker::compile_method(method, InvocationEntryBci,
>
> CompilationPolicy::policy()->initial_compile_level(),
>                                   methodHandle(), 0, 
> "must_be_compiled", CHECK);
> + } else if (UseCodeCacheFlushing) {
> +   nmethod* saved = CodeCache::reanimate_saved_code(method());
> +   if (saved != NULL) {
> +     method->set_code(method, saved);
> +   }
>   }
>
> Can you verify how this work?
>
> I will send additional comments when look through the rest of code.
>
> Thanks,
> Vladimir
>
>>
>> Best,
>> Albert
>>
>>
>> On 22.08.2013 15:16, Albert Noll wrote:
>>> Hi Igor,
>>>
>>> thanks again for your comments. You are right. I will run some
>>> benchmarks to see if removing
>>> the disconnect logic is feasible or not.
>>>
>>> Best,
>>> Albert
>>>
>>> On 22.08.2013 12:39, Igor Veresov wrote:
>>>> It's worth a try. But the "hotness" logic is probabilistic and
>>>> imprecise - it will notice only methods that are on stack during
>>>> safepoints. Those are going to be pretty narrow snapshots of
>>>> activity. I suspect that for large flat profiles (like enterprise
>>>> apps and friends) you could be missing methods that are rather warm
>>>> in reality, which will cause recompilation oscillations. Although if
>>>> the statistics are allowed to accumulate enough may be it's going to
>>>> work out, an experiment will tell. The patch is a good start, the
>>>> stack sampling, IMO, is totally the right approach for filtering out
>>>> the hot methods.
>>>>
>>>> The "disconnect" logic on the other hand is sort of precise.
>>>> Although, looking now at the code it's not quite clear to me how it
>>>> works, it doesn't seem to be any patching going on to divert the
>>>> control for the cases when the nmethod is called directly or through
>>>> an IC. So I guess it's not really a full disconnect? Anyways, in
>>>> theory, with some work, we can make the disconnect logic to precisely
>>>> measure the time the method is inactive. Which should provide precise
>>>> information about the warm/cold methods.
>>>>
>>>> Btw, also just noticed a bunch of flaws in the interaction of the
>>>> disconnect logic and tiered. The nmethod's "reconnection" happens in
>>>> CompileBroker::compile_method(), which firstly will be called by
>>>> tiered only after a rather substantial number of invocations in the
>>>> interpreter (up to 128), and secondly will be subject to all the
>>>> prioritization rules (it probably should not), and also we don't
>>>> check if the comp level of the reanimated nmethod matches the
>>>> request. If the disconnect logic is to stay, the interpreter should
>>>> be able to know if the method has saved code and be able to call into
>>>> the runtime immediately to reanimate it.
>>>>
>>>> igor
>>>>
>>>> On Aug 21, 2013, at 10:27 PM, Albert Noll <albert.noll at oracle.com
>>>> <mailto:albert.noll at oracle.com>> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> thanks for looking at the patch. Actually, I think - just as
>>>>> Vladimir pointed out - that we can get rid of
>>>>> the "disconnect" logic. We now have the hotness of a method and if
>>>>> the code cache fills up, we and
>>>>> we decide to schedule the method for removal, we set it to 
>>>>> not_entrant.
>>>>> It seems that adding the method to the list of disconnected methods
>>>>> just buys a little more time until we decide to make the method
>>>>> not-entrant. However, we can have the same effect by setting the
>>>>> threshold differently.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Best,
>>>>> Albert
>>>>>
>>>>> On 22.08.2013 10:02, Igor Veresov wrote:
>>>>>> May be instead of "(_traversals > _last_flush_traversal_id + 2)" we
>>>>>> should timestamp a method when it's disconnected, and then use a
>>>>>> rule like if a method has been disconnected for k *
>>>>>> reverse_free_ratio() seconds then it's ok to kill it. We can also
>>>>>> sort the nmethods that pass that filter by the amount of time they
>>>>>> were disconnected and select most likely candidates for flushing.
>>>>>> This should allow to basically do disconnect/flush in every
>>>>>> traversal, which should make things faster. Timestamps would be
>>>>>> obtained only once per traversal or something like that. What do
>>>>>> you think?
>>>>>>
>>>>>> Pretty cool idea to reverse-prioritize disconnects on hotness.
>>>>>>
>>>>>> igor
>>>>>>
>>>>>> On Aug 21, 2013, at 4:42 AM, Albert Noll <albert.noll at oracle.com
>>>>>> <mailto:albert.noll at oracle.com>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> could I have reviews for this patch? Please note
>>>>>>> that I do not yet feel very confident with the sweeper,
>>>>>>> so please take a close look.
>>>>>>>
>>>>>>> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
>>>>>>> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
>>>>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>>>>>>>
>>>>>>>
>>>>>>> Many thanks in advance,
>>>>>>> Albert
>>>>>>>
>>>>>>>
>>>>>>> Problem: There can be large performance regressions when the code
>>>>>>> cache fills up. There are
>>>>>>> several reasons for the performance regression: First (1), when
>>>>>>> the code cache is full and methods
>>>>>>> are speculatively disconnected, the oldest methods (based on
>>>>>>> compilation ID) are scheduled for
>>>>>>> flushing. This can result in flushing hot methods. Second (2),
>>>>>>> when compilation is disabled due to a full
>>>>>>> code cache, the number of sweeps can go down. A lower number of
>>>>>>> sweep operations results
>>>>>>> in slower method flushing.
>>>>>>>
>>>>>>> Solution:
>>>>>>> Introduce a hotness counter that is set to a particular value
>>>>>>> (e.g., 100) when there is an activation
>>>>>>> of the method during stack scanning. The counter is decremented by
>>>>>>> 1 every time the sweeper
>>>>>>> is invoked.
>>>>>>>
>>>>>>> ad (1):
>>>>>>>   A VM operation that speculatively disconnects nmethods, selects
>>>>>>> the methods that should be
>>>>>>>   flushed based on the hotness. For example, if 50% of the code
>>>>>>> cache shall be flushed, we flush
>>>>>>>   those methods that have not been active while stack scanning for
>>>>>>> the longest time. Note that
>>>>>>>   while this strategy is more likely to flush cold methods, it is
>>>>>>> not clear to what extent the new
>>>>>>>   strategy fragments the code cache.
>>>>>>>
>>>>>>>   Changes in NMethodSweeper::speculative_disconnect_nmethods(bool
>>>>>>> is_full)
>>>>>>>
>>>>>>> ad (2)
>>>>>>>   Currently, methods are removed from the code cache if:
>>>>>>>     a) code cache is full
>>>>>>>     b) class is unloaded
>>>>>>>     c) method is replaced by another version (i.e., compiled with
>>>>>>> a different tier)
>>>>>>>     d) deopt
>>>>>>>
>>>>>>>    The current patch adds a 5-th possibility to remove a method
>>>>>>> from the code cache.
>>>>>>>    In particular, if a method has not been active during stack
>>>>>>> scanning for a long-enough
>>>>>>>    amount of time, the method is removed from the code cache. The
>>>>>>> amount of time
>>>>>>>    required to flush the method depends on the available space in
>>>>>>> the code cache.
>>>>>>>
>>>>>>>    Here is one example: If a method was seen on a stack the
>>>>>>> hotness counter
>>>>>>>    is set to 100. A sweep operation takes roughly place every
>>>>>>> 100ms. I.e., it takes
>>>>>>>    100ms * 100 = 10s until the hotness counter reaches 0. The
>>>>>>> threshold that determines
>>>>>>>    if a method should be removed from the code cache is calculated
>>>>>>> as follows:
>>>>>>>
>>>>>>>    threshold = -100 + (CodeCache::reverse_free_ratio() *
>>>>>>> NMethodSweepActivity)
>>>>>>>
>>>>>>>     For example, if 25% of the code cache is free,
>>>>>>> reverse_free_ratio returns 4.
>>>>>>>     The default value of NMethodSweepActivity is 10. As a result,
>>>>>>> threshold = -60.
>>>>>>>     Consequently, all methods that have a hotness value smaller
>>>>>>> than -60 (which
>>>>>>>     means they have not been seen on the stack for 16s) are
>>>>>>> scheduled to be flushed
>>>>>>>     from the code cache. See an illustration of the threshold as a
>>>>>>> function of the available
>>>>>>>     code cache in threshold.pdf
>>>>>>>
>>>>>>>     Note that NMethodSweepActivity is a parameter that can be
>>>>>>> specified via a -XX
>>>>>>>     flag.
>>>>>>>
>>>>>>> Changes in NMethodSweeper::sweep_code_cache()
>>>>>>>
>>>>>>>
>>>>>>> A very preliminary performance evaluation looks promising. I used
>>>>>>> the DaCapo
>>>>>>> benchmarks where a series of benchmarks is executed in the same VM
>>>>>>> instance.
>>>>>>> See performance.pdf . The x-axis shows the benchmarks. Assume we
>>>>>>> have 2 benchmarks
>>>>>>> (BM). The execution sequence is as follows:
>>>>>>>
>>>>>>> BM1 (Run 1-1)
>>>>>>> BM1 (Run 2-1)
>>>>>>> BM2 (Run 1-1)
>>>>>>> BM2 (Run 2-1)
>>>>>>>
>>>>>>> BM1 (Run 1-2)
>>>>>>> BM1 (Run 2-2)
>>>>>>> BM2 (Run 1-2)
>>>>>>> BM2 (Run 2-2)
>>>>>>>
>>>>>>>
>>>>>>> A value larger than 0 on the x-axis indicates that the version
>>>>>>> including the proposed patch is faster.
>>>>>>> I.e., the values are calculated as follows: (T_original /
>>>>>>> T_with_patch) - 1. T is the execution time
>>>>>>> (wall clock time) of the benchmark. ReservedCodeCacheSize is set
>>>>>>> to 50m.  I used three runs and
>>>>>>> the arithmetic average to compare the numbers. I know, we need
>>>>>>> much more data, however,
>>>>>>> I think we can see a trend.
>>>>>>>
>>>>>>> The current patch does not trigger a warning that the code cache
>>>>>>> is full and compilation has been
>>>>>>> disabled.
>>>>>>>
>>>>>>> Please let me know that you think.
>>>>>>> <threshold.pdf><performance.pdf>
>>>>>>
>>>>>
>>>>
>>>
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130905/f830db55/attachment-0001.html 


More information about the hotspot-compiler-dev mailing list