RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Albert Noll albert.noll at oracle.com
Fri Aug 30 06:44:24 PDT 2013


Hi Chris,

thanks again for sharing your thoughts. See the comments below:

On 27.08.2013 21:20, Chris Plummer wrote:
> Hi Albert,
>
> Comments inline below:
>
> On 8/27/13 6:03 AM, Albert Noll wrote:
>> Hi Chris,
>>
>> thanks a lot for your feedback. I will do the performance evaluation 
>> that you proposed
>> to validate that the code changes also perform well for small code 
>> cache sizes.
>>
>> On 22.08.2013 23:29, Chris Plummer wrote:
>>> Hi Albert,
>>>
>>> sort_nmentod_by_hotness should be sort_nmethod_by_hotness.
>>>
Thanks, I will fix the typo.
>>> We already store NMethodSweeper::_traversal_count in 
>>> nmethod::_stack_traversal_mark each time scan_stacks() is called. 
>>> Why not just use that to determine how long it's been since an 
>>> nmethod was active? No need to decrement it. Just look at 
>>> nmethod::_stack_traversal_mark when sorting.
>>>
>> _stack_traversal_mark is only set for not-entrant methods. However, 
>> the hotness counter should
>> also be available for all methods, including methods that have the 
>> state 'in_use'. As a result, we
>> need _hotness_counter.
> I see what you mean about it only being set for not-entrant methods, but
> I don't see anything that would indicate it would be problematic to
> always set it, not just when the method is not-entrant. In fact the
> comment alludes that this is what is actually done, although it is not:
>
Currently, stack scanning and marking is only performed if there is no
sweep in progress (!sweep_in_progress() ...); this restriction is required
for the correctness of the sweeping procedure. Having a separate field
(the _hotness_counter) enables us to do stack scanning every time the
scan_stacks() function is called. This improves the precision of live
method detection by approximately a factor of 6. So, if we want more
precision, we have to pay for the extra field plus some overhead due to
the more frequent stack scanning. I am currently evaluating that overhead.
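
To make this concrete, here is a minimal sketch of the idea (the accessor
names and the reset value of 100 are assumptions for illustration; the
webrev may differ in the details):

  // nmethod.hpp (sketch): valid for every nmethod, including those in
  // state 'in_use', unlike _stack_traversal_mark.
  int  _hotness_counter;
  int  hotness_counter() const    { return _hotness_counter; }
  void set_hotness_counter(int c) { _hotness_counter = c; }

  // sweeper (sketch): every nmethod found in a frame during stack
  // scanning is "reheated" on every call to scan_stacks(), independent
  // of whether a sweep is in progress:
  //   nm->set_hotness_counter(100);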
>   // not_entrant method removal. Each mark_sweep pass will update
>   // this mark to current sweep invocation count if it is seen on the
>   // stack.  An not_entrant method can be removed when there is no
>   // more activations, i.e., when the _stack_traversal_mark is less than
>   // current sweep traversal index.
>   long _stack_traversal_mark;
>
> BTW, there is a bit of comment rot in the following. Maybe you can clean
> it up as part of your changes. It should reference _stack_traversal_mark
> instead of _last_seen_on_stack:
>
>   // See comment at definition of _last_seen_on_stack
>   void mark_as_seen_on_stack();
>   bool can_not_entrant_be_converted();
>
I will clean up the comments.
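For example, the declarations could then simply read:

  // See comment at definition of _stack_traversal_mark
  void mark_as_seen_on_stack();
  bool can_not_entrant_be_converted();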
>
>>> Also, keep in mind that how long it's been since a method was last
>>> used does not always equate to how hot it is. You might have a method
>>> that is called once every second, and another that is called 1000 times
>>> every 10 seconds. The latter is actually hotter, but your algorithm
>>> would favor sweeping the former. Not a deal breaker, but something to
>>> consider when trying to improve your criteria for selecting how hot a
>>> method is.
>>>
>> I cannot follow your example. Since we use stack scanning to determine
>> the hotness of a method, it is irrelevant how often a method is called.
>> The probability that a method 'M' is hot depends only on the time that
>> is spent executing 'M'; the more time we spend in 'M', the more likely
>> it is that we hit 'M' during stack scanning. For example, assume that a
>> program spends 10ms/second in 'M'. Whether 'M' is called once per second
>> with an execution time of 10ms, or 10 times per second with an execution
>> time of 1ms, is likely to yield the same hotness.
> I was referring to what would happen if you instead used
> _last_seen_on_stack, which only indicates how recently the method was
> seen, not how frequently. However, I think your approach suffers from
> the same problem, since scan_stacks sets the hotness to 100 rather than
> increasing the hotness each time the method is found. So it too is only
> an indicator of how recently the method was seen.
That is true. However, hot methods are more likely to be seen 'recently'
than cold methods. I.e., the average 'temperature' of methods in which the
application spends a significant amount of time will be higher than the
average temperature of methods that are barely used, simply because 'hot
methods' will be 'reheated' (by stack scanning) more often.

If we increased a counter every time we see a method on the stack, we
would have to reset or decrement the counter somewhere else (e.g., in the
sweeper). Otherwise, once a method is marked as hot, it would stay hot
forever, even if it no longer is. And with such an increment/decrement
scheme, the information we get is, I think, less precise than with the
current approach: simply looking at the counter does not tell you when the
method was last used. A value of, e.g., 10 could mean that the method is
'medium hot' (we do not want to evict it), or that the method was hot a
long time ago and is now cold (we do want to evict it). Do you have any
thoughts about this?
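
To illustrate the ambiguity with a toy example (this is not HotSpot code,
just a self-contained sketch of the two counting schemes):

  #include <cstdio>

  int main() {
    // Increment-on-sighting scheme: two methods end up with the same
    // counter value although their recent usage is very different.
    int hot_long_ago = 0;
    for (int i = 0; i < 20; i++) hot_long_ago++;   // 20 sightings early on
    for (int s = 0; s < 19; s++) hot_long_ago--;   // 19 sweeps of decay

    int seen_just_now = 0;
    seen_just_now++;                               // one sighting, just now

    // Both counters are now 1; the value alone cannot distinguish them.
    std::printf("%d %d\n", hot_long_ago, seen_just_now);

    // Reset-on-sighting scheme: the counter encodes "sweeps since last
    // seen" directly, so the two cases differ (100 - 19 = 81 vs. 100).
    return 0;
  }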
>> Using stack scanning to determine the hotness of a method is only an 
>> approximation based on
>> sampling. I.e., if we are unlucky and never see a hot method on the 
>> stack, the hot method will
>> eventually be flushed.
> True, but it doesn't need to be perfect. However, I think with tiered 
> you need to be a lot more careful than for example when just using C1. 
> It doesn't take that much work to get a C1 method compiled, whereas 
> there is a lot of work put into a tiered C2 compilation (including 
> doing all the tiered compilations again). So throwing away a C2 tiered 
> compiled method that is indeed hot can be costly. Maybe you should 
> take this into account when determining the hotness, and give C2 
> tiered compiled methods a higher hotness count when found on the stack.
>
That's a good idea. I will look into different eviction strategies.
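For example, something along these lines might work (NmethodHotnessResetValue
is a made-up name for this sketch; nmethod::is_compiled_by_c2() is an
existing query):

  // Sketch: give C2-compiled methods a larger initial 'temperature' so
  // that a hot C2 nmethod survives more sweep cycles than a C1 nmethod
  // that was last seen on a stack at the same time.
  int initial_hotness(nmethod* nm) {
    int base = NmethodHotnessResetValue;        // e.g., 100
    return nm->is_compiled_by_c2() ? 2 * base : base;
  }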

> cheers,
>
> Chris
>>
>>
>>> Please verify your results with specjvm98 using an artificially small
>>> code cache size. First do a run to get the code cache usage with
>>> -XX:+PrintCodeCache. For example:
>>>
>>> CodeCache: size=32768Kb used=3758Kb max_used=3758Kb free=29009Kb
>>>
>>> Take the "used" size, divide by 2, and then add 1500k
>>> (CodeCacheFlushingMinimumFreeSpace). Use that for your new
>>> ReservedCodeCacheSize. From my recent experience, you should see less
>>> than a 10% performance degradation compared to not constraining the
>>> code cache size in this manner. However, what's most important is that
>>> performance using this configuration is the same or better than without
>>> your changes. Take another 500-1000k off the code cache size and measure
>>> again. You'll see a much bigger performance degradation, but once again
>>> what is important is that with your change in place, performance has
>>> not gotten any worse. Do the same running Nashorn with the v8
>>> benchmarks.
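
(To make the sizing recipe concrete with the numbers above: 3758Kb / 2 =
1879Kb, plus 1500Kb for CodeCacheFlushingMinimumFreeSpace, gives roughly
3380Kb, i.e. a constrained run with something like
-XX:ReservedCodeCacheSize=3380k -XX:+PrintCodeCache.)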
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 8/21/13 7:41 AM, hotspot-compiler-dev-request at openjdk.java.net 
>>> wrote:
>>>> Hi all,
>>>>
>>>> could I have reviews for this patch? Please note
>>>> that I do not yet feel very confident with the sweeper,
>>>> so please take a close look.
>>>>
>>>> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
>>>> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
>>>>
>>>>
>>>> Many thanks in advance,
>>>> Albert
>>>>
>>>>
>>>> Problem: There can be large performance regressions when the code
>>>> cache fills up, for several reasons. First (1), when the code cache is
>>>> full and methods are speculatively disconnected, the oldest methods
>>>> (based on compilation ID) are scheduled for flushing. This can result
>>>> in flushing hot methods. Second (2), when compilation is disabled due
>>>> to a full code cache, the number of sweeps can go down. A lower number
>>>> of sweep operations results in slower method flushing.
>>>>
>>>> Solution:
>>>> Introduce a hotness counter that is set to a particular value (e.g.,
>>>> 100) when there is an activation of the method during stack scanning.
>>>> The counter is decremented by 1 every time the sweeper is invoked.
>>>>
>>>> ad (1):
>>>>     A VM operation that speculatively disconnects nmethods selects the
>>>>     methods to be flushed based on their hotness. For example, if 50%
>>>>     of the code cache shall be flushed, we flush those methods that
>>>>     have not been active during stack scanning for the longest time.
>>>>     Note that while this strategy is more likely to flush cold methods,
>>>>     it is not clear to what extent the new strategy fragments the code
>>>>     cache.
>>>>
>>>>     Changes in NMethodSweeper::speculative_disconnect_nmethods(bool is_full)
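
For illustration, a minimal sketch of the selection step (the comparator
name and the surrounding bookkeeping are assumptions, not the actual
webrev code):

  // Sort the candidate nmethods so that the coldest (smallest hotness
  // counter) come first, then disconnect entries from the front until
  // the requested fraction of the code cache has been covered.
  static int nmethod_hotness_compare(nmethod** a, nmethod** b) {
    return (*a)->hotness_counter() - (*b)->hotness_counter();
  }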
>>>>
>>>> ad (2):
>>>>     Currently, methods are removed from the code cache if:
>>>>       a) the code cache is full
>>>>       b) the class is unloaded
>>>>       c) the method is replaced by another version (i.e., compiled at a
>>>>          different tier)
>>>>       d) deopt
>>>>
>>>>     The current patch adds a fifth possibility to remove a method from
>>>>     the code cache: if a method has not been active during stack
>>>>     scanning for a long enough amount of time, the method is removed
>>>>     from the code cache. The amount of time required to flush the
>>>>     method depends on the available space in the code cache.
>>>>
>>>>     Here is one example: If a method was seen on a stack, the hotness
>>>>     counter is set to 100. A sweep operation takes place roughly every
>>>>     100ms, i.e., it takes 100ms * 100 = 10s until the hotness counter
>>>>     reaches 0. The threshold that determines if a method should be
>>>>     removed from the code cache is calculated as follows:
>>>>
>>>>     threshold = -100 + (CodeCache::reverse_free_ratio() * NMethodSweepActivity)
>>>>
>>>>     For example, if 25% of the code cache is free, reverse_free_ratio
>>>>     returns 4. The default value of NMethodSweepActivity is 10. As a
>>>>     result, threshold = -60. Consequently, all methods that have a
>>>>     hotness value smaller than -60 (which means they have not been seen
>>>>     on the stack for 16s) are scheduled to be flushed from the code
>>>>     cache. See an illustration of the threshold as a function of the
>>>>     available code cache in threshold.pdf.
>>>>
>>>>     Note that NMethodSweepActivity is a parameter that can be specified
>>>>     via a -XX flag.
>>>>
>>>> Changes in NMethodSweeper::sweep_code_cache()
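
To illustrate the check described above, a rough sketch (hotness_counter()
is the assumed accessor from the earlier sketch; CodeCache::reverse_free_ratio()
and NMethodSweepActivity are taken from the description; make_not_entrant()
stands in for whatever flushing path the patch actually uses):

  // Inside the sweeper's per-nmethod processing (sketch only):
  int threshold = -100 + (int)(CodeCache::reverse_free_ratio() * NMethodSweepActivity);
  if (nm->is_in_use() && nm->hotness_counter() < threshold) {
    // Not seen on any stack for long enough: schedule it for flushing.
    nm->make_not_entrant();
  }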
>>>>
>>>>
>>>> A very preliminary performance evaluation looks promising. I used the
>>>> DaCapo benchmarks, where a series of benchmarks is executed in the same
>>>> VM instance. See performance.pdf. The x-axis shows the benchmarks.
>>>> Assume we have 2 benchmarks (BM). The execution sequence is as follows:
>>>>
>>>> BM1 (Run 1-1)
>>>> BM1 (Run 2-1)
>>>> BM2 (Run 1-1)
>>>> BM2 (Run 2-1)
>>>>
>>>> BM1 (Run 1-2)
>>>> BM1 (Run 2-2)
>>>> BM2 (Run 1-2)
>>>> BM2 (Run 2-2)
>>>>
>>>> A value larger than 0 indicates that the version including the proposed
>>>> patch is faster. The values are calculated as follows:
>>>> (T_original / T_with_patch) - 1, where T is the execution time (wall
>>>> clock time) of the benchmark. ReservedCodeCacheSize is set to 50m. I
>>>> used three runs and the arithmetic average to compare the numbers. I
>>>> know we need much more data; however, I think we can see a trend.
>>>>
>>>> The current patch does not trigger a warning that the code cache is
>>>> full and compilation has been disabled.
>>>>
>>>> Please let me know what you think.
>>>> -------------- next part --------------
>>>> Attachments:
>>>> threshold.pdf (application/pdf, 13321 bytes):
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130821/6478f023/threshold.pdf
>>>> performance.pdf (application/pdf, 14641 bytes):
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130821/6478f023/performance.pdf


