RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Igor Veresov iggy.veresov at gmail.com
Thu Aug 22 01:02:16 PDT 2013


Maybe instead of "(_traversals > _last_flush_traversal_id + 2)" we should timestamp a method when it's disconnected, and then use a rule like: if a method has been disconnected for k * reverse_free_ratio() seconds, it's ok to kill it. We could also sort the nmethods that pass that filter by the amount of time they have been disconnected and select the most likely candidates for flushing. This should allow us to basically do a disconnect/flush in every traversal, which should make things faster. Timestamps would be obtained only once per traversal or something like that. What do you think?
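
Roughly something like this (just a sketch to make the rule concrete; the struct, the timestamp source, and k are made up, this is not actual sweeper code):

  #include <algorithm>
  #include <vector>

  struct DisconnectedMethod {
    void*  nm;                    // stand-in for an nmethod*
    double disconnect_timestamp;  // seconds; taken once per traversal
  };

  // A method becomes eligible once it has been disconnected for
  // k * reverse_free_ratio() seconds; the longest-disconnected methods
  // are returned first, i.e., they are the most likely flush victims.
  std::vector<DisconnectedMethod> select_flush_candidates(
      std::vector<DisconnectedMethod> disconnected,
      double now, double k, double reverse_free_ratio) {
    const double min_age = k * reverse_free_ratio;
    std::vector<DisconnectedMethod> candidates;
    for (const DisconnectedMethod& m : disconnected) {
      if (now - m.disconnect_timestamp >= min_age) {
        candidates.push_back(m);
      }
    }
    std::sort(candidates.begin(), candidates.end(),
              [](const DisconnectedMethod& a, const DisconnectedMethod& b) {
                return a.disconnect_timestamp < b.disconnect_timestamp;
              });
    return candidates;
  }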

Pretty cool idea to reverse-prioritize disconnects on hotness.

igor

On Aug 21, 2013, at 4:42 AM, Albert Noll <albert.noll at oracle.com> wrote:

> Hi all,
> 
> could I have reviews for this patch? Please note
> that I do not yet feel very confident with the sweeper,
> so please take a close look.
> 
> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
> 
> 
> Many thanks in advance,
> Albert
> 
> 
> Problem: There can be large performance regressions when the code cache fills up. There are
> several reasons for this regression: (1) When the code cache is full and methods are speculatively
> disconnected, the oldest methods (based on compilation ID) are scheduled for flushing. This can
> result in flushing hot methods. (2) When compilation is disabled due to a full code cache, the
> number of sweeps can go down. Fewer sweep operations result in slower method flushing.
> 
> Solution:
> Introduce a hotness counter that is set to a particular value (e.g., 100) whenever an activation
> of the method is found during stack scanning. The counter is decremented by 1 every time the
> sweeper is invoked.
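> 
> In code, the counter protocol is roughly the following (a minimal standalone sketch with made-up names, not the code in the webrev):
> 
>   // Sketch only: illustrates the counter protocol, not the real nmethod/sweeper code.
>   #include <cstdio>
> 
>   struct MethodHotness {
>     static const int kActiveValue = 100;  // value set on a stack activation
>     int hotness = kActiveValue;
> 
>     // Called when an activation of the method is found during stack scanning.
>     void mark_active() { hotness = kActiveValue; }
> 
>     // Called once per sweeper invocation; the counter decays until the
>     // method is seen on a stack again.
>     void on_sweep()    { --hotness; }
>   };
> 
>   int main() {
>     MethodHotness m;
>     for (int i = 0; i < 160; i++) m.on_sweep();  // 160 sweeps without an activation
>     printf("hotness = %d\n", m.hotness);         // prints -60
>     return 0;
>   }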
> 
> ad (1):
>   The VM operation that speculatively disconnects nmethods selects the methods to be
>   flushed based on their hotness. For example, if 50% of the code cache is to be flushed, we flush
>   those methods that have not been active during stack scanning for the longest time. Note that
>   while this strategy is more likely to flush cold methods, it is not clear to what extent the new
>   strategy fragments the code cache.
> 
>   Changes in NMethodSweeper::speculative_disconnect_nmethods(bool is_full)
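> 
>   Conceptually, the selection could look like the following (again only a standalone sketch with hypothetical names, not the actual change in speculative_disconnect_nmethods):
> 
>     #include <algorithm>
>     #include <cstddef>
>     #include <vector>
> 
>     struct NMethodEntry {
>       void*       nm;         // stand-in for an nmethod*
>       int         hotness;    // see the counter sketch above
>       std::size_t code_size;  // space the nmethod occupies in the code cache
>     };
> 
>     // Disconnect the coldest methods until roughly 'bytes_to_flush' bytes
>     // of code cache space have been selected.
>     std::vector<NMethodEntry> select_for_disconnect(
>         std::vector<NMethodEntry> methods, std::size_t bytes_to_flush) {
>       // Coldest first: the smallest hotness value means the longest time
>       // without a stack activation.
>       std::sort(methods.begin(), methods.end(),
>                 [](const NMethodEntry& a, const NMethodEntry& b) {
>                   return a.hotness < b.hotness;
>                 });
>       std::vector<NMethodEntry> selected;
>       std::size_t freed = 0;
>       for (const NMethodEntry& m : methods) {
>         if (freed >= bytes_to_flush) break;
>         selected.push_back(m);
>         freed += m.code_size;
>       }
>       return selected;
>     }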
> 
> ad (2):
>   Currently, methods are removed from the code cache if:
>     a) code cache is full 
>     b) class is unloaded 
>     c) method is replaced by another version (i.e., compiled with a different tier) 
>     d) deopt
> 
>    The current patch adds a fifth way to remove a method from the code cache:
>    if a method has not been active during stack scanning for a sufficiently long
>    time, it is removed from the code cache. The amount of time after which a
>    method becomes eligible for flushing depends on the available space in the code cache.
>    
>    Here is one example: If a method was seen on a stack, its hotness counter
>    is set to 100. A sweep operation takes place roughly every 100ms, i.e., it takes
>    100ms * 100 = 10s until the hotness counter reaches 0. The threshold that determines
>    whether a method should be removed from the code cache is calculated as follows:
>  
>    threshold = -100 + (CodeCache::reverse_free_ratio() * NMethodSweepActivity)
> 
>     For example, if 25% of the code cache is free, reverse_free_ratio returns 4.
>     The default value of NMethodSweepActivity is 10. As a result, threshold = -60.
>     Consequently, all methods that have a hotness value smaller than -60 (which 
>     means they have not been seen on the stack for 16s) are scheduled to be flushed
>     from the code cache. See an illustration of the threshold as a function of the available
>     code cache in threshold.pdf
> 
>     Note that NMethodSweepActivity is a parameter that can be specified via a -XX
>     flag.
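> 
>     The arithmetic of the example above, spelled out as a small standalone program
>     (the 100ms sweep interval is the rough figure mentioned above; nothing here is
>     taken from the patch itself):
> 
>       #include <cstdio>
> 
>       int main() {
>         const int    hot_value          = 100;  // counter value after a stack activation
>         const double sweep_interval_s   = 0.1;  // roughly one sweep (one decrement) per 100ms
>         const int    reverse_free_ratio = 4;    // e.g., 25% of the code cache is free
>         const int    sweep_activity     = 10;   // default value of NMethodSweepActivity
> 
>         const int    threshold = -100 + reverse_free_ratio * sweep_activity;  // -60
>         const double seconds   = (hot_value - threshold) * sweep_interval_s;  // 16s
> 
>         printf("threshold = %d, flushed after ~%.0fs without a stack activation\n",
>                threshold, seconds);
>         return 0;
>       }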
> 
> Changes in NMethodSweeper::sweep_code_cache()
> 
> 
> A very preliminary performance evaluation looks promising. I used the DaCapo 
> benchmarks where a series of benchmarks is executed in the same VM instance.
> See performance.pdf. The x-axis shows the benchmarks. Assume we have 2 benchmarks 
> (BM). The execution sequence is as follows:
> 
> BM1 (Run 1-1)
> BM1 (Run 2-1)
> BM2 (Run 1-1)
> BM2 (Run 2-1)
> 
> BM1 (Run 1-2)
> BM1 (Run 2-2)
> BM2 (Run 1-2)
> BM2 (Run 2-2)
> 
> 
> A value larger than 0 on the y-axis indicates that the version including the proposed patch is faster.
> I.e., the values are calculated as (T_original / T_with_patch) - 1, where T is the execution time
> (wall clock time) of the benchmark. ReservedCodeCacheSize is set to 50m. I used three runs and
> the arithmetic average to compare the numbers. I know we need much more data; however,
> I think we can see a trend.
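> 
> To make the metric concrete (the timings below are made up, they only illustrate the calculation):
> 
>   #include <cstdio>
> 
>   int main() {
>     const double t_original   = 10.5;  // seconds, baseline build
>     const double t_with_patch = 10.0;  // seconds, patched build
>     const double value = (t_original / t_with_patch) - 1.0;  // 0.05, i.e., the patch is ~5% faster
>     printf("relative improvement = %.2f\n", value);
>     return 0;
>   }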
> 
> The current patch does not trigger a warning that the code cache is full and compilation has been
> disabled.
> 
> Please let me know what you think.
> <threshold.pdf><performance.pdf>


