RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Wed Aug 21 20:58:00 PDT 2013

Thank you, Albert

Very good work. It would be nice to get refworkload and nashorn 
performance data.

I thought you will throw out CodeCache::speculatively_disconnect(nm) 
code which we have because we didn't know which method is hot/used. We 
only invoke it when code cache is almost full and when we need space as 
soon as possible. As you pointed we don't get that space until 2 
additional traversals:

(_traversals > _last_flush_traversal_id + 2)

Actually it is not 'flush' traversals but the number of stack scans and 
codecache sweeps because _traversals is incremented in scan_stacks() 
when new sweep is initiated. We need to rename _last_flush_traversal_id.

So why not mark not_entrant immediately since we now know which methods 
are hot and which are not? I did see this message:

### Couldn't make progress on some nmethods so stopping sweep

I think we dont' do '_resweep = true;' when we mark disconnected methods 
as non_entrant in process_nmethod().

Also CodeCacheFlushingFraction is 2 - removing half of nmethods could be 
overkill. Can you investigate how many nmethod we removed in reality? 
May be you already have the answer.

Could you also check sweeping invocation? I see that it is done when 
(_invocations > 0) in NMethodSweeper::possibly_sweep(). _invocations is 
set to NmethodSweepFraction in scan_stacks() when (!sweep_in_progress() 
&& _resweep). So is it possible to get _resweep so we can't make progress?

I think you need to initialize _hotness_counter to 100 in constructors 
to avoid immediate removal from codecache.

We use anonymous enum to define values like 
hc_reset_value/hc_dec_value. And they should be in nmethod class where 
counter field is defined. And they can be private and only used in:

+  void dec_hotness_counter()   { _hotness_counter -= hc_dec_value;   }
+  void reset_hotness_counter() { _hotness_counter  = hc_reset_value; }

Why you skip first method and look on not alive (next_nmethod() is used) ?:

+    nmethod* nm = CodeCache::next_nmethod(CodeCache::first());

And why you used next_nmethod() for iteration instead of 
alive_nmethod()? I see you added the check nm->is_alive() but why?

Also please fix flag name CodeCacheMinimumFlushInterval in the comment
in handle_full_code_cache() to the correct one MinCodeCacheFlushingInterval.

In ciEnv.cpp you did only style clean up. Do it only if you change 
something in that code.

Thanks,
Vladimir

On 8/21/13 7:42 AM, Albert Noll wrote:
> Hi all,
>
> could I have reviews for this patch? Please note
> that I do not yet feel very confident with the sweeper,
> so please take a close look.
>
> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>
>
> Many thanks in advance,
> Albert
>
>
> Problem: There can be large performance regressions when the code cache
> fills up. There are
> several reasons for the performance regression: First (1), when the code
> cache is full and methods
> are speculatively disconnected, the oldest methods (based on compilation
> ID) are scheduled for
> flushing. This can result in flushing hot methods. Second (2), when
> compilation is disabled due to a full
> code cache, the number of sweeps can go down. A lower number of sweep
> operations results
> in slower method flushing.
>
> Solution:
> Introduce a hotness counter that is set to a particular value (e.g.,
> 100) when there is an activation
> of the method during stack scanning. The counter is decremented by 1
> every time the sweeper
> is invoked.
>
> ad (1):
>    A VM operation that speculatively disconnects nmethods, selects the
> methods that should be
>    flushed based on the hotness. For example, if 50% of the code cache
> shall be flushed, we flush
>    those methods that have not been active while stack scanning for the
> longest time. Note that
>    while this strategy is more likely to flush cold methods, it is not
> clear to what extent the new
>    strategy fragments the code cache.
>
>    Changes in NMethodSweeper::speculative_disconnect_nmethods(bool is_full)
>
> ad (2)
>    Currently, methods are removed from the code cache if:
>      a) code cache is full
>      b) class is unloaded
>      c) method is replaced by another version (i.e., compiled with a
> different tier)
>      d) deopt
>
>     The current patch adds a 5-th possibility to remove a method from
> the code cache.
>     In particular, if a method has not been active during stack scanning
> for a long-enough
>     amount of time, the method is removed from the code cache. The
> amount of time
>     required to flush the method depends on the available space in the
> code cache.
>
>     Here is one example: If a method was seen on a stack the hotness
> counter
>     is set to 100. A sweep operation takes roughly place every 100ms.
> I.e., it takes
>     100ms * 100 = 10s until the hotness counter reaches 0. The threshold
> that determines
>     if a method should be removed from the code cache is calculated as
> follows:
>
>     threshold = -100 + (CodeCache::reverse_free_ratio() *
> NMethodSweepActivity)
>
>      For example, if 25% of the code cache is free, reverse_free_ratio
> returns 4.
>      The default value of NMethodSweepActivity is 10. As a result,
> threshold = -60.
>      Consequently, all methods that have a hotness value smaller than
> -60 (which
>      means they have not been seen on the stack for 16s) are scheduled
> to be flushed
>      from the code cache. See an illustration of the threshold as a
> function of the available
>      code cache in threshold.pdf
>
>      Note that NMethodSweepActivity is a parameter that can be specified
> via a -XX
>      flag.
>
> Changes in NMethodSweeper::sweep_code_cache()
>
>
> A very preliminary performance evaluation looks promising. I used the
> DaCapo
> benchmarks where a series of benchmarks is executed in the same VM instance.
> See performance.pdf . The x-axis shows the benchmarks. Assume we have 2
> benchmarks
> (BM). The execution sequence is as follows:
>
> BM1 (Run 1-1)
> BM1 (Run 2-1)
> BM2 (Run 1-1)
> BM2 (Run 2-1)
>
> BM1 (Run 1-2)
> BM1 (Run 2-2)
> BM2 (Run 1-2)
> BM2 (Run 2-2)
>
>
> A value larger than 0 on the x-axis indicates that the version including
> the proposed patch is faster.
> I.e., the values are calculated as follows: (T_original / T_with_patch)
> - 1. T is the execution time
> (wall clock time) of the benchmark. ReservedCodeCacheSize is set to
> 50m.  I used three runs and
> the arithmetic average to compare the numbers. I know, we need much more
> data, however,
> I think we can see a trend.
>
> The current patch does not trigger a warning that the code cache is full
> and compilation has been
> disabled.
>
> Please let me know that you think.