RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Aug 21 20:58:00 PDT 2013
Thank you, Albert
Very good work. It would be nice to get refworkload and nashorn
performance data.
I thought you will throw out CodeCache::speculatively_disconnect(nm)
code which we have because we didn't know which method is hot/used. We
only invoke it when code cache is almost full and when we need space as
soon as possible. As you pointed we don't get that space until 2
additional traversals:
(_traversals > _last_flush_traversal_id + 2)
Actually it is not 'flush' traversals but the number of stack scans and
codecache sweeps because _traversals is incremented in scan_stacks()
when new sweep is initiated. We need to rename _last_flush_traversal_id.
So why not mark not_entrant immediately since we now know which methods
are hot and which are not? I did see this message:
### Couldn't make progress on some nmethods so stopping sweep
I think we dont' do '_resweep = true;' when we mark disconnected methods
as non_entrant in process_nmethod().
Also CodeCacheFlushingFraction is 2 - removing half of nmethods could be
overkill. Can you investigate how many nmethod we removed in reality?
May be you already have the answer.
Could you also check sweeping invocation? I see that it is done when
(_invocations > 0) in NMethodSweeper::possibly_sweep(). _invocations is
set to NmethodSweepFraction in scan_stacks() when (!sweep_in_progress()
&& _resweep). So is it possible to get _resweep so we can't make progress?
I think you need to initialize _hotness_counter to 100 in constructors
to avoid immediate removal from codecache.
We use anonymous enum to define values like
hc_reset_value/hc_dec_value. And they should be in nmethod class where
counter field is defined. And they can be private and only used in:
+ void dec_hotness_counter() { _hotness_counter -= hc_dec_value; }
+ void reset_hotness_counter() { _hotness_counter = hc_reset_value; }
Why you skip first method and look on not alive (next_nmethod() is used) ?:
+ nmethod* nm = CodeCache::next_nmethod(CodeCache::first());
And why you used next_nmethod() for iteration instead of
alive_nmethod()? I see you added the check nm->is_alive() but why?
Also please fix flag name CodeCacheMinimumFlushInterval in the comment
in handle_full_code_cache() to the correct one MinCodeCacheFlushingInterval.
In ciEnv.cpp you did only style clean up. Do it only if you change
something in that code.
Thanks,
Vladimir
On 8/21/13 7:42 AM, Albert Noll wrote:
> Hi all,
>
> could I have reviews for this patch? Please note
> that I do not yet feel very confident with the sweeper,
> so please take a close look.
>
> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>
>
> Many thanks in advance,
> Albert
>
>
> Problem: There can be large performance regressions when the code cache
> fills up. There are
> several reasons for the performance regression: First (1), when the code
> cache is full and methods
> are speculatively disconnected, the oldest methods (based on compilation
> ID) are scheduled for
> flushing. This can result in flushing hot methods. Second (2), when
> compilation is disabled due to a full
> code cache, the number of sweeps can go down. A lower number of sweep
> operations results
> in slower method flushing.
>
> Solution:
> Introduce a hotness counter that is set to a particular value (e.g.,
> 100) when there is an activation
> of the method during stack scanning. The counter is decremented by 1
> every time the sweeper
> is invoked.
>
> ad (1):
> A VM operation that speculatively disconnects nmethods, selects the
> methods that should be
> flushed based on the hotness. For example, if 50% of the code cache
> shall be flushed, we flush
> those methods that have not been active while stack scanning for the
> longest time. Note that
> while this strategy is more likely to flush cold methods, it is not
> clear to what extent the new
> strategy fragments the code cache.
>
> Changes in NMethodSweeper::speculative_disconnect_nmethods(bool is_full)
>
> ad (2)
> Currently, methods are removed from the code cache if:
> a) code cache is full
> b) class is unloaded
> c) method is replaced by another version (i.e., compiled with a
> different tier)
> d) deopt
>
> The current patch adds a 5-th possibility to remove a method from
> the code cache.
> In particular, if a method has not been active during stack scanning
> for a long-enough
> amount of time, the method is removed from the code cache. The
> amount of time
> required to flush the method depends on the available space in the
> code cache.
>
> Here is one example: If a method was seen on a stack the hotness
> counter
> is set to 100. A sweep operation takes roughly place every 100ms.
> I.e., it takes
> 100ms * 100 = 10s until the hotness counter reaches 0. The threshold
> that determines
> if a method should be removed from the code cache is calculated as
> follows:
>
> threshold = -100 + (CodeCache::reverse_free_ratio() *
> NMethodSweepActivity)
>
> For example, if 25% of the code cache is free, reverse_free_ratio
> returns 4.
> The default value of NMethodSweepActivity is 10. As a result,
> threshold = -60.
> Consequently, all methods that have a hotness value smaller than
> -60 (which
> means they have not been seen on the stack for 16s) are scheduled
> to be flushed
> from the code cache. See an illustration of the threshold as a
> function of the available
> code cache in threshold.pdf
>
> Note that NMethodSweepActivity is a parameter that can be specified
> via a -XX
> flag.
>
> Changes in NMethodSweeper::sweep_code_cache()
>
>
> A very preliminary performance evaluation looks promising. I used the
> DaCapo
> benchmarks where a series of benchmarks is executed in the same VM instance.
> See performance.pdf . The x-axis shows the benchmarks. Assume we have 2
> benchmarks
> (BM). The execution sequence is as follows:
>
> BM1 (Run 1-1)
> BM1 (Run 2-1)
> BM2 (Run 1-1)
> BM2 (Run 2-1)
>
> BM1 (Run 1-2)
> BM1 (Run 2-2)
> BM2 (Run 1-2)
> BM2 (Run 2-2)
>
>
> A value larger than 0 on the x-axis indicates that the version including
> the proposed patch is faster.
> I.e., the values are calculated as follows: (T_original / T_with_patch)
> - 1. T is the execution time
> (wall clock time) of the benchmark. ReservedCodeCacheSize is set to
> 50m. I used three runs and
> the arithmetic average to compare the numbers. I know, we need much more
> data, however,
> I think we can see a trend.
>
> The current patch does not trigger a warning that the code cache is full
> and compilation has been
> disabled.
>
> Please let me know that you think.
More information about the hotspot-compiler-dev
mailing list