RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Tue Sep 17 00:01:17 PDT 2013

Vladimir, Igor, thanks for reviewing this.

I will take care of the correct ordering of memory operations.

Best,
Albert

On 16.09.2013 23:36, Igor Veresov wrote:
> In nmethod.cpp:
> You definitely need a store-store barrier for non-TSO architectures 
> after the mark_as_seen_on_stack() call on line 1360. Otherwise it 
> still can be reordered by the CPU with respect to the following state 
> assignment. Also neither of these state variables are volatile in 
> nmethod, so even the compiler may reorder the stores.
>
> Otherwise look good.
>
> igor
>
>
>
> On Sep 16, 2013, at 6:20 AM, Albert Noll <albert.noll at oracle.com 
> <mailto:albert.noll at oracle.com>> wrote:
>
>> Hi,
>>
>> thanks again for the review. I addressed your comments.
>> Here is the new webrev:
>>
>> http://cr.openjdk.java.net/~anoll/8020151/webrev.03/ 
>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.03/>
>>
>> The version is currently in jprt. If you are fine with this version 
>> we can send the
>> binaries to Scott so he can test it. Please let me know if you are 
>> fine with that.
>>
>> Best,
>> Albert
>>
>>
>>
>> On 14.09.2013 00:00, Vladimir Kozlov wrote:
>>> sweeper:
>>>
>>> Nice comment about nmethod removal!
>>>
>>> I thought you will remove next check in mark_active_nmethods(), we 
>>> talked about that:
>>>
>>> if (CompiledIC_lock->is_locked() || Patching_lock->is_locked()) return;
>>>
>>> Swap next lines:
>>>
>>>  223   // Only set hotness counter
>>>  224   } else {
>>>
>>> Next comment does not match the check (flushed_memory > 0), also 
>>> rename it to 'freed_memory':
>>>
>>>  367   // Note that typically several kB are released per sweep 
>>> cycle (16MB).
>>>
>>> Next original comment was correct:
>>>  397   // since the locks acquired below might safepoint.
>>>
>>> It means that safepoint may happen during lock, may be:
>>>
>>>  397   // since safepoint may happen during acquired below locks.
>>>
>>> Move next comment below if ((nm->hotness_counter() < threshold) check.
>>>
>>> I think you need to explain what threshold formula means:
>>>
>>> +        double threshold = -reset_val + 
>>> (CodeCache::reverse_free_ratio() * NmethodSweepActivity);
>>>
>>> The less space we have - the bigger reverse_free_ratio(), when it 
>>> reaches reset_val/NmethodSweepActivity threshold become > 0 and we 
>>> start mark nmethods.
>>>
>>> What happened to sort_nmethods_by_hotness()?
>>>
>>> arguments.cpp: add checks that specified NmethodSweepActivity and 
>>> NmethodSweepFraction has reasonable values.
>>>
>>> Additional comments:
>>>
>>> I talked with Igor and we think that instead of setting and checking 
>>> traversal_count we need to add several additional states to the 
>>> nmethod and have well defined state transition machine.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/13/13 1:11 PM, Albert Noll wrote:
>>>> Hi Valadimir,
>>>>
>>>> thanks for looking at this. See comments inline:
>>>>
>>>> On 13.09.2013 21:16, Vladimir Kozlov wrote:
>>>>> We need to ask PSR (Scott Oaks) to test these changes before we push
>>>>> them. Prepare JPRT build bundles for them.
>>>>>
>>>> Started jprt (Stockholm) more than 5 hours ago - still in the queue.
>>>>> I am glad that we can remove speculatively_disconnect without any
>>>>> performance issues. But we need to ask Scott to test it.
>>>> Yes, we have to have test the patch on the original program which
>>>> exposed the bug.
>>>>>
>>>>> You need to fix an other bug (8022968) filed by Chris Pl. when C2
>>>>> failed to allocate scratch buffer we do not report full codecache.
>>>>> Otherwise we continue to compile and fail without sweeping. Currently
>>>>> handle_full_code_cache() is called only from 
>>>>> ciEnv::register_method().
>>>>> See calls init_scratch_buffer_blob() and cb->initialize() in
>>>>> Compile::init_buffer().
>>>>>
>>>> Yes, I'll look at this on Monday. I think it should be fairly
>>>> independent of this change.
>>>>> Also you need to coordinate your changes with Chris's changes for:
>>>>>
>>>>> 8016270: "CodeCache is full" message and compilation disabled, but
>>>>> codecache is not actually full
>>>>>
>>>>> it switch off compilation gradually based on failed compilation 
>>>>> method
>>>>> size.
>>>>>
>>>> I have to look this bug in more detail.
>>>>> About changes:
>>>>>
>>>>> nmethod.cpp:
>>>>>
>>>>> Adjust comment that race is during sweeping:
>>>>>
>>>>>     // transition its state from 'not_entrant' to 'zombie'
>>>>>     // during CodeCache sweeping without having to wait
>>>>>     // for stack scanning.
>>>>>
>>>>> mark_as_seen_on_stack() has assert which you will hit. I surprise 
>>>>> that
>>>>> you don't not. You can relax the assert to check is_alive() instead.
>>>>>
>>>> Thanks for catching this. That was a last second change.
>>>>> compileBroker.cpp:
>>>>>
>>>>> Why you need this? And why scaling?:
>>>>>
>>>>> +        // Only one thread at a time can do sweeping. Scale the
>>>>> +        // wait time according to the number of compiler threads.
>>>>> +        wait_time = 100 * CICompilerCount;
>>>>>
>>>> I am not sure if we need this. The idea behind is that currently only
>>>> one thread can sweep
>>>> at a time. The scaling has the effect that - on average - every 100ms
>>>> there is at least one
>>>> sweep if the code cache is full.
>>>>> Swap next lines:
>>>>>
>>>>> 1953         NMethodSweeper::possibly_sweep();
>>>>> 1954 NMethodSweeper::log_sweep("disable_compiler");
>>>>>
>>>>>
>>>> I'll change that.
>>>>> I will review sweeper.cpp changes later.
>>>>>
>>>> Many thanks,
>>>> Albert
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 9/13/13 8:48 AM, Albert Noll wrote:
>>>>>> Hi,
>>>>>>
>>>>>> thanks again for the feedback. This is the updated version.
>>>>>> http://cr.openjdk.java.net/~anoll/8020151/webrev.02/
>>>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.02/>
>>>>>>
>>>>>> @Vladimir:
>>>>>> I hope I could fix the code issues properly. I tried your suggested
>>>>>> change in JavaCalls::call_helper() but it did not lead
>>>>>> to more code reanimation.
>>>>>>
>>>>>> @Chris P.
>>>>>> Thanks again for your feedback. You are right, the algorithm 
>>>>>> could be
>>>>>> made more efficient.
>>>>>> However, I did not consider that code is being re-animated, so it
>>>>>> probably does not help
>>>>>> to reduce fragmentation. We are working on a patch where we have
>>>>>> multiple code heaps.
>>>>>> This should reduce fragmentation, since we can put subs, 
>>>>>> adapters, code
>>>>>> compiled with
>>>>>> different tiers into different heaps. More about this will follow 
>>>>>> most
>>>>>> likely next week.
>>>>>>
>>>>>> I did experiments that show the performance of the current version.
>>>>>> Short version:
>>>>>> The current patch performs, with the used benchmarks, significantly
>>>>>> better than
>>>>>> the default way of handling a full code cache. This version has no
>>>>>> performance regression
>>>>>> when the code cache does not filled up.
>>>>>>
>>>>>> More detailed data can be found here:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8020151
>>>>>>
>>>>>>
>>>>>> Change since last version (webrev.01) :
>>>>>>
>>>>>> Since the standard way of handling a full code cache
>>>>>> (speculatively_disconnect + re-animate)
>>>>>> performs worse (and does not solve the problem) I see no reason 
>>>>>> to keep.
>>>>>> As a result, I removed
>>>>>> that code.
>>>>>>
>>>>>> This is a large change to an import part of Hotspot; detailed and
>>>>>> critical reviews are very welcome.
>>>>>>
>>>>>> Best,
>>>>>> Albert
>>>>>>
>>>>>>
>>>>>> On 04.09.2013 22:33, Vladimir Kozlov wrote:
>>>>>>> On 9/4/13 4:07 AM, Albert Noll wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> here is an updated version of the patch. In summary, changes to 
>>>>>>>> the
>>>>>>>> previous version are:
>>>>>>>>
>>>>>>>> 1) Adapt NmethodSweepFraction based on the code cache size:
>>>>>>>>      Before the patch, NmethodSweepFraction was defined as a 
>>>>>>>> constant
>>>>>>>> value.
>>>>>>>>      I.e., NmethodSweepFraction was 16 for a code cache size of 
>>>>>>>> 32m as
>>>>>>>> well as a
>>>>>>>>      code cache size of 256m. This seems unreasonable, since
>>>>>>>> NmethodSweepFraction
>>>>>>>>      was originally introduce to keep the time spent in the 
>>>>>>>> sweeper
>>>>>>>> (note that sweeping
>>>>>>>>      was originally done at safepoints) at a reasonable level.
>>>>>>>
>>>>>>> Albert,
>>>>>>>
>>>>>>> General question first. Do we still need to sweep in fractions 
>>>>>>> since
>>>>>>> we don't do sweeping at safepoint and it stops for safepoints? The
>>>>>>> only drawback is one Compiler thread is used for sweeping and can't
>>>>>>> compile when it sweeps. But we will address that later by using
>>>>>>> special sweeping thread.
>>>>>>>
>>>>>>>>      The patch adapts NmethodSweepFraction to the
>>>>>>>> ReservedCodeCacheSize.
>>>>>>>> I.e.,
>>>>>>>>      each sweep operation roughly  covers 16m of the code cache.
>>>>>>>
>>>>>>> Why you stop at 16?:
>>>>>>>
>>>>>>> +    } else if (ReservedCodeCacheSize > (256 * M)) {
>>>>>>> +      FLAG_SET_DEFAULT(NmethodSweepFraction, 16);
>>>>>>>
>>>>>>> ReservedCodeCacheSize could be 2GB so one sweep have to process 
>>>>>>> 128Mb.
>>>>>>> Next should cover all cases, I think:
>>>>>>>
>>>>>>> FLAG_SET_DEFAULT(NmethodSweepFraction, 1 + ReservedCodeCacheSize 
>>>>>>> / (16
>>>>>>> * M))
>>>>>>>
>>>>>>>>
>>>>>>>> 2) The parameter NmethodSweepFraction is replaced by a function 
>>>>>>>> that
>>>>>>>> computes the
>>>>>>>>      NmethodSweepFraction based on the code cache size. See
>>>>>>>> 'sweeper.cpp' for a detailed
>>>>>>>>      description
>>>>>>>>
>>>>>>>> 3) The initial hotness counter depends on the code cache size. 
>>>>>>>> More
>>>>>>>> specifically, the initial
>>>>>>>>      hotness counter is 2*ReservedCodeCacheSize. As a result, a 
>>>>>>>> method
>>>>>>>> stays longer in the
>>>>>>>>      code cache if ReservedCodeCacheSize is larger.
>>>>>>>
>>>>>>> I don't like spreading '(ReservedCodeCacheSize / M) * 2' through 
>>>>>>> the
>>>>>>> code. Use NMethodSweeper::get_hotness_counter_reset_val() in 
>>>>>>> nmethod
>>>>>>> constuctors.
>>>>>>>
>>>>>>> Also you need to check for min value if ReservedCodeCacheSize < 
>>>>>>> 1Mb.
>>>>>>>
>>>>>>> Also drop 'get_' from name. We usually don't use 'get_' in accessor
>>>>>>> methods. the same for nmethod::get_hotness_counter().
>>>>>>>
>>>>>>> I already asked you to make hotness_counter_decay member of 
>>>>>>> nmethod so
>>>>>>> that you don't need to pass it to dec_hotness_counter(). Or 
>>>>>>> remove it
>>>>>>> at all since it is always 1:
>>>>>>>
>>>>>>> void dec_hotness_counter() { _hotness_counter--; }
>>>>>>>
>>>>>>> I would prefer all code which modifies _hotness_counter be in 
>>>>>>> nmethod
>>>>>>> class.
>>>>>>>
>>>>>>>>
>>>>>>>> 4) Newly compiled methods are guaranteed to not be evicted by the
>>>>>>>> sweeper for 10
>>>>>>>>      sweep cycles. This ensures the newly compiled methods are not
>>>>>>>> immediately made
>>>>>>>>      not entrant after compialtion.
>>>>>>>>
>>>>>>>> 5) The hotness counter is reset EVERY TIME active methods are 
>>>>>>>> scanned.
>>>>>>>> In the previous version
>>>>>>>>      the hotness counter was only reset after a full sweep 
>>>>>>>> cycle of
>>>>>>>> the
>>>>>>>> code cache. Resetting the
>>>>>>>>     hotness counter more frequently provides a better hotness 
>>>>>>>> coverage
>>>>>>>> of methods.
>>>>>>>>
>>>>>>>> 6) Methods are flushed in blocks of 1m. The algorithm computes the
>>>>>>>> average hotness of a
>>>>>>>>      a nmethod block and evicts the entire block. This should 
>>>>>>>> reduce
>>>>>>>> fragmentation.
>>>>>>>>
>>>>>>>> Please let me know what you think about these changes. Performance
>>>>>>>> results can be found at:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8020151
>>>>>>>> I will continuously provide more results.
>>>>>>>>
>>>>>>>> Here is the new webrev:
>>>>>>>> http://cr.openjdk.java.net/~anoll/8020151/webrev.01/
>>>>>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.01/>
>>>>>>>
>>>>>>> Very nice comment in sweeper.hpp. Could you please extend it?
>>>>>>>
>>>>>>> 2) sweep nmethods
>>>>>>>
>>>>>>> Sweeping is currently done by Compiler thread between 
>>>>>>> compilations or
>>>>>>> at least each 5 sec (NmethodSweepCheckInterval) when CodeCache 
>>>>>>> is full.
>>>>>>>
>>>>>>> The comment is not cleare that nmethods could be marked as 
>>>>>>> not-entrant
>>>>>>> by different code (deoptimization, dependency invalidation, replace
>>>>>>> old nmethod) and not just by sweeper (which does that only for 
>>>>>>> flushed
>>>>>>> nmethods).
>>>>>>>
>>>>>>> Each nmethod's state change happens during separate sweeps. It may
>>>>>>> take at least 3 sweeps before nmethod's space is freed.
>>>>>>>
>>>>>>> 3) code cache flushing
>>>>>>> ... as a VM operation at safepoint.
>>>>>>>
>>>>>>> Flashing VM operation requires safepoint.
>>>>>>>
>>>>>>> As you pointed current code re-animates nmethods only if we get
>>>>>>> compilation request. It means that it depends on profiling 
>>>>>>> counters.
>>>>>>> It could be the case that a method have to be executed in 
>>>>>>> interpreter
>>>>>>> for some time again to hit compilation threshold.
>>>>>>>
>>>>>>> I think we need to re-animate nmethod immediately during java call:
>>>>>>>
>>>>>>> JavaCalls::call_helper()
>>>>>>>
>>>>>>>   if (CompilationPolicy::must_be_compiled(method)) {
>>>>>>>     CompileBroker::compile_method(method, InvocationEntryBci,
>>>>>>>
>>>>>>> CompilationPolicy::policy()->initial_compile_level(),
>>>>>>>                                   methodHandle(), 0,
>>>>>>> "must_be_compiled", CHECK);
>>>>>>> + } else if (UseCodeCacheFlushing) {
>>>>>>> +   nmethod* saved = CodeCache::reanimate_saved_code(method());
>>>>>>> +   if (saved != NULL) {
>>>>>>> +     method->set_code(method, saved);
>>>>>>> +   }
>>>>>>>   }
>>>>>>>
>>>>>>> Can you verify how this work?
>>>>>>>
>>>>>>> I will send additional comments when look through the rest of code.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Albert
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22.08.2013 15:16, Albert Noll wrote:
>>>>>>>>> Hi Igor,
>>>>>>>>>
>>>>>>>>> thanks again for your comments. You are right. I will run some
>>>>>>>>> benchmarks to see if removing
>>>>>>>>> the disconnect logic is feasible or not.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Albert
>>>>>>>>>
>>>>>>>>> On 22.08.2013 12:39, Igor Veresov wrote:
>>>>>>>>>> It's worth a try. But the "hotness" logic is probabilistic and
>>>>>>>>>> imprecise - it will notice only methods that are on stack during
>>>>>>>>>> safepoints. Those are going to be pretty narrow snapshots of
>>>>>>>>>> activity. I suspect that for large flat profiles (like 
>>>>>>>>>> enterprise
>>>>>>>>>> apps and friends) you could be missing methods that are 
>>>>>>>>>> rather warm
>>>>>>>>>> in reality, which will cause recompilation oscillations. 
>>>>>>>>>> Although if
>>>>>>>>>> the statistics are allowed to accumulate enough may be it's 
>>>>>>>>>> going to
>>>>>>>>>> work out, an experiment will tell. The patch is a good start, 
>>>>>>>>>> the
>>>>>>>>>> stack sampling, IMO, is totally the right approach for 
>>>>>>>>>> filtering out
>>>>>>>>>> the hot methods.
>>>>>>>>>>
>>>>>>>>>> The "disconnect" logic on the other hand is sort of precise.
>>>>>>>>>> Although, looking now at the code it's not quite clear to me 
>>>>>>>>>> how it
>>>>>>>>>> works, it doesn't seem to be any patching going on to divert the
>>>>>>>>>> control for the cases when the nmethod is called directly or 
>>>>>>>>>> through
>>>>>>>>>> an IC. So I guess it's not really a full disconnect? Anyways, in
>>>>>>>>>> theory, with some work, we can make the disconnect logic to
>>>>>>>>>> precisely
>>>>>>>>>> measure the time the method is inactive. Which should provide
>>>>>>>>>> precise
>>>>>>>>>> information about the warm/cold methods.
>>>>>>>>>>
>>>>>>>>>> Btw, also just noticed a bunch of flaws in the interaction of 
>>>>>>>>>> the
>>>>>>>>>> disconnect logic and tiered. The nmethod's "reconnection" 
>>>>>>>>>> happens in
>>>>>>>>>> CompileBroker::compile_method(), which firstly will be called by
>>>>>>>>>> tiered only after a rather substantial number of invocations 
>>>>>>>>>> in the
>>>>>>>>>> interpreter (up to 128), and secondly will be subject to all the
>>>>>>>>>> prioritization rules (it probably should not), and also we don't
>>>>>>>>>> check if the comp level of the reanimated nmethod matches the
>>>>>>>>>> request. If the disconnect logic is to stay, the interpreter 
>>>>>>>>>> should
>>>>>>>>>> be able to know if the method has saved code and be able to call
>>>>>>>>>> into
>>>>>>>>>> the runtime immediately to reanimate it.
>>>>>>>>>>
>>>>>>>>>> igor
>>>>>>>>>>
>>>>>>>>>> On Aug 21, 2013, at 10:27 PM, Albert Noll 
>>>>>>>>>> <albert.noll at oracle.com
>>>>>>>>>> <mailto:albert.noll at oracle.com>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Igor,
>>>>>>>>>>>
>>>>>>>>>>> thanks for looking at the patch. Actually, I think - just as
>>>>>>>>>>> Vladimir pointed out - that we can get rid of
>>>>>>>>>>> the "disconnect" logic. We now have the hotness of a method 
>>>>>>>>>>> and if
>>>>>>>>>>> the code cache fills up, we and
>>>>>>>>>>> we decide to schedule the method for removal, we set it to
>>>>>>>>>>> not_entrant.
>>>>>>>>>>> It seems that adding the method to the list of disconnected 
>>>>>>>>>>> methods
>>>>>>>>>>> just buys a little more time until we decide to make the method
>>>>>>>>>>> not-entrant. However, we can have the same effect by setting 
>>>>>>>>>>> the
>>>>>>>>>>> threshold differently.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Albert
>>>>>>>>>>>
>>>>>>>>>>> On 22.08.2013 10:02, Igor Veresov wrote:
>>>>>>>>>>>> May be instead of "(_traversals > _last_flush_traversal_id +
>>>>>>>>>>>> 2)" we
>>>>>>>>>>>> should timestamp a method when it's disconnected, and then 
>>>>>>>>>>>> use a
>>>>>>>>>>>> rule like if a method has been disconnected for k *
>>>>>>>>>>>> reverse_free_ratio() seconds then it's ok to kill it. We 
>>>>>>>>>>>> can also
>>>>>>>>>>>> sort the nmethods that pass that filter by the amount of 
>>>>>>>>>>>> time they
>>>>>>>>>>>> were disconnected and select most likely candidates for 
>>>>>>>>>>>> flushing.
>>>>>>>>>>>> This should allow to basically do disconnect/flush in every
>>>>>>>>>>>> traversal, which should make things faster. Timestamps 
>>>>>>>>>>>> would be
>>>>>>>>>>>> obtained only once per traversal or something like that. 
>>>>>>>>>>>> What do
>>>>>>>>>>>> you think?
>>>>>>>>>>>>
>>>>>>>>>>>> Pretty cool idea to reverse-prioritize disconnects on hotness.
>>>>>>>>>>>>
>>>>>>>>>>>> igor
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 21, 2013, at 4:42 AM, Albert Noll 
>>>>>>>>>>>> <albert.noll at oracle.com
>>>>>>>>>>>> <mailto:albert.noll at oracle.com>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> could I have reviews for this patch? Please note
>>>>>>>>>>>>> that I do not yet feel very confident with the sweeper,
>>>>>>>>>>>>> so please take a close look.
>>>>>>>>>>>>>
>>>>>>>>>>>>> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
>>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/
>>>>>>>>>>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many thanks in advance,
>>>>>>>>>>>>> Albert
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Problem: There can be large performance regressions when 
>>>>>>>>>>>>> the code
>>>>>>>>>>>>> cache fills up. There are
>>>>>>>>>>>>> several reasons for the performance regression: First (1), 
>>>>>>>>>>>>> when
>>>>>>>>>>>>> the code cache is full and methods
>>>>>>>>>>>>> are speculatively disconnected, the oldest methods (based on
>>>>>>>>>>>>> compilation ID) are scheduled for
>>>>>>>>>>>>> flushing. This can result in flushing hot methods. Second 
>>>>>>>>>>>>> (2),
>>>>>>>>>>>>> when compilation is disabled due to a full
>>>>>>>>>>>>> code cache, the number of sweeps can go down. A lower 
>>>>>>>>>>>>> number of
>>>>>>>>>>>>> sweep operations results
>>>>>>>>>>>>> in slower method flushing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Solution:
>>>>>>>>>>>>> Introduce a hotness counter that is set to a particular value
>>>>>>>>>>>>> (e.g., 100) when there is an activation
>>>>>>>>>>>>> of the method during stack scanning. The counter is
>>>>>>>>>>>>> decremented by
>>>>>>>>>>>>> 1 every time the sweeper
>>>>>>>>>>>>> is invoked.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ad (1):
>>>>>>>>>>>>>   A VM operation that speculatively disconnects nmethods, 
>>>>>>>>>>>>> selects
>>>>>>>>>>>>> the methods that should be
>>>>>>>>>>>>>   flushed based on the hotness. For example, if 50% of the 
>>>>>>>>>>>>> code
>>>>>>>>>>>>> cache shall be flushed, we flush
>>>>>>>>>>>>>   those methods that have not been active while stack 
>>>>>>>>>>>>> scanning
>>>>>>>>>>>>> for
>>>>>>>>>>>>> the longest time. Note that
>>>>>>>>>>>>>   while this strategy is more likely to flush cold 
>>>>>>>>>>>>> methods, it is
>>>>>>>>>>>>> not clear to what extent the new
>>>>>>>>>>>>>   strategy fragments the code cache.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Changes in 
>>>>>>>>>>>>> NMethodSweeper::speculative_disconnect_nmethods(bool
>>>>>>>>>>>>> is_full)
>>>>>>>>>>>>>
>>>>>>>>>>>>> ad (2)
>>>>>>>>>>>>>   Currently, methods are removed from the code cache if:
>>>>>>>>>>>>>     a) code cache is full
>>>>>>>>>>>>>     b) class is unloaded
>>>>>>>>>>>>>     c) method is replaced by another version (i.e., 
>>>>>>>>>>>>> compiled with
>>>>>>>>>>>>> a different tier)
>>>>>>>>>>>>>     d) deopt
>>>>>>>>>>>>>
>>>>>>>>>>>>>    The current patch adds a 5-th possibility to remove a 
>>>>>>>>>>>>> method
>>>>>>>>>>>>> from the code cache.
>>>>>>>>>>>>>    In particular, if a method has not been active during 
>>>>>>>>>>>>> stack
>>>>>>>>>>>>> scanning for a long-enough
>>>>>>>>>>>>>    amount of time, the method is removed from the code 
>>>>>>>>>>>>> cache. The
>>>>>>>>>>>>> amount of time
>>>>>>>>>>>>>    required to flush the method depends on the available 
>>>>>>>>>>>>> space in
>>>>>>>>>>>>> the code cache.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Here is one example: If a method was seen on a stack the
>>>>>>>>>>>>> hotness counter
>>>>>>>>>>>>>    is set to 100. A sweep operation takes roughly place every
>>>>>>>>>>>>> 100ms. I.e., it takes
>>>>>>>>>>>>>    100ms * 100 = 10s until the hotness counter reaches 0. The
>>>>>>>>>>>>> threshold that determines
>>>>>>>>>>>>>    if a method should be removed from the code cache is
>>>>>>>>>>>>> calculated
>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    threshold = -100 + (CodeCache::reverse_free_ratio() *
>>>>>>>>>>>>> NMethodSweepActivity)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     For example, if 25% of the code cache is free,
>>>>>>>>>>>>> reverse_free_ratio returns 4.
>>>>>>>>>>>>>     The default value of NMethodSweepActivity is 10. As a 
>>>>>>>>>>>>> result,
>>>>>>>>>>>>> threshold = -60.
>>>>>>>>>>>>>     Consequently, all methods that have a hotness value 
>>>>>>>>>>>>> smaller
>>>>>>>>>>>>> than -60 (which
>>>>>>>>>>>>>     means they have not been seen on the stack for 16s) are
>>>>>>>>>>>>> scheduled to be flushed
>>>>>>>>>>>>>     from the code cache. See an illustration of the threshold
>>>>>>>>>>>>> as a
>>>>>>>>>>>>> function of the available
>>>>>>>>>>>>>     code cache in threshold.pdf
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Note that NMethodSweepActivity is a parameter that can be
>>>>>>>>>>>>> specified via a -XX
>>>>>>>>>>>>>     flag.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Changes in NMethodSweeper::sweep_code_cache()
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> A very preliminary performance evaluation looks promising. 
>>>>>>>>>>>>> I used
>>>>>>>>>>>>> the DaCapo
>>>>>>>>>>>>> benchmarks where a series of benchmarks is executed in the
>>>>>>>>>>>>> same VM
>>>>>>>>>>>>> instance.
>>>>>>>>>>>>> See performance.pdf . The x-axis shows the benchmarks. 
>>>>>>>>>>>>> Assume we
>>>>>>>>>>>>> have 2 benchmarks
>>>>>>>>>>>>> (BM). The execution sequence is as follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> BM1 (Run 1-1)
>>>>>>>>>>>>> BM1 (Run 2-1)
>>>>>>>>>>>>> BM2 (Run 1-1)
>>>>>>>>>>>>> BM2 (Run 2-1)
>>>>>>>>>>>>>
>>>>>>>>>>>>> BM1 (Run 1-2)
>>>>>>>>>>>>> BM1 (Run 2-2)
>>>>>>>>>>>>> BM2 (Run 1-2)
>>>>>>>>>>>>> BM2 (Run 2-2)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> A value larger than 0 on the x-axis indicates that the 
>>>>>>>>>>>>> version
>>>>>>>>>>>>> including the proposed patch is faster.
>>>>>>>>>>>>> I.e., the values are calculated as follows: (T_original /
>>>>>>>>>>>>> T_with_patch) - 1. T is the execution time
>>>>>>>>>>>>> (wall clock time) of the benchmark. ReservedCodeCacheSize 
>>>>>>>>>>>>> is set
>>>>>>>>>>>>> to 50m.  I used three runs and
>>>>>>>>>>>>> the arithmetic average to compare the numbers. I know, we 
>>>>>>>>>>>>> need
>>>>>>>>>>>>> much more data, however,
>>>>>>>>>>>>> I think we can see a trend.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current patch does not trigger a warning that the code 
>>>>>>>>>>>>> cache
>>>>>>>>>>>>> is full and compilation has been
>>>>>>>>>>>>> disabled.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know that you think.
>>>>>>>>>>>>> <threshold.pdf><performance.pdf>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130917/76eade6e/attachment-0001.html