RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Wed Sep 4 04:07:22 PDT 2013

Hi,

here is an updated version of the patch. In summary, changes to the 
previous version are:

1) Adapt NmethodSweepFraction based on the code cache size:
     Before the patch, NmethodSweepFraction was defined as a constant value.
     I.e., NmethodSweepFraction was 16 for a code cache size of 32m as 
well as a
     code cache size of 256m. This seems unreasonable, since 
NmethodSweepFraction
     was originally introduce to keep the time spent in the sweeper 
(note that sweeping
     was originally done at safepoints) at a reasonable level.
     The patch adapts NmethodSweepFraction to the ReservedCodeCacheSize. 
I.e.,
     each sweep operation roughly  covers 16m of the code cache.

2) The parameter NmethodSweepFraction is replaced by a function that 
computes the
     NmethodSweepFraction based on the code cache size. See 
'sweeper.cpp' for a detailed
     description

3) The initial hotness counter depends on the code cache size. More 
specifically, the initial
     hotness counter is 2*ReservedCodeCacheSize. As a result, a method 
stays longer in the
     code cache if ReservedCodeCacheSize is larger.

4) Newly compiled methods are guaranteed to not be evicted by the 
sweeper for 10
     sweep cycles. This ensures the newly compiled methods are not 
immediately made
     not entrant after compialtion.

5) The hotness counter is reset EVERY TIME active methods are scanned. 
In the previous version
     the hotness counter was only reset after a full sweep cycle of the 
code cache. Resetting the
    hotness counter more frequently provides a better hotness coverage 
of methods.

6) Methods are flushed in blocks of 1m. The algorithm computes the 
average hotness of a
     a nmethod block and evicts the entire block. This should reduce 
fragmentation.

Please let me know what you think about these changes. Performance 
results can be found at:
https://bugs.openjdk.java.net/browse/JDK-8020151
I will continuously provide more results.

Here is the new webrev:
http://cr.openjdk.java.net/~anoll/8020151/webrev.01/ 
<http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.01/>

Best,
Albert

On 22.08.2013 15:16, Albert Noll wrote:
> Hi Igor,
>
> thanks again for your comments. You are right. I will run some 
> benchmarks to see if removing
> the disconnect logic is feasible or not.
>
> Best,
> Albert
>
> On 22.08.2013 12:39, Igor Veresov wrote:
>> It's worth a try. But the "hotness" logic is probabilistic and 
>> imprecise - it will notice only methods that are on stack during 
>> safepoints. Those are going to be pretty narrow snapshots of 
>> activity. I suspect that for large flat profiles (like enterprise 
>> apps and friends) you could be missing methods that are rather warm 
>> in reality, which will cause recompilation oscillations. Although if 
>> the statistics are allowed to accumulate enough may be it's going to 
>> work out, an experiment will tell. The patch is a good start, the 
>> stack sampling, IMO, is totally the right approach for filtering out 
>> the hot methods.
>>
>> The "disconnect" logic on the other hand is sort of precise. 
>> Although, looking now at the code it's not quite clear to me how it 
>> works, it doesn't seem to be any patching going on to divert the 
>> control for the cases when the nmethod is called directly or through 
>> an IC. So I guess it's not really a full disconnect?  Anyways, in 
>> theory, with some work, we can make the disconnect logic to precisely 
>> measure the time the method is inactive. Which should provide precise 
>> information about the warm/cold methods.
>>
>> Btw, also just noticed a bunch of flaws in the interaction of the 
>> disconnect logic and tiered. The nmethod's "reconnection" happens in 
>> CompileBroker::compile_method(), which firstly will be called by 
>> tiered only after a rather substantial number of invocations in the 
>> interpreter (up to 128), and secondly will be subject to all the 
>> prioritization rules (it probably should not), and also we don't 
>> check if the comp level of the reanimated nmethod matches the 
>> request. If the disconnect logic is to stay, the interpreter should 
>> be able to know if the method has saved code and be able to call into 
>> the runtime immediately to reanimate it.
>>
>> igor
>>
>> On Aug 21, 2013, at 10:27 PM, Albert Noll <albert.noll at oracle.com 
>> <mailto:albert.noll at oracle.com>> wrote:
>>
>>> Hi Igor,
>>>
>>> thanks for looking at the patch. Actually, I think - just as 
>>> Vladimir pointed out - that we can get rid of
>>> the "disconnect" logic. We now have the hotness of a method and if 
>>> the code cache fills up, we and
>>> we decide to schedule the method for removal, we set it to not_entrant.
>>> It seems that adding the method to the list of disconnected methods 
>>> just buys a little more time until we decide to make the method 
>>> not-entrant. However, we can have the same effect by setting the  
>>> threshold differently.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Albert
>>>
>>> On 22.08.2013 10:02, Igor Veresov wrote:
>>>> May be instead of "(_traversals > _last_flush_traversal_id + 2)" we 
>>>> should timestamp a method when it's disconnected, and then use a 
>>>> rule like if a method has been disconnected for k * 
>>>> reverse_free_ratio() seconds then it's ok to kill it. We can also 
>>>> sort the nmethods that pass that filter by the amount of time they 
>>>> were disconnected and select most likely candidates for flushing. 
>>>> This should allow to basically do disconnect/flush in every 
>>>> traversal, which should make things faster. Timestamps would be 
>>>> obtained only once per traversal or something like that. What do 
>>>> you think?
>>>>
>>>> Pretty cool idea to reverse-prioritize disconnects on hotness.
>>>>
>>>> igor
>>>>
>>>> On Aug 21, 2013, at 4:42 AM, Albert Noll <albert.noll at oracle.com 
>>>> <mailto:albert.noll at oracle.com>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> could I have reviews for this patch? Please note
>>>>> that I do not yet feel very confident with the sweeper,
>>>>> so please take a close look.
>>>>>
>>>>> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
>>>>> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/ 
>>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>>>>>
>>>>>
>>>>> Many thanks in advance,
>>>>> Albert
>>>>>
>>>>>
>>>>> Problem: There can be large performance regressions when the code 
>>>>> cache fills up. There are
>>>>> several reasons for the performance regression: First (1), when 
>>>>> the code cache is full and methods
>>>>> are speculatively disconnected, the oldest methods (based on 
>>>>> compilation ID) are scheduled for
>>>>> flushing. This can result in flushing hot methods. Second (2), 
>>>>> when compilation is disabled due to a full
>>>>> code cache, the number of sweeps can go down. A lower number of 
>>>>> sweep operations results
>>>>> in slower method flushing.
>>>>>
>>>>> Solution:
>>>>> Introduce a hotness counter that is set to a particular value 
>>>>> (e.g., 100) when there is an activation
>>>>> of the method during stack scanning. The counter is decremented by 
>>>>> 1 every time the sweeper
>>>>> is invoked.
>>>>>
>>>>> ad (1):
>>>>>   A VM operation that speculatively disconnects nmethods, selects 
>>>>> the methods that should be
>>>>>   flushed based on the hotness. For example, if 50% of the code 
>>>>> cache shall be flushed, we flush
>>>>>   those methods that have not been active while stack scanning for 
>>>>> the longest time. Note that
>>>>>   while this strategy is more likely to flush cold methods, it is 
>>>>> not clear to what extent the new
>>>>>   strategy fragments the code cache.
>>>>>
>>>>>   Changes in NMethodSweeper::speculative_disconnect_nmethods(bool 
>>>>> is_full)
>>>>>
>>>>> ad (2)
>>>>>   Currently, methods are removed from the code cache if:
>>>>>     a) code cache is full
>>>>>     b) class is unloaded
>>>>>     c) method is replaced by another version (i.e., compiled with 
>>>>> a different tier)
>>>>>     d) deopt
>>>>>
>>>>>    The current patch adds a 5-th possibility to remove a method 
>>>>> from the code cache.
>>>>>    In particular, if a method has not been active during stack 
>>>>> scanning for a long-enough
>>>>>    amount of time, the method is removed from the code cache. The 
>>>>> amount of time
>>>>>    required to flush the method depends on the available space in 
>>>>> the code cache.
>>>>>
>>>>>    Here is one example: If a method was seen on a stack the 
>>>>> hotness counter
>>>>>    is set to 100. A sweep operation takes roughly place every 
>>>>> 100ms. I.e., it takes
>>>>>    100ms * 100 = 10s until the hotness counter reaches 0. The 
>>>>> threshold that determines
>>>>>    if a method should be removed from the code cache is calculated 
>>>>> as follows:
>>>>>
>>>>>    threshold = -100 + (CodeCache::reverse_free_ratio() * 
>>>>> NMethodSweepActivity)
>>>>>
>>>>>     For example, if 25% of the code cache is free, 
>>>>> reverse_free_ratio returns 4.
>>>>>     The default value of NMethodSweepActivity is 10. As a result, 
>>>>> threshold = -60.
>>>>>     Consequently, all methods that have a hotness value smaller 
>>>>> than -60 (which
>>>>>     means they have not been seen on the stack for 16s) are 
>>>>> scheduled to be flushed
>>>>>     from the code cache. See an illustration of the threshold as a 
>>>>> function of the available
>>>>>     code cache in threshold.pdf
>>>>>
>>>>>     Note that NMethodSweepActivity is a parameter that can be 
>>>>> specified via a -XX
>>>>>     flag.
>>>>>
>>>>> Changes in NMethodSweeper::sweep_code_cache()
>>>>>
>>>>>
>>>>> A very preliminary performance evaluation looks promising. I used 
>>>>> the DaCapo
>>>>> benchmarks where a series of benchmarks is executed in the same VM 
>>>>> instance.
>>>>> See performance.pdf . The x-axis shows the benchmarks. Assume we 
>>>>> have 2 benchmarks
>>>>> (BM). The execution sequence is as follows:
>>>>>
>>>>> BM1 (Run 1-1)
>>>>> BM1 (Run 2-1)
>>>>> BM2 (Run 1-1)
>>>>> BM2 (Run 2-1)
>>>>>
>>>>> BM1 (Run 1-2)
>>>>> BM1 (Run 2-2)
>>>>> BM2 (Run 1-2)
>>>>> BM2 (Run 2-2)
>>>>>
>>>>>
>>>>> A value larger than 0 on the x-axis indicates that the version 
>>>>> including the proposed patch is faster.
>>>>> I.e., the values are calculated as follows: (T_original / 
>>>>> T_with_patch) - 1. T is the execution time
>>>>> (wall clock time) of the benchmark. ReservedCodeCacheSize is set 
>>>>> to 50m.  I used three runs and
>>>>> the arithmetic average to compare the numbers. I know, we need 
>>>>> much more data, however,
>>>>> I think we can see a trend.
>>>>>
>>>>> The current patch does not trigger a warning that the code cache 
>>>>> is full and compilation has been
>>>>> disabled.
>>>>>
>>>>> Please let me know that you think.
>>>>> <threshold.pdf><performance.pdf>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130904/7d93b4d5/attachment-0001.html