RFR(S): 8020151: PSR:PERF Large performance regressions when code cache is filled

Thu Aug 22 06:16:37 PDT 2013

Hi Igor,

thanks again for your comments. You are right. I will run some 
benchmarks to see if removing
the disconnect logic is feasible or not.

Best,
Albert

On 22.08.2013 12:39, Igor Veresov wrote:
> It's worth a try. But the "hotness" logic is probabilistic and 
> imprecise - it will notice only methods that are on stack during 
> safepoints. Those are going to be pretty narrow snapshots of activity. 
> I suspect that for large flat profiles (like enterprise apps and 
> friends) you could be missing methods that are rather warm in reality, 
> which will cause recompilation oscillations. Although if the 
> statistics are allowed to accumulate enough may be it's going to work 
> out, an experiment will tell. The patch is a good start, the stack 
> sampling, IMO, is totally the right approach for filtering out the hot 
> methods.
>
> The "disconnect" logic on the other hand is sort of precise. Although, 
> looking now at the code it's not quite clear to me how it works, it 
> doesn't seem to be any patching going on to divert the control for the 
> cases when the nmethod is called directly or through an IC. So I guess 
> it's not really a full disconnect?  Anyways, in theory, with some 
> work, we can make the disconnect logic to precisely measure the time 
> the method is inactive. Which should provide precise information about 
> the warm/cold methods.
>
> Btw, also just noticed a bunch of flaws in the interaction of the 
> disconnect logic and tiered. The nmethod's "reconnection" happens in 
> CompileBroker::compile_method(), which firstly will be called by 
> tiered only after a rather substantial number of invocations in the 
> interpreter (up to 128), and secondly will be subject to all the 
> prioritization rules (it probably should not), and also we don't check 
> if the comp level of the reanimated nmethod matches the request. If 
> the disconnect logic is to stay, the interpreter should be able to 
> know if the method has saved code and be able to call into the runtime 
> immediately to reanimate it.
>
> igor
>
> On Aug 21, 2013, at 10:27 PM, Albert Noll <albert.noll at oracle.com 
> <mailto:albert.noll at oracle.com>> wrote:
>
>> Hi Igor,
>>
>> thanks for looking at the patch. Actually, I think - just as Vladimir 
>> pointed out - that we can get rid of
>> the "disconnect" logic. We now have the hotness of a method and if 
>> the code cache fills up, we and
>> we decide to schedule the method for removal, we set it to not_entrant.
>> It seems that adding the method to the list of disconnected methods 
>> just buys a little more time until we decide to make the method 
>> not-entrant. However, we can have the same effect by setting the  
>> threshold differently.
>>
>> What do you think?
>>
>> Best,
>> Albert
>>
>> On 22.08.2013 10:02, Igor Veresov wrote:
>>> May be instead of "(_traversals > _last_flush_traversal_id + 2)" we 
>>> should timestamp a method when it's disconnected, and then use a 
>>> rule like if a method has been disconnected for k * 
>>> reverse_free_ratio() seconds then it's ok to kill it. We can also 
>>> sort the nmethods that pass that filter by the amount of time they 
>>> were disconnected and select most likely candidates for flushing. 
>>> This should allow to basically do disconnect/flush in every 
>>> traversal, which should make things faster. Timestamps would be 
>>> obtained only once per traversal or something like that. What do you 
>>> think?
>>>
>>> Pretty cool idea to reverse-prioritize disconnects on hotness.
>>>
>>> igor
>>>
>>> On Aug 21, 2013, at 4:42 AM, Albert Noll <albert.noll at oracle.com 
>>> <mailto:albert.noll at oracle.com>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> could I have reviews for this patch? Please note
>>>> that I do not yet feel very confident with the sweeper,
>>>> so please take a close look.
>>>>
>>>> jbs: https://jbs.oracle.com/bugs/browse/JDK-8020151
>>>> webrev: http://cr.openjdk.java.net/~anoll/8020151/webrev.00/ 
>>>> <http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/>
>>>>
>>>>
>>>> Many thanks in advance,
>>>> Albert
>>>>
>>>>
>>>> Problem: There can be large performance regressions when the code 
>>>> cache fills up. There are
>>>> several reasons for the performance regression: First (1), when the 
>>>> code cache is full and methods
>>>> are speculatively disconnected, the oldest methods (based on 
>>>> compilation ID) are scheduled for
>>>> flushing. This can result in flushing hot methods. Second (2), when 
>>>> compilation is disabled due to a full
>>>> code cache, the number of sweeps can go down. A lower number of 
>>>> sweep operations results
>>>> in slower method flushing.
>>>>
>>>> Solution:
>>>> Introduce a hotness counter that is set to a particular value 
>>>> (e.g., 100) when there is an activation
>>>> of the method during stack scanning. The counter is decremented by 
>>>> 1 every time the sweeper
>>>> is invoked.
>>>>
>>>> ad (1):
>>>>   A VM operation that speculatively disconnects nmethods, selects 
>>>> the methods that should be
>>>>   flushed based on the hotness. For example, if 50% of the code 
>>>> cache shall be flushed, we flush
>>>>   those methods that have not been active while stack scanning for 
>>>> the longest time. Note that
>>>>   while this strategy is more likely to flush cold methods, it is 
>>>> not clear to what extent the new
>>>>   strategy fragments the code cache.
>>>>
>>>>   Changes in NMethodSweeper::speculative_disconnect_nmethods(bool 
>>>> is_full)
>>>>
>>>> ad (2)
>>>>   Currently, methods are removed from the code cache if:
>>>>     a) code cache is full
>>>>     b) class is unloaded
>>>>     c) method is replaced by another version (i.e., compiled with a 
>>>> different tier)
>>>>     d) deopt
>>>>
>>>>    The current patch adds a 5-th possibility to remove a method 
>>>> from the code cache.
>>>>    In particular, if a method has not been active during stack 
>>>> scanning for a long-enough
>>>>    amount of time, the method is removed from the code cache. The 
>>>> amount of time
>>>>    required to flush the method depends on the available space in 
>>>> the code cache.
>>>>
>>>>    Here is one example: If a method was seen on a stack the hotness 
>>>> counter
>>>>    is set to 100. A sweep operation takes roughly place every 
>>>> 100ms. I.e., it takes
>>>>    100ms * 100 = 10s until the hotness counter reaches 0. The 
>>>> threshold that determines
>>>>    if a method should be removed from the code cache is calculated 
>>>> as follows:
>>>>
>>>>    threshold = -100 + (CodeCache::reverse_free_ratio() * 
>>>> NMethodSweepActivity)
>>>>
>>>>     For example, if 25% of the code cache is free, 
>>>> reverse_free_ratio returns 4.
>>>>     The default value of NMethodSweepActivity is 10. As a result, 
>>>> threshold = -60.
>>>>     Consequently, all methods that have a hotness value smaller 
>>>> than -60 (which
>>>>     means they have not been seen on the stack for 16s) are 
>>>> scheduled to be flushed
>>>>     from the code cache. See an illustration of the threshold as a 
>>>> function of the available
>>>>     code cache in threshold.pdf
>>>>
>>>>     Note that NMethodSweepActivity is a parameter that can be 
>>>> specified via a -XX
>>>>     flag.
>>>>
>>>> Changes in NMethodSweeper::sweep_code_cache()
>>>>
>>>>
>>>> A very preliminary performance evaluation looks promising. I used 
>>>> the DaCapo
>>>> benchmarks where a series of benchmarks is executed in the same VM 
>>>> instance.
>>>> See performance.pdf . The x-axis shows the benchmarks. Assume we 
>>>> have 2 benchmarks
>>>> (BM). The execution sequence is as follows:
>>>>
>>>> BM1 (Run 1-1)
>>>> BM1 (Run 2-1)
>>>> BM2 (Run 1-1)
>>>> BM2 (Run 2-1)
>>>>
>>>> BM1 (Run 1-2)
>>>> BM1 (Run 2-2)
>>>> BM2 (Run 1-2)
>>>> BM2 (Run 2-2)
>>>>
>>>>
>>>> A value larger than 0 on the x-axis indicates that the version 
>>>> including the proposed patch is faster.
>>>> I.e., the values are calculated as follows: (T_original / 
>>>> T_with_patch) - 1. T is the execution time
>>>> (wall clock time) of the benchmark. ReservedCodeCacheSize is set to 
>>>> 50m.  I used three runs and
>>>> the arithmetic average to compare the numbers. I know, we need much 
>>>> more data, however,
>>>> I think we can see a trend.
>>>>
>>>> The current patch does not trigger a warning that the code cache is 
>>>> full and compilation has been
>>>> disabled.
>>>>
>>>> Please let me know that you think.
>>>> <threshold.pdf><performance.pdf>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130822/57d007c0/attachment-0001.html