RFR (S/M): 8012547: Reclamation of flushed methods can be slow

Fri Apr 26 08:15:34 PDT 2013

On 2013-04-25 20:20, Vladimir Kozlov wrote:
> >>> * Let sweeper make disconnected methods non_entrant even if we don't
> >>> need flushing any more. Makes us less likely to hit a new flush.
> >>
> >> You removed compilation enabling code in
> >> speculative_disconnect_nmethods() so we need to wait next safepoint to
> >> enable it (in scan_stacks()). Should we do enabling compilation during
> >> sweep may be in an other place when we know that some space is
> >> reclaimed to avoid waiting safepoint?
> >
> > Yes, my though was that !CodeCache::needs_flushing() will never be true
> > since this code is only called when we need flushing. And the only 
> place
> > where code cache is actually freed is in scan_stacks when nmethods are
> > flushed. So it should be enough to check for turning on the compiler as
> > the last thing there.
>
> It is not true, it is opposite. scan_stack() only marks nmethods. 
> nmethod::flash() (which frees space) is called only from 
> NMethodSweeper::sweep_code_cache().

You are right. Confusing myself here. But the rest of my argument still 
stands - flushing is only called when need_flushing() is true, and the 
possibility of it being false a moment later in disconnect methods is 
very low.  But I added a check in the end of the sweep_code_cache that 
can turn the compiler on again.

Also changed the flush_target calculation in 
speculative_disconnect_nmethods slightly. Now it doesn't count natives 
and osrs which might prevent some potentially bad behavior in long 
running applications with lots of osrs. (With CodeCacheFlushFraction=8 
or so, and more than 12.5% osrs, the flushing would stop make progress 
in the very long run)

Latest webrev:

http://cr.openjdk.java.net/~neliasso/8012547/webrev.08/ 
<http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.08/>

Thanks!
//Nils

>
> Thanks,
> Vladimir
>
> On 4/25/13 7:05 AM, Nils Eliasson wrote:
>>
>> On 2013-04-24 22:33, Vladimir Kozlov wrote:
>>> Nils,
>>>
>>> Yes, in general it is better.
>>>
>>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.04/
>>>
>>> On 4/24/13 7:06 AM, Nils Eliasson wrote:
>>>> Hi again,
>>>>
>>>> Further testing revealed that not all issues was fully identified.
>>>> Flushing could still cause compiles to halt even though 
>>>> compilations was
>>>> on, and turning off compilations when the code cache was really full
>>>> made the sweeper run much less often, prolonging the problem.
>>>>
>>>> Change log:
>>>> * Fix the comment Christian mentioned, and some other comments too.
>>>> * find_and_remove_saved_code renamed to reanimate_saved_code to
>>>> distinguish it from remove_saved_code that actually removes it.
>>>
>>> Please, add comments to those CodeCache methods to describe what 
>>> they do.
>>
>> ok - fixed.
>>
>>>> * dont bail out on need_flushing() in compile_method - that causes 
>>>> us to
>>>> stop compiling as soon as we hit CodeCacheFlushingMinimumFreeSpace 
>>>> - and
>>>> that wasn't the idea.
>>>
>>> Then you should not check UseCodeCacheFlushing flag and the comment
>>> should be updated.
>>
>> ok, I removed the flag and the capacity check. When we can hit it -
>> compilation should already be turned off. And if it isn't it will be in
>> a moment.
>>
>>>
>>>> * New product flag CodeCacheFlushingFraction (defaults to 2)
>>>> * _was_full removed. Was a redundant way to say that we stopped
>>>> compilations.
>>>> * _last_was_full renamed to _last_full_flush_time, to emphasize 
>>>> that it
>>>> is the time when we hit the hard limit CodeCacheMinimumSpace and
>>>> compilations are stopped
>>>> * _was_full_traversal renamed to _last_flush_traversal_id. 'full' 
>>>> means
>>>> the hard CodeCacheMinimumFreeSpace limit in all other variable 
>>>> names. 
>>>> * introduced _dead_compile_ids that hold a count for the number of
>>>> compile ID that have been reclaimed. Gives a more accurate way to 
>>>> flush
>>>> the cache.
>>>
>>> Remove space in "jint       _flush_token;" to align it. Also align
>>> comments.
>> Fixed
>>>
>>>> * Simplified scan_stacks
>>>>    - restore flush token when min interval has passed
>>>>    - start compilations again if flushing is no longer needed
>>>> * Speed up sweeper when compilations is turned off. Sweeping is slowed
>>>> down when compiler threads are sleeping on the CodeCache lock, and we
>>>> want to speed up the reclaim process so that we can turn on compiler
>>>> again.
>>>> * Let sweeper make disconnected methods non_entrant even if we don't
>>>> need flushing any more. Makes us less likely to hit a new flush.
>>>
>>> You removed compilation enabling code in
>>> speculative_disconnect_nmethods() so we need to wait next safepoint to
>>> enable it (in scan_stacks()). Should we do enabling compilation during
>>> sweep may be in an other place when we know that some space is
>>> reclaimed to avoid waiting safepoint?
>>
>> Yes, my though was that !CodeCache::needs_flushing() will never be true
>> since this code is only called when we need flushing. And the only place
>> where code cache is actually freed is in scan_stacks when nmethods are
>> flushed. So it should be enough to check for turning on the compiler as
>> the last thing there.
>>
>>>
>>>> * Refactored NMethodSweeper::handle_full_code_cache
>>>>    - flushed and full flushes now uses the same token to synchronize -
>>>> otherwise we might get two back-to-back flushes. And that doesn't help
>>>> anyone.
>>>>    - remove time check - flush_token is restored in scan_stacks where
>>>> the time is already checked.
>>>> * set _resweep flag when flushing.
>>>>
>>>> The code is much easier to read now and the behavior should be less
>>>> surprising.
>>>
>>> Is that '/* */' leftover from your debugging?:
>>>
>>> 238   if (PrintMethodFlushing /*&& Verbose*/) {
>>>
>> Yes, I forgot to qrefresh my patch before doing the webrev.
>>
>>> Thanks,
>>> Vladimir
>>
>> New webrev with your and Christians feedback here:
>>
>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.07
>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.07>
>>
>> Thanks for all your help!
>> //Nils
>>>
>>>>
>>>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.03/
>>>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.03/>
>>>>
>>>> Thanks,
>>>> Nils Eliasson
>>>>
>>>>
>>>> On 2013-04-18 15:09, Nils Eliasson wrote:
>>>>> Hi!
>>>>>
>>>>> I have another fix to the code cache sweeper and flushing that 
>>>>> needs a
>>>>> review.
>>>>>
>>>>> The major change is to remove a check in scan_stacks that stops the
>>>>> sweeper when the cache is getting  full. The normal mode is to sweep
>>>>> and record if any change has happened that require another sweep. 
>>>>> This
>>>>> check stops the sweeper early causing some methods that are
>>>>> speculatively disconnected to stay so for an unnecessary long time
>>>>> sometimes causing unnecessary new flushes.
>>>>>
>>>>> Also some refactoring
>>>>> - remove state variable _do_sweep that was unnecessary. It marked 
>>>>> if a
>>>>> sweep was active or not, but just a duplicate way of checking if any
>>>>> methods are being sweept (nmethodsweeper::current != NULL).
>>>>> - rename _rescan to _resweep. When _resweep is set there will be
>>>>> another sweep started when the current ends. That sweep will start
>>>>> with a scan, but it is not only a scan.
>>>>> - rename _advise_to_sweep to _flush_token. Is CASed by the first
>>>>> thread that wants to flush and reset by scan_stacks when the flush is
>>>>> finished and a proper time has passed.
>>>>>
>>>>> http://cr.openjdk.java.net/~neliasso/8012547/
>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/>
>>>>> https://jbs.oracle.com/bugs/browse/JDK-8012547
>>>>>
>>>>> Thanks,
>>>>> Nils E.
>>>>
>>