RFR (S/M): 8012547: Reclamation of flushed methods can be slow

Mon Apr 29 04:15:46 PDT 2013

Right, fixed.

//N

On 2013-04-26 17:34, Vladimir Kozlov wrote:
> Nils,
>
> You should use CompileBroker::run_compilation instead of 'true':
>
> 319     CompileBroker::set_should_compile_new_jobs(true);
>
> Otherwise it looks good.
>
> Thanks,
> Vladimir
>
> On 4/26/13 8:15 AM, Nils Eliasson wrote:
>>
>> On 2013-04-25 20:20, Vladimir Kozlov wrote:
>>> >>> * Let sweeper make disconnected methods non_entrant even if we 
>>> don't
>>> >>> need flushing any more. Makes us less likely to hit a new flush.
>>> >>
>>> >> You removed compilation enabling code in
>>> >> speculative_disconnect_nmethods() so we need to wait next 
>>> safepoint to
>>> >> enable it (in scan_stacks()). Should we do enabling compilation 
>>> during
>>> >> sweep may be in an other place when we know that some space is
>>> >> reclaimed to avoid waiting safepoint?
>>> >
>>> > Yes, my though was that !CodeCache::needs_flushing() will never be 
>>> true
>>> > since this code is only called when we need flushing. And the only
>>> place
>>> > where code cache is actually freed is in scan_stacks when nmethods 
>>> are
>>> > flushed. So it should be enough to check for turning on the 
>>> compiler as
>>> > the last thing there.
>>>
>>> It is not true, it is opposite. scan_stack() only marks nmethods.
>>> nmethod::flash() (which frees space) is called only from
>>> NMethodSweeper::sweep_code_cache().
>>
>> You are right. Confusing myself here. But the rest of my argument still
>> stands - flushing is only called when need_flushing() is true, and the
>> possibility of it being false a moment later in disconnect methods is
>> very low.  But I added a check in the end of the sweep_code_cache that
>> can turn the compiler on again.
>>
>> Also changed the flush_target calculation in
>> speculative_disconnect_nmethods slightly. Now it doesn't count natives
>> and osrs which might prevent some potentially bad behavior in long
>> running applications with lots of osrs. (With CodeCacheFlushFraction=8
>> or so, and more than 12.5% osrs, the flushing would stop make progress
>> in the very long run)
>>
>> Latest webrev:
>>
>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.08/
>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.08/>
>>
>> Thanks!
>> //Nils
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/25/13 7:05 AM, Nils Eliasson wrote:
>>>>
>>>> On 2013-04-24 22:33, Vladimir Kozlov wrote:
>>>>> Nils,
>>>>>
>>>>> Yes, in general it is better.
>>>>>
>>>>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.04/
>>>>>
>>>>> On 4/24/13 7:06 AM, Nils Eliasson wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> Further testing revealed that not all issues was fully identified.
>>>>>> Flushing could still cause compiles to halt even though
>>>>>> compilations was
>>>>>> on, and turning off compilations when the code cache was really full
>>>>>> made the sweeper run much less often, prolonging the problem.
>>>>>>
>>>>>> Change log:
>>>>>> * Fix the comment Christian mentioned, and some other comments too.
>>>>>> * find_and_remove_saved_code renamed to reanimate_saved_code to
>>>>>> distinguish it from remove_saved_code that actually removes it.
>>>>>
>>>>> Please, add comments to those CodeCache methods to describe what
>>>>> they do.
>>>>
>>>> ok - fixed.
>>>>
>>>>>> * dont bail out on need_flushing() in compile_method - that causes
>>>>>> us to
>>>>>> stop compiling as soon as we hit CodeCacheFlushingMinimumFreeSpace
>>>>>> - and
>>>>>> that wasn't the idea.
>>>>>
>>>>> Then you should not check UseCodeCacheFlushing flag and the comment
>>>>> should be updated.
>>>>
>>>> ok, I removed the flag and the capacity check. When we can hit it -
>>>> compilation should already be turned off. And if it isn't it will 
>>>> be in
>>>> a moment.
>>>>
>>>>>
>>>>>> * New product flag CodeCacheFlushingFraction (defaults to 2)
>>>>>> * _was_full removed. Was a redundant way to say that we stopped
>>>>>> compilations.
>>>>>> * _last_was_full renamed to _last_full_flush_time, to emphasize
>>>>>> that it
>>>>>> is the time when we hit the hard limit CodeCacheMinimumSpace and
>>>>>> compilations are stopped
>>>>>> * _was_full_traversal renamed to _last_flush_traversal_id. 'full'
>>>>>> means
>>>>>> the hard CodeCacheMinimumFreeSpace limit in all other variable
>>>>>> names. 
>>>>>> * introduced _dead_compile_ids that hold a count for the number of
>>>>>> compile ID that have been reclaimed. Gives a more accurate way to
>>>>>> flush
>>>>>> the cache.
>>>>>
>>>>> Remove space in "jint       _flush_token;" to align it. Also align
>>>>> comments.
>>>> Fixed
>>>>>
>>>>>> * Simplified scan_stacks
>>>>>>    - restore flush token when min interval has passed
>>>>>>    - start compilations again if flushing is no longer needed
>>>>>> * Speed up sweeper when compilations is turned off. Sweeping is 
>>>>>> slowed
>>>>>> down when compiler threads are sleeping on the CodeCache lock, 
>>>>>> and we
>>>>>> want to speed up the reclaim process so that we can turn on compiler
>>>>>> again.
>>>>>> * Let sweeper make disconnected methods non_entrant even if we don't
>>>>>> need flushing any more. Makes us less likely to hit a new flush.
>>>>>
>>>>> You removed compilation enabling code in
>>>>> speculative_disconnect_nmethods() so we need to wait next 
>>>>> safepoint to
>>>>> enable it (in scan_stacks()). Should we do enabling compilation 
>>>>> during
>>>>> sweep may be in an other place when we know that some space is
>>>>> reclaimed to avoid waiting safepoint?
>>>>
>>>> Yes, my though was that !CodeCache::needs_flushing() will never be 
>>>> true
>>>> since this code is only called when we need flushing. And the only 
>>>> place
>>>> where code cache is actually freed is in scan_stacks when nmethods are
>>>> flushed. So it should be enough to check for turning on the 
>>>> compiler as
>>>> the last thing there.
>>>>
>>>>>
>>>>>> * Refactored NMethodSweeper::handle_full_code_cache
>>>>>>    - flushed and full flushes now uses the same token to 
>>>>>> synchronize -
>>>>>> otherwise we might get two back-to-back flushes. And that doesn't 
>>>>>> help
>>>>>> anyone.
>>>>>>    - remove time check - flush_token is restored in scan_stacks 
>>>>>> where
>>>>>> the time is already checked.
>>>>>> * set _resweep flag when flushing.
>>>>>>
>>>>>> The code is much easier to read now and the behavior should be less
>>>>>> surprising.
>>>>>
>>>>> Is that '/* */' leftover from your debugging?:
>>>>>
>>>>> 238   if (PrintMethodFlushing /*&& Verbose*/) {
>>>>>
>>>> Yes, I forgot to qrefresh my patch before doing the webrev.
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>
>>>> New webrev with your and Christians feedback here:
>>>>
>>>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.07
>>>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.07>
>>>>
>>>> Thanks for all your help!
>>>> //Nils
>>>>>
>>>>>>
>>>>>> http://cr.openjdk.java.net/~neliasso/8012547/webrev.03/
>>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.03/>
>>>>>>
>>>>>> Thanks,
>>>>>> Nils Eliasson
>>>>>>
>>>>>>
>>>>>> On 2013-04-18 15:09, Nils Eliasson wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have another fix to the code cache sweeper and flushing that
>>>>>>> needs a
>>>>>>> review.
>>>>>>>
>>>>>>> The major change is to remove a check in scan_stacks that stops the
>>>>>>> sweeper when the cache is getting  full. The normal mode is to 
>>>>>>> sweep
>>>>>>> and record if any change has happened that require another sweep.
>>>>>>> This
>>>>>>> check stops the sweeper early causing some methods that are
>>>>>>> speculatively disconnected to stay so for an unnecessary long time
>>>>>>> sometimes causing unnecessary new flushes.
>>>>>>>
>>>>>>> Also some refactoring
>>>>>>> - remove state variable _do_sweep that was unnecessary. It marked
>>>>>>> if a
>>>>>>> sweep was active or not, but just a duplicate way of checking if 
>>>>>>> any
>>>>>>> methods are being sweept (nmethodsweeper::current != NULL).
>>>>>>> - rename _rescan to _resweep. When _resweep is set there will be
>>>>>>> another sweep started when the current ends. That sweep will start
>>>>>>> with a scan, but it is not only a scan.
>>>>>>> - rename _advise_to_sweep to _flush_token. Is CASed by the first
>>>>>>> thread that wants to flush and reset by scan_stacks when the 
>>>>>>> flush is
>>>>>>> finished and a proper time has passed.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~neliasso/8012547/
>>>>>>> <http://cr.openjdk.java.net/%7Eneliasso/8012547/>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-8012547
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nils E.
>>>>>>
>>>>
>>