RFR (S/M): 8012547: Reclamation of flushed methods can be slow

Wed Apr 24 07:06:54 PDT 2013

Hi again,

Further testing revealed that not all issues was fully identified. 
Flushing could still cause compiles to halt even though compilations was 
on, and turning off compilations when the code cache was really full 
made the sweeper run much less often, prolonging the problem.

Change log:
* Fix the comment Christian mentioned, and some other comments too.
* find_and_remove_saved_code renamed to reanimate_saved_code to 
distinguish it from remove_saved_code that actually removes it.
* dont bail out on need_flushing() in compile_method - that causes us to 
stop compiling as soon as we hit CodeCacheFlushingMinimumFreeSpace - and 
that wasn't the idea.
* New product flag CodeCacheFlushingFraction (defaults to 2)
* _was_full removed. Was a redundant way to say that we stopped 
compilations.
* _last_was_full renamed to _last_full_flush_time, to emphasize that it 
is the time when we hit the hard limit CodeCacheMinimumSpace and 
compilations are stopped
* _was_full_traversal renamed to _last_flush_traversal_id. 'full' means 
the hard CodeCacheMinimumFreeSpace limit in all other variable names. 
* introduced _dead_compile_ids that hold a count for the number of 
compile ID that have been reclaimed. Gives a more accurate way to flush 
the cache.
* Simplified scan_stacks
   - restore flush token when min interval has passed
   - start compilations again if flushing is no longer needed
* Speed up sweeper when compilations is turned off. Sweeping is slowed 
down when compiler threads are sleeping on the CodeCache lock, and we 
want to speed up the reclaim process so that we can turn on compiler again.
* Let sweeper make disconnected methods non_entrant even if we don't 
need flushing any more. Makes us less likely to hit a new flush.
* Refactored NMethodSweeper::handle_full_code_cache
   - flushed and full flushes now uses the same token to synchronize - 
otherwise we might get two back-to-back flushes. And that doesn't help 
anyone.
   - remove time check - flush_token is restored in scan_stacks where 
the time is already checked.
* set _resweep flag when flushing.

The code is much easier to read now and the behavior should be less 
surprising.

http://cr.openjdk.java.net/~neliasso/8012547/webrev.03/ 
<http://cr.openjdk.java.net/%7Eneliasso/8012547/webrev.03/>

Thanks,
Nils Eliasson

On 2013-04-18 15:09, Nils Eliasson wrote:
> Hi!
>
> I have another fix to the code cache sweeper and flushing that needs a 
> review.
>
> The major change is to remove a check in scan_stacks that stops the 
> sweeper when the cache is getting  full. The normal mode is to sweep 
> and record if any change has happened that require another sweep. This 
> check stops the sweeper early causing some methods that are 
> speculatively disconnected to stay so for an unnecessary long time 
> sometimes causing unnecessary new flushes.
>
> Also some refactoring
> - remove state variable _do_sweep that was unnecessary. It marked if a 
> sweep was active or not, but just a duplicate way of checking if any 
> methods are being sweept (nmethodsweeper::current != NULL).
> - rename _rescan to _resweep. When _resweep is set there will be 
> another sweep started when the current ends. That sweep will start 
> with a scan, but it is not only a scan.
> - rename _advise_to_sweep to _flush_token. Is CASed by the first 
> thread that wants to flush and reset by scan_stacks when the flush is 
> finished and a proper time has passed.
>
> http://cr.openjdk.java.net/~neliasso/8012547/ 
> <http://cr.openjdk.java.net/%7Eneliasso/8012547/>
> https://jbs.oracle.com/bugs/browse/JDK-8012547
>
> Thanks,
> Nils E.