RFR(M): 8244660: Code cache sweeper heuristics is broken

Thu May 14 02:48:20 UTC 2020

Hi Nils,

I have done more DaCapo benchmarking with the patches.
Overall, the results look good, and your fix does reduce sweep frequency
compared to the current state.
It retains the potential performance improvement and does not introduce an
unnecessary increase in code cache usage.

All results are available at
https://cr.openjdk.java.net/~manc/8244660_benchmarks/.
I have also included counters for used code cache size and sweeper
statistics in the graphs.
These metrics are collected using this patch:
https://cr.openjdk.java.net/~manc/8244660_benchmarks/hsperfcounters_webrev/
All runs are with "-Xms4g -Xmx4g -XX:-TieredCompilation", because
-TieredCompilation matters a lot for our workload.
Also note that the numbers for throughput/CPU and GC exclude the warmup
iterations. The codecache/sweeper statistics account for all iterations
(including warmups).

Comparing 3 JDK builds:
https://cr.openjdk.java.net/~manc/8244660_benchmarks/20200508-JDKHead-dacapoLarge4G-sweeperPatches.html
base: current state with no pending patches
allFixes: with patches for JDK-8244660, JDK-8244278 and JDK-8244658
sweepAt90: with only the patch for JDK-8244278, so it's the same as the
config I used in previous results in JDK-8244278.
"allFixes" sweeps less frequently than "base", without introducing much
increase in code cache usage.

Same as above, but with -XX:ReservedCodeCacheSize=40m:
https://cr.openjdk.java.net/~manc/8244660_benchmarks/20200512-JDKHead-dacapoLarge4G-sweeperPatches-CodeCache40MB.html
"allFixes" retains the throughput and CPU improvement for tradesoap,
perhaps even better than not sweeping ("sweepAt90").
Code cache usage for tradesoap is between "base" and not sweeping, which is
OK in my opinion.

> I think 1/100 of a 240mb default code cache seems a bit high. During
> startup we produce a lot of L3 code that will be thrown away. We want to
> recycle it fairly quickly, to avoid fragmenting the code cache, but not
> that often that we affect startup.
> I've done some startup measurements, and then we sweep about every other
> second in a benchmark that produces a lot of code.
> What results are you seeing?

The 1/256 capped at 1MB seems OK.
Even with 40MB or 48MB code cache size with -TieredCompilation, it does not
flush too frequently.
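
To make the arithmetic concrete, here is a minimal sketch of a threshold of
this shape, assuming it is computed as 1/256 of the reserved code cache size
and capped at 1MB. The class and method names are hypothetical and are not
taken from the patch; the actual code in the webrev may differ.

```java
public class SweepThresholdSketch {
    static final long K = 1024;
    static final long M = 1024 * K;

    // Hypothetical helper (not from the patch): 1/256 of the reserved
    // code cache size, capped at 1MB.
    static long thresholdBytes(long reservedCodeCacheBytes) {
        return Math.min(reservedCodeCacheBytes / 256, 1 * M);
    }

    public static void main(String[] args) {
        // Code cache sizes mentioned in this thread, plus one large
        // enough to hit the cap.
        long[] sizesMb = {40, 48, 240, 512};
        for (long mb : sizesMb) {
            long t = thresholdBytes(mb * M);
            System.out.println(mb + "MB code cache -> threshold " + (t / K) + "KB");
        }
    }
}
```

Under these assumptions a 40MB code cache gives a 160KB threshold, 48MB gives
192KB, and the default 240MB gives 960KB, so the 1MB cap only applies to code
caches larger than 256MB.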

> Code cache flushing has another heuristic - it might be broken too. But
> it would be interesting to see how it works with the new sweep
> heuristic. If you know that you have enough code cache - turning it off
> is no loss. It only helps when you are running out of code cache.

> When we are doing normal sweeping - we don't deoptimize cold code. That
> is handled by the method flushing - it should only kick in when we start
> to run out of code cache.

I think we should address MethodFlushing in a separate RFE/BUG.

Thanks for explaining this.
I did some benchmarking with -XX:NmethodSweepActivity and
-XX:MinPassesBeforeFlush, on top of the "allFixes" config:
https://cr.openjdk.java.net/~manc/8244660_benchmarks/20200508-JDKHead-dacapoLarge4G-NmethodSweepActivity.html
https://cr.openjdk.java.net/~manc/8244660_benchmarks/20200508-JDKHead-dacapoLarge4G-MinPassesBeforeFlush.html
xalan and jython look better with small values, while pmd looks worse.
I'll follow up separately if I find anything wrong with the
flushing/cold-code-deoptimization heuristic.

> The heuristics for CodeAging may have been negatively affected by the
> transition to handshakes. Also the SetHotnessClosure should be replaced
> by a mechanism using the NMethodEntry barriers.
> I see that we are missing JFR events for MethodFlushing. I have created
> another patch for that.

Although I'm not very familiar with these, thanks for identifying and
fixing these issues!

-Man