RFR(XXS): 8244278: Excessive code cache flushes and sweeps

Wed May 6 09:52:05 UTC 2020

On 2020-05-05 20:57, Man Cao wrote:
> Hi Laurent, Nils,
>
> > Using 40mb code cache looks quite small compared to default values 
> (256mb ?).
> The 40MB code cache size is for -XX:-TieredCompilation, for the DaCapo 
> runs.
> "java -XX:-TieredCompilation -XX:+PrintFlagsFinal |& grep 
> ReservedCodeCacheSize" shows the default size is 48MB for 
> -TieredCompilation.
> So 40MB is not particularly small compared to 48MB.
>
> The Blaze runs use "-XX:+TieredCompilation" and a Google-default code 
> cache size of 480MB (we have doubled the default size).
> This actually tests a case that is closer to the OpenJDK default.
>
> > Could you run 1 quick experiment with default code cache settings to 
> see if there is no regression in the main use case?
> Yes, I just launched an experiment running DaCapo at JDK source tip, 
> only with "-Xms4g -Xmx4g" and some logging flags.
> This is a meaningful experiment as it uses the real OpenJDK default 
> flags, and the most up-to-date source.
>
> > Why do you expect code cache usage would increase a lot? The sweeper
> > still wakes up regularly and cleans the code cache. The code path fixed
> > is just about sweeping extra aggressively under some circumstances. Some
> > nmethod might live a little longer, but they will still be cleaned.
> I found the aggressive sweeping deoptimized a non-trivial number of 
> nmethods, which kept the usage low.
> After my bugfix, the sweeper only wakes up when the usage reaches 
> (100-StartAggressiveSweepingAt)% of code cache capacity, which default 
> to 90%.
> This means many nmethods will not be cleaned until there is pressure 
> in code cache usage.
>
> > One number you could add to your benchmark numbers is the number of
> > nmethods reclaimed and code cache usage. I expect both to remain the 
> same.
> The benchmark result htmls already show the code cache usage (Code 
> Cache Used (MB)).
> You can CTRL-F for "Code Cache" in the browser.
>
> There is a significant increase in code cache usage:
> For DaCapo, up to 13.5MB (for tradesoap) increase for the 40MB 
> ReservedCodeCacheSize.
> For Blaze, the increase is 33MB-80MB for the 480MB ReservedCodeCacheSize.
> The code cache usage metric is measure like this (we added an 
> hsperfdata counter for it), at the end of a benchmark run:
>   size_t result = 0;
>   FOR_ALL_ALLOCABLE_HEAPS(heap) {
>     result += (*heap)->allocated_capacity();
>   }
>   _code_cache_used_size->set_value(result);
>
> I also looked at the logs, it shows that the bugfix eliminated almost 
> all of the code cache flushes. They also contain the number of 
> nmethods reclaimed.
> E.g., for tradesoap:
>
> Without the fix:
> Code cache sweeper statistics:
>    Total sweep time:                1902 ms
>    Total number of full sweeps:     23301
>    Total number of flushed methods: 7080 (thereof 7080 C2 methods)
>    Total size of flushed methods:   20468 kB
> With the fix:
> Code cache sweeper statistics:
>    Total sweep time:                0 ms
>    Total number of full sweeps:     0
>    Total number of flushed methods: 0 (thereof 0 C2 methods)
>    Total size of flushed methods:   0 kB
>

Interesting results and a clear indication that something is broken in 
the sweeper heuristics - Nmethods should still be flushed!

Looking at sweeper.cpp I see something that looks wrong. The _last_sweep 
counter is updated even if no sweep was done. In low code cache usage 
scenarios that means will might never reach the threshold.

I think the curly brace should be moved done like this:

--- a/src/hotspot/share/runtime/sweeper.cpp     Tue May 05 21:28:46 2020 
+0200
+++ b/src/hotspot/share/runtime/sweeper.cpp     Wed May 06 11:46:01 2020 
+0200
@@ -445,16 +445,17 @@
    if (_should_sweep || forced) {
      init_sweeper_log();
      sweep_code_cache();
+
+    // We are done with sweeping the code cache once.
+    _total_nof_code_cache_sweeps++;
+    _last_sweep = _time_counter;
+    // Reset flag; temporarily disables sweeper
+    _should_sweep = false;
+    // If there was enough state change, 'possibly_enable_sweeper()'
+    // sets '_should_sweep' to true
+    possibly_enable_sweeper();
    }

-  // We are done with sweeping the code cache once.
-  _total_nof_code_cache_sweeps++;
-  _last_sweep = _time_counter;
-  // Reset flag; temporarily disables sweeper
-  _should_sweep = false;
-  // If there was enough state change, 'possibly_enable_sweeper()'
-  // sets '_should_sweep' to true
-  possibly_enable_sweeper();
    // Reset _bytes_changed only if there was enough state change. 
_bytes_changed
    // can further increase by calls to 'report_state_change'.
    if (_should_sweep) {

Can you try it out and see if things improve?

Best regards,
Nils Eliasson


> Anyway, just to reiterate, we think the improvement in throughput and 
> CPU usage is well worth the increase in code cache usage.
> If the increase causes any problem, we would advise users to accept 
> the increase and fully provision memory for the 
> entire ReservedCodeCacheSize.
>
> -Man
>
>
> On Tue, May 5, 2020 at 3:18 AM Nils Eliasson <nils.eliasson at oracle.com 
> <mailto:nils.eliasson at oracle.com>> wrote:
>
>     Hi Man,
>
>     Why do you expect code cache usage would increase a lot? The sweeper
>     still wakes up regularly and cleans the code cache. The code path
>     fixed
>     is just about sweeping extra aggressively under some
>     circumstances. Some
>     nmethod might live a little longer, but they will still be cleaned.
>
>     Without your bugfix the sweeper will be notified for every new
>     allocation in the codecache as soon as code cache usages has gone
>     beyond
>     10%. That could in the worst case be one sweep for every allocation.
>
>     One number you could add to your benchmark numbers is the number of
>     nmethods reclaimed and code cache usage. I expect both to remain
>     the same.
>
>     Best regards,
>     Nils
>
>
>
>     On 2020-05-04 21:21, Man Cao wrote:
>     > Hi,
>     >
>     > Thanks for the review!
>     > Yes, the code change is trivial, but runtime behavior change is
>     > considerable.
>     > In particular, throughput and CPU usage could improve, but code
>     cache usage
>     > could increase a lot. In our experience, the improvement in
>     throughput and
>     > CPU is well worth the code cache increase.
>     >
>     > I have attached some benchmarking results in JBS. They are based
>     on JDK11.
>     > We have not rolled out this fix to our production JDK11 yet, as
>     I'd like to
>     > confirm that this large change in runtime behavior is OK with
>     the OpenJDK
>     > community.
>     > We are happy to share some performance numbers from production
>     workload
>     > once we have them.
>     >
>     > -Man
>     >
>     >
>     > On Mon, May 4, 2020 at 4:11 AM Laurent Bourgès
>     <bourges.laurent at gmail.com <mailto:bourges.laurent at gmail.com>>
>     > wrote:
>     >
>     >> Hi,
>     >>
>     >> Do you have performance results to justify your assumption "could
>     >> significantly improve performance" ?
>     >>
>     >> Please share numbers in the jbs bug
>     >>
>     >> Laurent
>     >>
>     >> Le sam. 2 mai 2020 à 07:35, Man Cao <manc at google.com
>     <mailto:manc at google.com>> a écrit :
>     >>
>     >>> Hi all,
>     >>>
>     >>> Can I have reviews for this one-line change that fixes a bug
>     and could
>     >>> significantly improve performance?
>     >>> Webrev: https://cr.openjdk.java.net/~manc/8244278/webrev.00/
>     >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8244278
>     >>>
>     >>> It passes tier-1 tests locally, as well as
>     >>> "vm/mlvm/meth/stress/compiler/deoptimize" (for the original
>     JDK-8046809).
>     >>>
>     >>> -Man
>     >>>
>