RFR(XXS): 8244278: Excessive code cache flushes and sweeps
Nils Eliasson
nils.eliasson at oracle.com
Wed May 6 17:12:26 UTC 2020
After posting this I noticed another problem. The sweeper should wake up
regularly, but now it is only awakened when hitting the SweepAggressive
threshold. This is wrong.
I suggest holding of the fix until all the problems have been ironed out.
Best regards,
Nils Eliasson
On 2020-05-06 11:52, Nils Eliasson wrote:
>
>
> On 2020-05-05 20:57, Man Cao wrote:
>> Hi Laurent, Nils,
>>
>> > Using 40mb code cache looks quite small compared to default values
>> (256mb ?).
>> The 40MB code cache size is for -XX:-TieredCompilation, for the
>> DaCapo runs.
>> "java -XX:-TieredCompilation -XX:+PrintFlagsFinal |& grep
>> ReservedCodeCacheSize" shows the default size is 48MB for
>> -TieredCompilation.
>> So 40MB is not particularly small compared to 48MB.
>>
>> The Blaze runs use "-XX:+TieredCompilation" and a Google-default code
>> cache size of 480MB (we have doubled the default size).
>> This actually tests a case that is closer to the OpenJDK default.
>>
>> > Could you run 1 quick experiment with default code cache settings
>> to see if there is no regression in the main use case?
>> Yes, I just launched an experiment running DaCapo at JDK source tip,
>> only with "-Xms4g -Xmx4g" and some logging flags.
>> This is a meaningful experiment as it uses the real OpenJDK default
>> flags, and the most up-to-date source.
>>
>> > Why do you expect code cache usage would increase a lot? The sweeper
>> > still wakes up regularly and cleans the code cache. The code path
>> fixed
>> > is just about sweeping extra aggressively under some circumstances.
>> Some
>> > nmethod might live a little longer, but they will still be cleaned.
>> I found the aggressive sweeping deoptimized a non-trivial number of
>> nmethods, which kept the usage low.
>> After my bugfix, the sweeper only wakes up when the usage reaches
>> (100-StartAggressiveSweepingAt)% of code cache capacity, which
>> default to 90%.
>> This means many nmethods will not be cleaned until there is pressure
>> in code cache usage.
>>
>> > One number you could add to your benchmark numbers is the number of
>> > nmethods reclaimed and code cache usage. I expect both to remain
>> the same.
>> The benchmark result htmls already show the code cache usage (Code
>> Cache Used (MB)).
>> You can CTRL-F for "Code Cache" in the browser.
>>
>> There is a significant increase in code cache usage:
>> For DaCapo, up to 13.5MB (for tradesoap) increase for the 40MB
>> ReservedCodeCacheSize.
>> For Blaze, the increase is 33MB-80MB for the
>> 480MB ReservedCodeCacheSize.
>> The code cache usage metric is measure like this (we added an
>> hsperfdata counter for it), at the end of a benchmark run:
>> size_t result = 0;
>> FOR_ALL_ALLOCABLE_HEAPS(heap) {
>> result += (*heap)->allocated_capacity();
>> }
>> _code_cache_used_size->set_value(result);
>>
>> I also looked at the logs, it shows that the bugfix eliminated almost
>> all of the code cache flushes. They also contain the number of
>> nmethods reclaimed.
>> E.g., for tradesoap:
>>
>> Without the fix:
>> Code cache sweeper statistics:
>> Total sweep time: 1902 ms
>> Total number of full sweeps: 23301
>> Total number of flushed methods: 7080 (thereof 7080 C2 methods)
>> Total size of flushed methods: 20468 kB
>> With the fix:
>> Code cache sweeper statistics:
>> Total sweep time: 0 ms
>> Total number of full sweeps: 0
>> Total number of flushed methods: 0 (thereof 0 C2 methods)
>> Total size of flushed methods: 0 kB
>>
>
> Interesting results and a clear indication that something is broken in
> the sweeper heuristics - Nmethods should still be flushed!
>
> Looking at sweeper.cpp I see something that looks wrong. The
> _last_sweep counter is updated even if no sweep was done. In low code
> cache usage scenarios that means will might never reach the threshold.
>
> I think the curly brace should be moved done like this:
>
> --- a/src/hotspot/share/runtime/sweeper.cpp Tue May 05 21:28:46
> 2020 +0200
> +++ b/src/hotspot/share/runtime/sweeper.cpp Wed May 06 11:46:01
> 2020 +0200
> @@ -445,16 +445,17 @@
> if (_should_sweep || forced) {
> init_sweeper_log();
> sweep_code_cache();
> +
> + // We are done with sweeping the code cache once.
> + _total_nof_code_cache_sweeps++;
> + _last_sweep = _time_counter;
> + // Reset flag; temporarily disables sweeper
> + _should_sweep = false;
> + // If there was enough state change, 'possibly_enable_sweeper()'
> + // sets '_should_sweep' to true
> + possibly_enable_sweeper();
> }
>
> - // We are done with sweeping the code cache once.
> - _total_nof_code_cache_sweeps++;
> - _last_sweep = _time_counter;
> - // Reset flag; temporarily disables sweeper
> - _should_sweep = false;
> - // If there was enough state change, 'possibly_enable_sweeper()'
> - // sets '_should_sweep' to true
> - possibly_enable_sweeper();
> // Reset _bytes_changed only if there was enough state change.
> _bytes_changed
> // can further increase by calls to 'report_state_change'.
> if (_should_sweep) {
>
> Can you try it out and see if things improve?
>
> Best regards,
> Nils Eliasson
>
>
>> Anyway, just to reiterate, we think the improvement in throughput and
>> CPU usage is well worth the increase in code cache usage.
>> If the increase causes any problem, we would advise users to accept
>> the increase and fully provision memory for the
>> entire ReservedCodeCacheSize.
>>
>> -Man
>>
>>
>> On Tue, May 5, 2020 at 3:18 AM Nils Eliasson
>> <nils.eliasson at oracle.com <mailto:nils.eliasson at oracle.com>> wrote:
>>
>> Hi Man,
>>
>> Why do you expect code cache usage would increase a lot? The sweeper
>> still wakes up regularly and cleans the code cache. The code path
>> fixed
>> is just about sweeping extra aggressively under some
>> circumstances. Some
>> nmethod might live a little longer, but they will still be cleaned.
>>
>> Without your bugfix the sweeper will be notified for every new
>> allocation in the codecache as soon as code cache usages has gone
>> beyond
>> 10%. That could in the worst case be one sweep for every allocation.
>>
>> One number you could add to your benchmark numbers is the number of
>> nmethods reclaimed and code cache usage. I expect both to remain
>> the same.
>>
>> Best regards,
>> Nils
>>
>>
>>
>> On 2020-05-04 21:21, Man Cao wrote:
>> > Hi,
>> >
>> > Thanks for the review!
>> > Yes, the code change is trivial, but runtime behavior change is
>> > considerable.
>> > In particular, throughput and CPU usage could improve, but code
>> cache usage
>> > could increase a lot. In our experience, the improvement in
>> throughput and
>> > CPU is well worth the code cache increase.
>> >
>> > I have attached some benchmarking results in JBS. They are based
>> on JDK11.
>> > We have not rolled out this fix to our production JDK11 yet, as
>> I'd like to
>> > confirm that this large change in runtime behavior is OK with
>> the OpenJDK
>> > community.
>> > We are happy to share some performance numbers from production
>> workload
>> > once we have them.
>> >
>> > -Man
>> >
>> >
>> > On Mon, May 4, 2020 at 4:11 AM Laurent Bourgès
>> <bourges.laurent at gmail.com <mailto:bourges.laurent at gmail.com>>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> Do you have performance results to justify your assumption "could
>> >> significantly improve performance" ?
>> >>
>> >> Please share numbers in the jbs bug
>> >>
>> >> Laurent
>> >>
>> >> Le sam. 2 mai 2020 à 07:35, Man Cao <manc at google.com
>> <mailto:manc at google.com>> a écrit :
>> >>
>> >>> Hi all,
>> >>>
>> >>> Can I have reviews for this one-line change that fixes a bug
>> and could
>> >>> significantly improve performance?
>> >>> Webrev: https://cr.openjdk.java.net/~manc/8244278/webrev.00/
>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8244278
>> >>>
>> >>> It passes tier-1 tests locally, as well as
>> >>> "vm/mlvm/meth/stress/compiler/deoptimize" (for the original
>> JDK-8046809).
>> >>>
>> >>> -Man
>> >>>
>>
>
More information about the hotspot-compiler-dev
mailing list