RFR: 8309639: GenShen: Regression in LRU cache benchmark
Y. Srinivas Ramakrishna
ysr at openjdk.org
Thu Aug 17 21:01:05 UTC 2023
On Wed, 16 Aug 2023 23:31:33 GMT, William Kemper <wkemper at openjdk.org> wrote:
> Shenandoah's adaptive heuristic begins in a state where it is "learning" the application. In this state, it triggers collections somewhat more aggressively in an effort to "learn" the allocation rate and cycle times for the application.
>
> Shenandoah also has the ability to "skip" evacuation phases when marking finds sufficient garbage in regions with no live objects. We call these "abbreviated" cycles. During the course of development for the generational mode, we found that such cycles tend to decrease the average cycle time tracked by the heuristic. This in turn leads to the heuristic triggering a cycle "too late" - because it expects the cycle to take less time than a normal (i.e., not abbreviated) cycle requires. To address this, we modified the heuristic to not include abbreviated cycle times in the average cycle time.
>
> Unfortunately, this change also made the heuristic not count abbreviated cycles as "learning" cycles. Ordinarily, this is not a concern because the application will usually run plenty of "normal" cycles. However, we've discovered that the certain benchmarks and jtreg tests only run abbreviated cycles. This causes the heuristic to stay in the "learning" state for the duration of the benchmark. In this state, the GC is run more aggressively and performance suffers (15-20%).
LGTM. In addition to the code documentation remark, can I also request that the observed change in performance with your favorite benchmark or test be included either here in the PR or in the JBS ticket?
I am just curious how the learning data is used. If that is entirely abbreviated cycles, say, wouldn't you have the same issue? And skipping abbreviated cycle counting after the learning steps would leave you in the same state of under-predicting when a normal cycle does occur. Does the small number of recorded samples then save you because of the potentially larger padding because of the potentially larger standard deviation than if you counted all of the abbreviated cycles?
Adversarial workload scenarios are easy to imagine, but may be real applications don't behave like I was thinking of adversarial scenarios, so the performance data that you share would be useful to allay such fears.
Allow me to do some thinking out loud here. May be there's a flaw in my thinking below.
In general, my instinct would be that including abbreviated cycles would be fine; if they are frequent then the infrequent normal cycles increase the standard deviation (even if it might subsequently decay). Conversely, if abbreviated cycles are infrequent then we should be fine. May be thinking of the benchmarks/tests that gained from the change in heuristic would allow us to determine why the scenarios in which this helped and why. In particular, was the load presented by the benchmark very spiky, and this allowed us to tread in the direction of being conservative. Still, why doesn't a degenerate collection swing us back into the right spot? Does it do that but do we swing back again to complacency? What was the difference between the durations for abbreviated and normal cycles in the cases where we saw this improvement? I'd have expected the standard deviation to have provided a buffer/cushion.
Thanks!
src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 241:
> 239: adjust_penalty(Concurrent_Adjust);
> 240:
> 241: if (_gc_times_learned <= ShenandoahLearningSteps || !(abbreviated && ShenandoahAdaptiveIgnoreShortCycles)) {
Can you leave a documentation comment here recording the intent in English?
As I read it, you include the time in the recorded history if any of the following hold:
1. we haven't reached the required number of learning steps, or
2. we are not an abbreviated cycle, or
3. we haven't been asked to ignore short cycles
To the extent that this is so, I think the test would read easier if you de-Morganized the conditional into:
if (_gc_times_learned <= ShenandoahLearningSteps || !abbreviated || !ShenandoahAdaptiveIgnoreShortCycles)) {
Anyway, something for us to remember why we have this condition would be useful as well; your PR explanation is excellent. May be a condensed form of that would be appropriate as a documentation comment here?
-------------
Marked as reviewed by ysr (Committer).
PR Review: https://git.openjdk.org/shenandoah/pull/307#pullrequestreview-1583439262
PR Review Comment: https://git.openjdk.org/shenandoah/pull/307#discussion_r1297718868
More information about the shenandoah-dev
mailing list