RFR: 8311883: [Genshen] Adaptive tenuring threshold
Y. Srinivas Ramakrishna
ysr at openjdk.org
Tue Jul 11 18:02:31 UTC 2023
On Thu, 15 Jun 2023 20:08:58 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:
> JDK-8311883 [GenShen] Adaptive tenuring
>
> I am opening this previously draft PR for formal preliminary review. It has already benefited from review feedback from a code walkthrough of an earlier version of the code. Most of that feedback and the corrections thereof are to be found in the comments in this PR. I have addressed a large majority of those comments, and am working on the last one that I plan to address as part of this PR. For the ones that I don't plan to address in this PR, I will create follow up tickets. Those will be added in the responses for the remaining feedback comments recorded in this PR's conversation.
>
> Preliminary testing w/SPECjbb didn't yield reliable performance data from which to infer any performance improvements stemming from enabling adaptive tenuring. I believe that was because of the way SPECjbb is run, which causes excessive degenerate and full gc's. I plan to collect SPECjbb numbers with a fixed lower max HBIR so as to be able to discern performance differences from this change, as well as Extremem workloads. Those will be added here once ready over the next few days.
For the case of `GenShenCensusAtEvac`, there was discussion regarding the appearance of age cohorts older than tenuring threshold for cohorts older than computed tenuring threshold. While objects older than tenuring threshold should be expected in the young generation because of being excluded from the collection set, it was surprising to have found that population in the census if it were not in the collection set. On the other hand, if it were in the collection set, one would have expected those instances to be promoted and not appear in subsequent censuses.
Check what's going on here and fix the problem, or find an explanation for the observed data.
Finally, in this case, since the data is available via region census, would it make sense to add it to the age table? We believe not, since the data may not be accurate because of undercounting the age. (Total this up in the noise vector to see what it looks like?)
For the case of `!GenShenCensusAtEvac`, when older cohorts above the tenuring threshold in a previous epoch are finally included in the collection set and promoted, the mortality rate shows up as`1.0`, causing the new tenuring threshold to shoot up. This may prevent timely promotion of a cohort that might otherwise have been considered eligible for promotion. We should see about avoiding this effect while still allowing the survivor ageing window to expand when necessary.
Consider using the census information collected at marking in the current cycle (and perhaps at evacuation in the previous cycle?) to inform the generation sizing/budgeting for the evacuation cycle to follow. Talk with Kelvin to discuss a suitable API and place to hook this into.
src/hotspot/share/gc/shared/ageTable.hpp line 60:
> 58:
> 59: void add(uint age, size_t oop_size) {
> 60: assert(age > 0 && age < table_size, "invalid age of object");
We now also include age 0 objects in the census for the case of concurrent collectors in the young generation. This slot would have `sizes[0] == 0` for our traditional use cases for generational stop-world collectors.
I'll upstream this change separately in a sibling PR/JBS. TBD.
src/hotspot/share/gc/shenandoah/shenandoahAgeCensus.cpp line 120:
> 118: _global_age_table[_epoch]->clear();
> 119: CENSUS_NOISE(_global_noise[_epoch].clear();)
> 120: if (!GenShenCensusAtEvac) {
In the case of `GenShenCensusAtEvac` this work is done by the calling method explicitly via `ingest` and `compute_...`
This should probably be done in a separate method/API to keep the API/spec of this method clean.
src/hotspot/share/gc/shenandoah/shenandoahAgeCensus.cpp line 174:
> 172: // that all higher ages have a mortality rate that is below a
> 173: // pre-specified threshold. We consider this to be the adaptive
> 174: // tenuring age to be used for the next cohort.
... for the next _cycle_.
src/hotspot/share/gc/shenandoah/shenandoahAgeCensus.cpp line 196:
> 194: // cohorts are considered eligible for tenuring when all older
> 195: // cohorts are.
> 196: if (i > 1 && (prev_pop < GenShenTenuringCohortPopulationThreshold ||
Consider getting a more accurate population for cohort 0 (by adding the population in each region above TAMS), so that we may consider the case of `i = 1` also above (i.e. `i >=1`). This would potentially allo tenuring threshold to go as low as 1 in some cases. This choice will need to be carefully considered.
src/hotspot/share/gc/shenandoah/shenandoahAgeCensus.cpp line 249:
> 247: log_info(gc, age)(UINTX_FORMAT "\t\t " SIZE_FORMAT "\t\t " SIZE_FORMAT "\t\t %.2f " ,
> 248: (uintx)i, prev_pop, cur_pop, mr);
> 249: }
vertical alignment via tabbing. Specify units for volume.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 442:
> 440: LogStream ls(lt);
> 441: heap->phase_timings()->print_cycle_on(&ls);
> 442: evac_tracker->print_evacuations_on(&ls, &evac_stats.workers,
Does this need to be an instance method?
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 50:
> 48: _bytes_completed += bytes;
> 49: if (_generational && GenShenCensusAtEvac && young_gen) {
> 50: assert(age > 0, "Error");
Check that age is recorded _after_ incrementing at evacuation.
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 62:
> 60: if (_generational && GenShenCensusAtEvac) {
> 61: // Discard const
> 62: _age_table.merge(((ShenandoahEvacuationStats*)other)->age_table());
Is the cast necessary?
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 69:
> 67: _evacuations_completed = _evacuations_attempted = 0;
> 68: _bytes_completed = _bytes_attempted = 0;
> 69: if (_generational && GenShenCensusAtEvac) {
Avoid GenShen: use Shenandoah.
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 96:
> 94: for (uint i = 0; i < heap->num_regions(); ++i) {
> 95: ShenandoahHeapRegion* r = heap->get_region(i);
> 96: if (r->age() > 0 && r->age() < AgeTable::table_size) {
It turns out that this was previously needed because one could not bound the region age. For now, I am dropping this test and instead bounding region ages, clamping them at `markWord::max_age`; see `ShenandoahHeapRegion::increment_age()`.
Should we reach a situation where we need to use larger ages than this, we'll need to revisit this code.
At this time, region ages pose a number of issues including injection of noise into object demographics and mortality rate calculations which drive our adaptive tenuring algorithm.
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 112:
> 110: } else {
> 111: assert(r->is_old(), "Sanity");
> 112: old_region_ages.add(r->age(), r->get_live_data_words());
We currently don't do anything with ages of old regions as far as I can tell. Perhaps the idea is to use it in some way in the future?
src/hotspot/share/gc/shenandoah/shenandoahEvacTracker.cpp line 121:
> 119: st->cr();
> 120: st->print("Old regions: ");
> 121: old_region_ages.print_on(st, tenuring_threshold);
`tenuring_threshold` in this context doesn't make any sense for old regions.
src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 472:
> 470: _worker_id(worker_id) {
> 471: if (_heap->mode()->is_generational()) {
> 472: _tenuring_threshold = _heap->age_census()->tenuring_threshold();
Is this updated in the most recent stop-world marking, or is it using the one from the most recent concurrent collection? Could an interrupted concurrent marking cause a bogus census to be used here?
Partial censuses should be expunged.
src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 482:
> 480: if (!GenShenCensusAtEvac) {
> 481: // Age table updates
> 482: heap->update_epoch();
A different name for the method to avoid confusion.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 491:
> 489: AgeTable* get_age_table();
> 490: void update_epoch();
> 491: void reset_epoch();
`epoch` overload/confusion with prior use. Consider if this is really needed; if so, find better names to avoid confusion with other uses of the term `epoch` or methods thereof.
src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 484:
> 482: // If copying to the old generation, we don't care about recording
> 483: // object age in the census stats.
> 484: _evac_tracker->end_evacuation(thread, size * HeapWordSize, young_gen, young_gen ? ShenandoahHeap::get_object_age(copy_val) : 0);
Revisit this to see why we record age 0 for the case of old gen objects.
src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 549:
> 547: }
> 548:
> 549: uint ShenandoahHeap::get_object_age_concurrent(oop obj) {
Pla ce documentation comments distinguishing between the two methods here too.
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 468:
> 466: uint youth() {
> 467: return CENSUS_NOISE(_youth)
> 468: NO_CENSUS_NOISE(0);
May nit need in product build.
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 479:
> 477: CENSUS_NOISE(_youth++;)
> 478: }
> 479: }
Unused: remove.
src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 60:
> 58: }
> 59:
> 60: template <class T, ShenandoahGenerationType GENERATION, StringDedupMode STRING_DEDUP>
Consider extracting the GENERATION type from T (closure's type).
src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 102:
> 100: // from final- to strong mark.
> 101: if (task->count_liveness()) {
> 102: count_liveness<GENERATION>(live_data, obj, worker_id);
Experiment with pulling count_liveness into the closure `cl` which already knows the generation type.
src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 118:
> 116:
> 117: // Age census for young objects
> 118: if (GENERATION == YOUNG && !GenShenCensusAtEvac) {
Fold the two together.
src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp line 30:
> 28: #include "gc/shenandoah/shenandoahNumberSeq.hpp"
> 29: #include "gc/shenandoah/shenandoahPadding.hpp"
> 30: #include "gc/shenandoah/shenandoahSharedVariables.hpp"
Check this again.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 38:
> 36: constraint) \
> 37: \
> 38: product(bool, GenShenCensusAtEvac, true, EXPERIMENTAL, \
GenShen -> ShenandoahGenerational, perhaps.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 39:
> 37: \
> 38: product(bool, GenShenCensusAtEvac, true, EXPERIMENTAL, \
> 39: "Object age census at evacuation, not mearking") \
"marking"
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 47:
> 45: "Floor for adaptive tenuring threshold") \
> 46: \
> 47: product(uintx, GenShenMaxTenuringThreshold, 16, EXPERIMENTAL, \
Why not 15?
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 50:
> 48: "Ceiling for adaptive tenuring threshold") \
> 49: \
> 50: product(double, GenShenTenuringMortalityRateThreshold, 0.3, EXPERIMENTAL, \
Experiment with different values here (0.1 may be more appropriate perhaps?).
Add guidance about which way (lower) to move this parm to keep objects in young gen longer.
-------------
PR Comment: https://git.openjdk.org/shenandoah/pull/289#issuecomment-1610591075
PR Comment: https://git.openjdk.org/shenandoah/pull/289#issuecomment-1610603183
PR Comment: https://git.openjdk.org/shenandoah/pull/289#issuecomment-1610606001
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1251339431
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244597619
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244595952
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244602974
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244593337
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242896102
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242898861
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242898446
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242900286
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1251338799
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242770190
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242769397
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242913487
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242915202
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244595089
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242932802
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1242936397
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244379444
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244377135
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244382294
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244390867
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244386053
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244393342
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244336890
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244398919
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244401685
PR Review Comment: https://git.openjdk.org/shenandoah/pull/289#discussion_r1244412744
More information about the shenandoah-dev
mailing list