RFR: Expand old on demand [v44]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Sat May 13 02:02:50 UTC 2023
On Fri, 12 May 2023 14:31:11 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:
>> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway.
>>
>> This PR now passes all GHA pre-integration tests and other internal CI tests.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> Report number of old- and young cset regions when preparing to rebuild free set
Skimmed over some of the diffs and left some comments. Haven't gotten into the details yet, but will once the overall structure and shape of the changes becomes more clear. Most of the current comments are of form/structure rather than of the details which I will defer until Monday.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 541:
> 539: size_t ShenandoahHeuristics::evac_slack(size_t young_regions_to_be_recycled) {
> 540: assert(false, "evac_slack() only implemented for young Adaptive Heuristics");
> 541: return 0;
This is a bit awkward. We should decide if this is _needed_ anywhere else other than its only currently intended use case. If it isn't needed anywhere else and shouldn't be called, I'd change the assert to a `guarantee(false)` (or `ShouldNotReachHere()`). If it shouldn't be needed, but it makes sense to have a default implementation and let it return 0, I'd remove the assert entirely and let it just return 0. If it turns out that you can't be sure, I'd go for the former for now.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 176:
> 174: virtual void initialize();
> 175:
> 176: virtual size_t evac_slack(size_t region_to_be_recycled);
Please document the public spec of this method in a comment here. Is the intention here to make this a pure virtual too, or is it a good idea to provide a default implementation for subtypes that don't care?
src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 83:
> 81:
> 82: // Flags are set when promotion failure is detected (by gc thread), and cleared when
> 83: // old generation collection begins (by control thread). Flags are set and cleared at safepoints.
First sentence repeats 78-79. Can these be consolidated into a single block of declarations with a comment block preceding the block? Otherwise, please fix the comment so it doesn't confusingly refer to promotion failure.
src/hotspot/share/gc/shenandoah/shenandoahCollectionSet.hpp line 72:
> 70: // should be subtracted from what's available.
> 71: size_t _young_available_bytes_collected;
> 72: size_t _old_available_bytes_collected;
One typical HotSpot GC terminology is to use the term "free" to denote memory that is available for allocation, "garbage" for memory that is used by dead objects, and "used" for memory that is used by objects whose liveness is unknown (so they can't be collected) or that is wasted to fragmentation (internal or external).
I am wondering if it might make sense to use that terminology for brevity and consistency in method and data member names. In that case, it would see as though the above block might read:
// When a region with free memory is added to the collection set, that memory is no longer
// available for allocation until the region is evacuated.
size_t _young_free_bytes;
size_t _old_free_bytes;
In addition, you will need to state what these data members represent. Something like:
These represent the free space in the young and old generations, respectively.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 176:
> 174: bool in_graceful_shutdown();
> 175:
> 176: void service_concurrent_normal_cycle(ShenandoahHeap* heap,
Isn't this delta already there in another PR? I'd rebase this PR wrt that as parent, or land that and refresh this so as to avoid presenting the same diffs in two different PRs where possible.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1080:
> 1078: // Return the amount of young-gen memory that is about to be reycled
> 1079: void ShenandoahFreeSet::prepare_to_rebuild(size_t &young_cset_regions, size_t &old_cset_regions) {
> 1080: size_t region_size_bytes = ShenandoahHeapRegion::region_size_bytes();
This variable seems to be unused.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1088:
> 1086: // This places regions that have alloc_capacity into the old_collector set if they identify as is_old() or the
> 1087: // mutator set otherwise.
> 1088: find_regions_with_alloc_capacity(young_cset_regions, old_cset_regions);
This method needs a better name, as does `prepare_to_rebuild`.
One way to do this is to write out the API spec for the method clearly in the header file. A natural name should then suggest itself.
I think a natural name is `add_regions_with_free_space(&num_young, &num_old)`.
Confusingly, we pass in young & old region counts as reference parameters, but talk about old collector set and mutator set. I realize we are identifying young with mutator because mutators allocate only from young, but it helps to not cause confusion, but rather stick with one consistent nomenclature here and elsewhere, invoking the other nomenclature (gc and mutator) only when absolutely needed for some specific reason.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1091:
> 1089: }
> 1090:
> 1091: void ShenandoahFreeSet::rebuild() {
Why is this method called `rebuild`?
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 175:
> 173: void flip_to_old_gc(ShenandoahHeapRegion* r);
> 174:
> 175: void adjust_bounds_for_additional_old_collector_free_region(size_t idx);
Where is this defined? The very long name makes me thing it could either be renamed or may have been awkwardly factored out.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 179:
> 177: void recompute_bounds();
> 178: void adjust_bounds();
> 179: bool touches_bounds(size_t num) const;
These were there before, so why are they appearing as new in the diffs?
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 181:
> 179: bool touches_bounds(size_t num) const;
> 180:
> 181: // Used of free set represents the amount of is_mutator_free set that has been consumed since most recent rebuild.
That comment appears to belong where`used()` is declared?
I also see that all these lines are not new, so I am now quite sure why they are showing up as new diffs introduced in this PR.
src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 191:
> 189: // Return the updated value of affiliated_region_count
> 190: size_t decrease_affiliated_region_count(size_t delta);
> 191:
I realize the comments are talking about what is returned in each case, but I'd much rather they were clearer:
// {In,De}crement the region count by delta and return its new value.
I also feel it's overkill to have the method at line 184. Why not just use a default delta of 1, and do away with the 0-arg versions of these methods? This would also fix the current lack of checking for underflow from 0 in the case of the 0-arg decrement method.
The asymmetry between the increase/decrease methods also points out that, although very unlikely, there is nothing that is checking for the reasonableness of the delta in the case of the increment.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 228:
> 226: size_t _regular_regions_promoted_in_place;
> 227: size_t _regular_usage_promoted_in_place;
> 228:
Although these names may in many cases be "self-explanatory" to those familiar with the code, I think they need at least a brief 1-line description each.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 362:
> 360: // Allocation of old GCLABs (aka PLABs) assures that _old_evac_expended + request-size < _old_evac_reserved. If the allocation
> 361: // is authorized, increment _old_evac_expended by request size. This allocation ignores old_gen->available().
> 362:
This documentation comment seems like a stream-of-consciousness thing. I think we need this comment to move to the right spot.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 533:
> 531: void recycle_trash();
> 532: public:
> 533: void rebuild_free_set(bool concurrent);
Please document the method's public specification so a caller is aware as to what they should expect the method to do.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 710:
> 708: //
> 709: private:
> 710: // How many bytes to transfer between old and young after we have finished recycling collection set regions?
General comment (no action needed at this time) about region counts. It appears to me that when talking about quantities that might take positive (`surplus`) or negative (`deficit`) values that are suitably bounded, it might make sense, where possible, to have a single variable of type `ssize_t` and make use of signs for that one variable. Is this such an opportunity?
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 747:
> 745: void set_young_lab_region_flags();
> 746:
> 747: inline void set_old_region_surplus(size_t surplus) { _old_regions_surplus = surplus; };
I'd avoid the unnecessary inconsistency between `set_old_region_surplus` and `_old_regions_surplus`. I prefer both to use the singular form of `region`.
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 50:
> 48: * of the generation that is consuming the most CPU time. The assumption being
> 49: * that increasing memory will reduce the collection frequency and raise the
> 50: * MMU.
I believe these have already been split out into a separate PR. Let's drop these from this PR by producing a dependent PR to avoid having to look at these diffs again.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 102:
> 100: product(uintx, ShenandoahOldGarbageThreshold, 15, EXPERIMENTAL, \
> 101: "How much garbage an old region has to contain before it would " \
> 102: "be taken for collection.") \
"An old region must contain at least this percentage of garbage before being eligible for evacuation."
What happens when the user specifies a number like 0 or 100? Does it have the requisite effect? For 0, I would expect the region to be eligible for evacuation even when there is no garbage in it and, for 100, I would expect it to be collected only when it is fully garbage and not otherwise. Is this a reasonable expectation? Are there tests that check for this?
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 137:
> 135: "(soft) max heap size.") \
> 136: range(0,100) \
> 137: \
Would it be correct to assume that this behaviour is no longer present or, if present, no longer directly controllable in this manner?
It'd be great to have a PR comment against each of these knob changes motivating the changes to their default values, or of their no longer being available.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 206:
> 204: "milliseconds. Setting this to 0 disables the feature.") \
> 205: \
> 206: product(uintx, ShenandoahGuaranteedYoungGCInterval, 30*1000, EXPERIMENTAL, \
Why was this made so small? Are there cases where, say, for a quiescent/idle service instance with little or no load, we must still, for some reason, trigger young gc's twice each minute?
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 309:
> 307: "evacuation conflicts, at expense of evacuating less on each " \
> 308: "GC cycle. Smaller values increase the risk of evacuation " \
> 309: "failures, which will trigger stop-the-world Full GC passes.") \
The terms `waste` and `conflicts` are not clear to me. Can we write this in a way that someone not familiar with the internals of the collector can understand without having to dig through the code to understand what `waste` and `evacuation conflict` are? Or include just enough for them to be able to use these knobs with some amount of instinct rather than knob-twiddling "because it was there".
It looks to me like a "magic number estimate" that the user produces out of thin air that the collector then uses in its accounting, and then real life unfolds in such a manner that this determines risk vs reward or efficiency of space usage. In that sense this is more like a "magic padding factor". If so, I'd describe in a line or two what makes the actual space usage more than the size of the evacuated objects, why it's difficult to accurately assess ahead of time, and why a larger value potentially wastes space by allowing more headroom while avoiding running out of space while a smaller value wastes less space while running a greater risk of running out of space in tight situations where it attempted an evacuation.
The talk of waste as related to "conflict" is a bit confusing to me at least without knowing what conflict is and why it must lead to wastage of space.
This probably needs more discussion to see why we may not be able to learn something better based on measurement as the collections unfold, based on an initial rough estimate to begin with and then refining it with a suitable padding for uncertainty over time.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 341:
> 339: "The maximum proportion of evacuation from old-gen memory, as " \
> 340: "a percent ratio. The default value 75 denotes that no more " \
> 341: "than 75% of the collection set evacuation " \
Is the underlying quantity the number of bytes eligible for evacuation or the number of regions eligible for evacuation? Perhaps make that more clear in the description string.
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 367:
> 365: range(0, 100) \
> 366: \
> 367: product(uintx, ShenandoahMaxYoungPercentage, 100, EXPERIMENTAL, \
If a user sets MinYoung == MaxYoung == 100, does one get single-generation Shenandoah? Or is that request rejected? What if both are set to 0? Are there sanity checks or tests that would fail today with such settings?
test/hotspot/jtreg/ProblemList.txt line 90:
> 88: gc/shenandoah/TestDynamicSoftMaxHeapSize.java#generational 8306333 generic-all
> 89: gc/stress/gclocker/TestGCLockerWithShenandoah.java#generational 8306341 generic-all
> 90:
Nice!!
test/hotspot/jtreg/ProblemList.txt line 95:
> 93: # gc/shenandoah/oom/TestClassLoaderLeak.java 8306336 generic-all
> 94: # gc/shenandoah/TestDynamicSoftMaxHeapSize.java#generational 8306333 generic-all
> 95: # gc/stress/gclocker/TestGCLockerWithShenandoah.java#generational 8306341 generic-all
I'd just delete these when ready to land, and close the tickets as duplicates of the ticket for this PR.
-------------
Changes requested by ysr (Committer).
PR Review: https://git.openjdk.org/shenandoah/pull/248#pullrequestreview-1425202534
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192832501
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192826572
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192835927
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192844592
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192892333
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192883701
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192884145
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192886058
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887256
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887595
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887717
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890047
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890299
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890578
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890838
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192891449
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192891752
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192893088
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894139
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894384
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894759
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899320
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192897901
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192897629
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899482
PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899582
More information about the shenandoah-dev
mailing list