From ysr at openjdk.org Thu Feb 1 07:21:11 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 1 Feb 2024 07:21:11 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 16:28:15 GMT, Kelvin Nilsen wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
>>
>> On internal performance pipelines, this change shows:
>>
>> 1. Some increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions.
>> 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss.
>> 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood)
>> 4.
2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs)
>>
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000
>>   Control: 819938.875 (+/-5724.56 ) 40
>>   Test: 2185552.625 (+/-26378.64 ) 20
>>
>> +166.16% scimark.sparse.large/rss_max p=0.00000
>>   Control: 3285226.375 (+/-22812.93 ) 40
>>   Test: 8743881.500 (+/-104906.69 ) 20
>>
>> +164.78% sunflow/cpu_system p=0.00000
>>   Control: 1.280s (+/- 0.10s ) 40
>>   Test: 3.390s (+/- 0.13s ) 20
>>
>> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000
>>   Control: 3.259 (+/- 1.46 ) 33
>>   Test: 8.125 (+/- 2.05 ) 20
>>
>> +143.75% pmd/major_page_fault_count p=0.03622
>>   Control: 1.000 (+/- 0.00 ) 40
>>   Test: 2.438 (+/- 2.59 ) 20
>>
>> +80.22% lusearch/minor_page_fault_count p=0.00000
>>   Control: 2043930.938 (+/-4777.14 ) 40
>>   Test: 3683477.625 (+/-5650.29 ) 20
>>
>> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000
>>   Control: 697899.156 (+/-3457.82 ) 40
>>   Test: 1050363.812 (+/-175...
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Rename and comments for _capacity_of and _used_by

A few more comments. I expect my next round to be the last one. I think we are almost there. Sorry for the delay and for the length of the review comments.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1001:

> 999:
> 1000: if (VerifyAfterGC) {
> 1001: Universe::verify();

This line deletion seems to be the only change now in this file. So this file can be removed from the diffs.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 114:

> 112: }
> 113:
> 114: inline void ShenandoahRegionPartition::shrink_interval_if_boundary_modified(ShenandoahFreeSetPartitionId partition, size_t idx) {

const both the parameters.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 36:

> 34: enum ShenandoahFreeSetPartitionId : uint8_t {
> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory.
> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators.

Just want to make sure: "available to mutators" -- is this both for object allocation as well as for possible evacuation as part of the mutator LRB?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 37:

> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory.
> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators.
> 37: Collector, // Region is in the Collector free set: available memory is reserved for evacuations.

When mutators evacuate the target of an LRB, do they use `Mutator` or `Collector`? I assume the former? In that case, I'd say for Collector: `available memory is reserved for collector threads for evacuation`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 44:

> 42:
> 43: // This class implements partitioning of regions into distinct sets. Each ShenandoahHeapRegion is either in the Mutator free set,
> 44: // the Collector free set, or in neither free set (NotFree).

I noticed that you use the term "free partition" quite a lot later, I'd just start using that term early on when talking about these sets. You could, for example, say:

    // Whenever we say "free partition", we mean any partition other than the "NotFree" partition.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 50:

> 48: const size_t _max; // The maximum number of heap regions
> 49: const size_t _region_size_bytes;
> 50: const ShenandoahFreeSet* _free_set;

Interesting: why does the partitioning need a reference to its containing free set?
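
As a minimal sketch of the "free partition" terminology suggested in the comments above, the definition could be stated directly in code. The enumerator names are taken from the quoted header; `NumPartitions` as a trailing enumerator is an assumption inferred from its use elsewhere in this review, and `is_free_partition` is a hypothetical helper that does not appear in the patch.

```cpp
#include <cstdint>

// Partition ids as quoted in the review. Treating NumPartitions as a trailing
// enumerator is an assumption, not something the quoted lines confirm.
enum ShenandoahFreeSetPartitionId : uint8_t {
  NotFree,        // Region is retired: no available memory.
  Mutator,        // Available memory is usable by mutators.
  Collector,      // Available memory is reserved for evacuation.
  NumPartitions   // Count of partition ids (assumed).
};

// Hypothetical helper: any partition other than NotFree is a "free partition".
constexpr bool is_free_partition(ShenandoahFreeSetPartitionId p) {
  return p == Mutator || p == Collector;
}

static_assert(!is_free_partition(NotFree), "NotFree is not a free partition");
static_assert(is_free_partition(Mutator) && is_free_partition(Collector),
              "Mutator and Collector are free partitions");
```

Encoding the definition as a predicate keeps the wording in comments and the checks in assertions from drifting apart, though whether such a helper is worthwhile is a judgment call for the patch author.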

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 54:

> 52:
> 53: // For each type, we track an interval outside of which a region affiliated with that partition is guaranteed
> 54: // not to be found. This makes searches for free space more efficient. For each partition p, _leftmosts[p]

I am being a bit pedantic here.
Partition is usually identified with the _set of equivalence classes_. Thus a partition is an equivalence relation, and each equivalence class in the partition has, in this case, a distinct partition id (i.e. each region is either in the Mutator equivalence class aka Mutator free set, the Collector equivalence class aka Collector free set, or the NotFree equivalence class aka NotFree set). In your terminology, each equivalence class is a "free set".

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 60:

> 58: size_t _rightmosts[NumPartitions];
> 59:
> 60: // Allocation for humongous objects needs to find regions that are entirely empty. For each partion p, _leftmosts[p]

`_leftmosts_empty` and, similarly, `_rightmosts_empty`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 63:

> 61: // represents the first region belonging to this partition that is completely empty and _rightmosts[p] represents the
> 62: // last region that is completely empty. If there are no completely empty regions in this partition, this is represented
> 63: // by canonical [_max, 0].

... is no completely empty region in this partition id, ...

... the canonical ...

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 68:

> 66:
> 67: // For each partition p, _capacity[p] represents the total amount of memory within the partition at the time
> 68: // of the most recent rebuild, _used[p] represents the total amount of memory that has been consumed within this

instead of consumed, can we just say used (or allocated)?
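
The interval bookkeeping described in the quoted comments above can be sketched roughly as follows. This is a simplified, hypothetical model of a single partition, not the HotSpot implementation: the class name `PartitionInterval`, its methods, and the use of `std::vector<bool>` for membership are all assumptions made for illustration. It shows why the canonical empty interval `[_max, 0]` is convenient: adding the first region needs no special case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one partition's interval bookkeeping.
// All names are illustrative; this is not the HotSpot implementation.
class PartitionInterval {
  const size_t _max;           // total number of heap regions
  std::vector<bool> _member;   // _member[i] is true iff region i is in the partition
  size_t _leftmost;            // lowest member index, or _max when empty
  size_t _rightmost;           // highest member index, or 0 when empty

public:
  explicit PartitionInterval(size_t max)
    : _max(max), _member(max, false), _leftmost(max), _rightmost(0) {}

  bool is_empty() const { return _leftmost == _max; }
  size_t leftmost() const { return _leftmost; }
  size_t rightmost() const { return _rightmost; }

  // Add region idx. Starting from the canonical empty interval [_max, 0],
  // both comparisons below do the right thing without an emptiness check.
  void add(size_t idx) {
    assert(idx < _max);
    _member[idx] = true;
    if (idx < _leftmost)  _leftmost = idx;
    if (idx > _rightmost) _rightmost = idx;
  }

  // Remove region idx and shrink the interval if idx was one of its boundaries.
  void remove(size_t idx) {
    assert(idx < _max && _member[idx]);
    _member[idx] = false;
    if (idx == _leftmost) {
      while (_leftmost < _max && !_member[_leftmost]) _leftmost++;
      if (_leftmost == _max) {   // partition became empty: restore canonical form
        _rightmost = 0;
        return;
      }
    }
    if (idx == _rightmost) {
      // The partition is non-empty here, so _member[_leftmost] is true
      // and this loop terminates at or before _leftmost.
      while (!_member[_rightmost]) _rightmost--;
    }
  }

  // Search only within [leftmost, rightmost]; returns _max if there is no member.
  size_t first_member() const {
    for (size_t i = _leftmost; i <= _rightmost && i < _max; i++) {
      if (_member[i]) return i;
    }
    return _max;
  }
};
```

A caller would construct `PartitionInterval p(num_regions)`, call `p.add(idx)` and `p.remove(idx)` as regions change partition, and rely on `leftmost()`/`rightmost()` to bound searches. The same scheme would apply to a second pair of bounds tracking only completely empty regions.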

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 74:

> 72: // and _used[p], even though the region may have been removed from the free set.
> 73: size_t _capacity[NumPartitions];
> 74: size_t _used[NumPartitions];

In light of your earlier documentation of leftmost/rightmost/empty/available etc., would it be fair to say that the following statement is always true:

for p = NotFree:
1. leftmosts[p] = leftmosts_empty[p] = _max
2. rightmosts[p] = rightmosts_empty[p] = 0
3. capacity[p] = used[p] = region_size

Are the "NotFree" entries for these arrays ever used? If not, is there any point in keeping them in a product build? Is there any point in keeping them in a non-product build? Does it have some other role that makes it important to keep it, anyway?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 75:

> 73: size_t _capacity[NumPartitions];
> 74: size_t _used[NumPartitions];
> 75: size_t _region_counts[NumPartitions];

If tracked, is this an invariant of these fields?
- region_counts[NotFree] == _max - (region_counts[Mutator] + region_counts[Collector])

(This would also make the region_counts[NotFree] unnecessary? See my previous comment.)

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 95:

> 93: void make_free(size_t idx, ShenandoahFreeSetPartitionId which_partition, size_t region_capacity);
> 94:
> 95: // Place region idx into free partition new_partition. Requires that idx is currently not NotFree.

Include semantics of region_capacity in comment, e.g.:

    // Move region idx, with region_capacity bytes of available free space,
    // from the NotFree partition to the free partition new_partition.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 99:

> 97:
> 98: // Returns the ShenandoahFreeSetPartitionId affiliation of region idx, NotFree if this region is not currently free.
> 99: // This does not enforce that free_set membership implies allocation capacity.

I think "NotFree if this region is not currently free" is unnecessary and frankly confusing (why are we mentioning membership in the NotFree partition specially?) I also do not understand (am confused by) the second sentence in the comment.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 103:

> 101:
> 102: // Returns true iff region idx is in the test_set free_set. Before returning true, asserts that the free
> 103: // set is not empty. Requires that test_set != NotFree or NumPartitions.

This comment probably needs to be updated. Something simple like:

    // Is the region, idx, part of which_partition?

As it stands, the comment is pretty confusing. In general, concise statements of specification are best for documenting APIs. If the presence of APIs needs to be motivated, that should all be done early on in a block comment that motivates the class and why it is what it is. It makes for much terser, clearer, and more maintainable documentation.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 112:

> 110: // In other words:
> 111: // if the requested which_partition is empty:
> 112: // leftmost() and leftmost_empty() return _max, rightmost() and rightmost_empty() return 0

There are mutually contradictory statements in the highlighted portion of the documentation above. I suspect the earlier reference to -1 is obsolete and needs to be deleted.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 132:

> 130: assert (which_partition > NotFree && which_partition < NumPartitions, "selected free set must be valid");
> 131: return _used[which_partition];
> 132: }

The assertions here indicate to me that it is likely my earlier suspicion that many of these fields are not needed for NotFree is true. (See my earlier comment about many of these fields for NotFree.) I feel it may be best to let the enum type system enforce this, rather than use these assertions.
NotFree then becomes a sentinel value that is not part of the legal index set that can be passed in here. Maybe that can be done later, as I realize the type contagion might necessitate more changes (although I think it will conceptually simplify this) by maintaining the distinction between the tags in the _membership[] array, and the types used for the so-called free partitions.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 142:

> 140: assert (which_partition > NotFree && which_partition < NumPartitions, "selected free set must be valid");
> 141: _used[which_partition] = value;
> 142: }

Same for these assertions.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 171:

> 169: };
> 170:
> 171: class ShenandoahFreeSet : public CHeapObj {

It would be good to have a block comment here motivating this class. It seems (from looking at some of its public APIs) as if it publicly exports only the "mutator view", which I find interesting. The other partitions in `ShenandoahRegionPartition` appear to be for efficiency of the implementation in service of the public APIs for ShenandoahFreeSet.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 174:

> 172: private:
> 173: ShenandoahHeap* const _heap;
> 174: ShenandoahRegionPartition _partitions;

I think the use of a plural for the field illustrates the English language interpretation of partition. To be consistent, I'd rename the class name also to the plural `ShenandoahRegionPartitions` as remarked earlier.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 184:

> 182: HeapWord* allocate_single(ShenandoahAllocRequest& req, bool& in_new_region);
> 183:
> 184: // While holding the heap lock, allocate memory for a humongous object which will span multiple contiguous heap

`which will` or `which may`? (Is a humongous object allowed to span just a single region as well?)

Or are objects humongous only if they won't fit in a region? In which case the "will" is correct.

I was confused by tests that use `ShenandoahHumongousThreshold=50`, `=90`, etc.

Maybe in those cases, we go through the `allocate_single()` despite allocating an object (or block) bigger than `ShenandoahHeapRegion::humongous_threshold_words()`? (That would make the pre-condition of the previous method suspect, though.)

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 187:

> 185: // regions.
> 186: //
> 187: // Precondition: req.size() > ShenandoahHeapRegion::humongous_threshold_words().

`>` or `>=`?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 221:

> 219: //
> 220: // Note that we plan to replenish the Collector reserve at the end of update refs, at which time all
> 221: // of the regions recycled from the collection set will be available.

I see that you are trying to motivate this API. I feel that these comments belong in the caller. The API should not need to motivate where the caller must call this from. The API came about because there was a need for this in its clients. A good API spec should state its actions.

Motivating its uses drags in context that detracts from clarity of the class and method.

I realize this is a somewhat subjective stance but from experience it makes for better documentation and more maintainable/readable code.

The place for such documentation is usually in a block comment motivating the general design of the class and why it offers the APIs that it does, and who its clients are.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 231:

> 229: inline size_t available() const {
> 230: assert(used() <= capacity(), "must use less than capacity");
> 231: return capacity() - used();

So `ShenandoahFreeSet` publicly exports only the mutator view?

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1069:

> 1067: heap->collection_set()->clear();
> 1068:
> 1069: // Since Full GC directly manipulates top of certain regions, certain ShenandoahFreeSet abstractions may have been corrupted.

Instead of "may have been corrupted", which can be alarming and confusing, I'd state this as:

    // Full GC doesn't use or maintain the ShenandoahFreeSet abstractions,
    // so we rebuild the free set from scratch following a Full GC.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1855229569
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473702305
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473831169
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473809906
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473809223
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473835542
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473810650
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473709592
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473710297
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473712934
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473721984
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473826640
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473828523
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473842682
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473845614
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473848754
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473851167
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473859958
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473860619
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473739467
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473725085
PR Review Comment:
https://git.openjdk.org/jdk/pull/17561#discussion_r1473726454
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473728080
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473883437
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473738298
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473899033

From ysr at openjdk.org Thu Feb 1 07:21:11 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 1 Feb 2024 07:21:11 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Feb 2024 02:04:43 GMT, Y. Srinivas Ramakrishna wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Rename and comments for _capacity_of and _used_by
>
> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 54:
>
>> 52:
>> 53: // For each type, we track an interval outside of which a region affiliated with that partition is guaranteed
>> 54: // not to be found. This makes searches for free space more efficient. For each partition p, _leftmosts[p]
>
> I am being a bit pedantic here.
> Partition is usually identified with the _set of equivalence classes_. Thus a partition is an equivalence relation, and each equivalence class in the partition has, in this case, a distinct partition id (i.e. each region is either in the Mutator equivalence class aka Mutator free set, the Collector equivalence class aka Collector free set, or the NotFree equivalence class aka NotFree set). In your terminology, each equivalence class is a "free set".

However, upon reading further, I see that you have used "partition" not in the mathematical sense of an equivalence relation on a set, but in the English language sense as a subset of a set.
In that case, you can continue to use the terminology you are using, but I'd change the class `ShenandoahRegionPartition` to the plural `ShenandoahRegionPartitions`, since you think of it as the combination of 3 partitions (in the English language sense): a Mutator partition, a Collector partition, and a NotFree partition. Or you could call it `ShenandoahRegionPartitioning`. Indeed, in your comment above, you say "This class represents a partitioning of ...".

> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 184:
>
>> 182: HeapWord* allocate_single(ShenandoahAllocRequest& req, bool& in_new_region);
>> 183:
>> 184: // While holding the heap lock, allocate memory for a humongous object which will span multiple contiguous heap
>
> `which will` or `which may`? (Is a humongous object allowed to span just a single region as well?)
>
> Or are objects humongous only if they won't fit in a region? In which case the "will" is correct.
>
> I was confused by tests that use `ShenandoahHumongousThreshold=50`, `=90`, etc.
>
> Maybe in those cases, we go through the `allocate_single()` despite allocating an object (or block) bigger than `ShenandoahHeapRegion::humongous_threshold_words()`? (That would make the pre-condition of the previous method suspect, though.)

Same remark applies to the precondition comment below (which is correct, but could be made stronger to say `req.size() > ShenandoahHeapRegion::RegionSizeWords` or such)?

> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 221:
>
>> 219: //
>> 220: // Note that we plan to replenish the Collector reserve at the end of update refs, at which time all
>> 221: // of the regions recycled from the collection set will be available.
>
> I see that you are trying to motivate this API. I feel that these comments belong in the caller. The API should not need to motivate where the caller must call this from. The API came about because there was a need for this in its clients.
A good API spec should state its actions.
>
> Motivating its uses drags in context that detracts from clarity of the class and method.
>
> I realize this is a somewhat subjective stance but from experience it makes for better documentation and more maintainable/readable code.
>
> The place for such documentation is usually in a block comment motivating the general design of the class and why it offers the APIs that it does, and who its clients are.

So the documentation here might be:

    // Move cset_regions number of regions from being available to the collector to
    // being available to the mutator.
    //
    // Typical usage is at the end of evacuation, when the collector no longer needs
    // the regions that were reserved for evacuation, and these can now be
    // made available for mutator allocation.

BTW, why call the number of regions `cset_regions`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473717128
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473736515
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473892964

From ysr at openjdk.org Thu Feb 1 07:21:11 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 1 Feb 2024 07:21:11 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Feb 2024 07:10:17 GMT, Y. Srinivas Ramakrishna wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 221:
>>
>>> 219: //
>>> 220: // Note that we plan to replenish the Collector reserve at the end of update refs, at which time all
>>> 221: // of the regions recycled from the collection set will be available.
>>
>> I see that you are trying to motivate this API. I feel that these comments belong in the caller. The API should not need to motivate where the caller must call this from. The API came about because there was a need for this in its clients. A good API spec should state its actions.
>>
>> Motivating its uses drags in context that detracts from clarity of the class and method.
>>
>> I realize this is a somewhat subjective stance but from experience it makes for better documentation and more maintainable/readable code.
>>
>> The place for such documentation is usually in a block comment motivating the general design of the class and why it offers the APIs that it does, and who its clients are.
>
> So the documentation here might be:
>
>     // Move cset_regions number of regions from being available to the collector to
>     // being available to the mutator.
>     //
>     // Typical usage is at the end of evacuation, when the collector no longer needs
>     // the regions that were reserved for evacuation, and these can now be
>     // made available for mutator allocation.
>
> BTW, why call the number of regions `cset_regions`?

Also, the concept of partition is itself an internal implementation detail that you have carefully encapsulated in this class. There is no point in leaking that out in the naming of the method. The method can just be called `move_regions_from_collector_to_mutator(size_t num)` and be done? "Partition" here adds no value and can be confusing leakage of abstraction.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473896202

From ysr at openjdk.org Thu Feb 1 07:30:02 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 1 Feb 2024 07:30:02 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Feb 2024 06:06:06 GMT, Y. Srinivas Ramakrishna wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Rename and comments for _capacity_of and _used_by
>
> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 44:
>
>> 42:
>> 43: // This class implements partitioning of regions into distinct sets.
Each ShenandoahHeapRegion is either in the Mutator free set,
>> 44: // the Collector free set, or in neither free set (NotFree).
>
> I noticed that you use the term "free partition" quite a lot later, I'd just start using that term early on when talking about these sets. You could, for example, say:
>
>     // Whenever we say "free partition", we mean any partition other than the "NotFree" partition.

Or:

    // Any partition that is not the "NotFree" partition is a "free partition".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473912384

From ysr at openjdk.org Thu Feb 1 08:11:02 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 1 Feb 2024 08:11:02 GMT
Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3]
In-Reply-To:
References:
Message-ID: <4EwZEdlRxSvpaYstvt3imSXBD9lqRDjBxsNw0IhIcVk=.3e826fd9-36fd-4c29-b2a5-663ae40a21c9@github.com>

On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote:

>> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typo in comment

Changes look great; are there any performance numbers to share for the change?

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17638#pullrequestreview-1855782924

From shade at openjdk.org Thu Feb 1 11:02:05 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 1 Feb 2024 11:02:05 GMT
Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote:

>> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typo in comment

Well, degenerated cycle can clean up enough space for humongous allocation to succeed? Seems weird to go straight full GC without trying the degen GC first. Is there a substantial performance benefit for doing this?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1921059828

From wkemper at openjdk.org Thu Feb 1 14:23:00 2024
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 1 Feb 2024 14:23:00 GMT
Subject: RFR: Merge openjdk/jdk21u-dev:master
Message-ID:

Merges tag jdk-21.0.3+1

-------------

Commit messages:
 - 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test
 - 8320943: Files/probeContentType/Basic.java fails on latest Windows 11 - content type mismatch
 - 8313507: Remove pkcs11/Cipher/TestKATForGCM.java from ProblemList
 - 8315600: Open source few more headless Swing misc tests
 - 8274122: java/io/File/createTempFile/SpecialTempFile.java fails in Windows 11
 - 8324280: RISC-V: Incorrect implementation in VM_Version::parse_satp_mode
 - 8324659: GHA: Generic jtreg errors are not reported
 - 8315761: Open source few swing JList and JMenuBar tests
 - 8322142: JFR: Periodic tasks aren't orphaned between recordings
 - 8321480: ISO 4217 Amendment 176 Update
 - ...
and 189 more: https://git.openjdk.org/shenandoah-jdk21u/compare/375769c6...2518d203

The webrev contains the conflicts with master:
 - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=00.conflicts

Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files
  Stats: 30131 lines in 1353 files changed: 15628 ins; 5802 del; 8701 mod
  Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/19.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/19/head:pull/19

PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19

From mdoerr at openjdk.org Fri Feb 2 03:55:04 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 2 Feb 2024 03:55:04 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote:

>> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.
>>
>> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
>>
>> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler.
>>
>> I have tested the changes from tier1-7, and run through full aurora performance tests.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   ARM32 fixes

Thanks for the improvements! Tests are still passing on SAP supported platforms.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1922760746

From wkemper at openjdk.org Fri Feb 2 14:15:42 2024
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 2 Feb 2024 14:15:42 GMT
Subject: RFR: Merge openjdk/jdk:master
Message-ID:

Merges tag jdk-23+8

-------------

Commit messages:
 - 8324174: assert(m->is_entered(current)) failed: invariant
 - 8325042: remove unused JVMDITools test files
 - 8323621: JDK build should exclude snippet class in java.lang.foreign
 - 8324238: [macOS] java/awt/Frame/ShapeNotSetSometimes/ShapeNotSetSometimes.java fails with the shape has not been applied msg
 - 8320342: Use PassFailJFrame for TruncatedPopupMenuTest.java
 - 8324981: Shenandoah: Move commit and soft max heap changed methods into heap
 - 8303374: Implement JEP 455: Primitive Types in Patterns, instanceof, and switch (Preview)
 - 8320712: Rewrite BadFactoryTest in pure Java
 - 8324771: Obsolete RAMFraction related flags
 - 8324970: Serial: Refactor signature of maintain_old_to_young_invariant
 - ...
and 61 more: https://git.openjdk.org/shenandoah/compare/6d36eb78...5b9b176c

The webrev contains the conflicts with master:
 - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah&pr=389&range=00.conflicts

Changes: https://git.openjdk.org/shenandoah/pull/389/files
  Stats: 18822 lines in 1229 files changed: 7186 ins; 1716 del; 9920 mod
  Patch: https://git.openjdk.org/shenandoah/pull/389.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/389/head:pull/389

PR: https://git.openjdk.org/shenandoah/pull/389

From eosterlund at openjdk.org Fri Feb 2 15:37:05 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 2 Feb 2024 15:37:05 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 2 Feb 2024 03:52:04 GMT, Martin Doerr wrote:

> Thanks for the improvements! Tests are still passing on SAP supported platforms.

Thank you for running through your tests!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1924112606

From kdnilsen at openjdk.org Fri Feb 2 23:43:05 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Fri, 2 Feb 2024 23:43:05 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Feb 2024 02:19:17 GMT, Y. Srinivas Ramakrishna wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 54:
>>
>>> 52:
>>> 53: // For each type, we track an interval outside of which a region affiliated with that partition is guaranteed
>>> 54: // not to be found. This makes searches for free space more efficient. For each partition p, _leftmosts[p]
>>
>> I am being a bit pedantic here.
>> Partition is usually identified with the _set of equivalence classes_. Thus a partition is an equivalence relation, and each equivalence class in the partition has, in this case, a distinct partition id (i.e.
each region is either in the Mutator equivalence class aka Mutator free set, the Collector equivalence class aka Collector free set, or the NotFree equivalence class aka NotFree set). In your terminology, each equivalence class is a "free set". > > However, upon reading further, I see that you have used "partition" not in the mathematical sense of an equivalence relation on a set, but in the English language sense as a subset of a set. In that case, you can continue to use the terminology you are using, but I'd change the class `ShenandoahRegionPartition` to the plural `ShenandoahRegionPartitions`, since you think of it as the combination of 3 partitions (in the English language sense): a Mutator partition, a Collector partition, and a NotFree partition. Or you could call it `ShenandoahRegionPartitioning`. Indeed, in your comment above, you say "This class represents a partitioning of ...". I'll go with ShenandoahRegionPartitions. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476874936 From kdnilsen at openjdk.org Fri Feb 2 23:43:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 2 Feb 2024 23:43:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 01:49:38 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1001: > >> 999: >> 1000: if (VerifyAfterGC) { >> 1001: Universe::verify(); > > This line deletion seems to be the only change now in this file. So this file can be removed from the diffs. Thanks. Adding this line back in. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476874711 From kdnilsen at openjdk.org Sat Feb 3 00:01:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 00:01:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 02:06:12 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 60: > >> 58: size_t _rightmosts[NumPartitions]; >> 59: >> 60: // Allocation for humongous objects needs to find regions that are entirely empty. For each partion p, _leftmosts[p] > > `_leftmosts_empty` and, similarly, `_rightmosts_empty`. Oops. Thanks. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 63: > >> 61: // represents the first region belonging to this partition that is completely empty and _rightmosts[p] represents the >> 62: // last region that is completely empty. If there are no completely empty regions in this partition, this is represented >> 63: // by canonical [_max, 0]. > > ... is no completely empty region in this partition id, ... > > > > ... the canonical ... Thanks. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476881647 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476881947 From kdnilsen at openjdk.org Sat Feb 3 00:08:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 00:08:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 02:28:32 GMT, Y. 
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 68: > >> 66: >> 67: // For each partition p, _capacity[p] represents the total amount of memory within the partition at the time >> 68: // of the most recent rebuild, _used[p] represents the total amount of memory that has been consumed within this > > instead of consumed, can we just say used (or allocated)? Replaced. Thanks. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 112: > >> 110: // In other words: >> 111: // if the requested which_partition is empty: >> 112: // leftmost() and leftmost_empty() return _max, rightmost() and rightmost_empty() return 0 > > There are mutually contradictory statements in the highlighted portion of the documentation above. I suspect the earlier reference to -1 is obsolete and needs to be deleted. Good catch. Thank you. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476883719 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476884451 From kdnilsen at openjdk.org Sat Feb 3 03:02:06 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 03:02:06 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <8kwTr_bw237-Z58WNoxRWqfVzQlcHqssT_2Lp5Rwi6c=.e8ea536e-3e0a-4e54-a0df-e679d68ae696@github.com> On Thu, 1 Feb 2024 06:37:54 GMT, Y. 
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 132: > >> 130: assert (which_partition > NotFree && which_partition < NumPartitions, "selected free set must be valid"); >> 131: return _used[which_partition]; >> 132: } > > The assertions here indicate to me that my earlier suspicion, that many of these fields are not needed for NotFree, is likely true. (See my earlier comment about many of these fields for NotFree.) I feel it may be best to let the enum type system enforce this, rather than use these assertions. NotFree then becomes a sentinel value that is not part of the legal index set that can be passed in here. > > Maybe that can be done later, as I realize the type contagion might necessitate more changes (although I think it will conceptually simplify this) by maintaining the distinction between the tags in the _membership[] array, and the types used for the so-called free partitions. Thanks for sorting through this. You are right. I do not maintain used or capacity for NotFree regions. I've made adjustments to the enum declaration and to the assertions to make this more clear. My efforts to do so may have increased the "dissonance" with the mathematical definition of a partition. Please let me know if you see a better way to approach this.
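One way to read the suggestion about letting the enum type system enforce the NotFree restriction is sketched below. This is only an illustration under stated assumptions, not the code in the PR: `FreePartitionId`, `RegionTag`, `PartitionTotals` and its methods are hypothetical names, and only Mutator, Collector and NotFree come from the thread. The idea is to keep one type for the membership tags (which includes NotFree) and a separate type for indexing the per-partition totals (which does not), so passing NotFree to an accessor fails to compile instead of tripping an assertion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index type: only partitions that carry capacity/used totals.
enum class FreePartitionId : std::size_t { Mutator = 0, Collector = 1 };
constexpr std::size_t NumFreePartitions = 2;

// Hypothetical membership tag, as might be stored per region. NotFree exists
// here, but it is not a FreePartitionId and so cannot index the totals.
enum class RegionTag : unsigned char { NotFree, Mutator, Collector };

class PartitionTotals {
  std::size_t _capacity[NumFreePartitions] = {0, 0};
  std::size_t _used[NumFreePartitions] = {0, 0};

  static std::size_t index(FreePartitionId p) { return static_cast<std::size_t>(p); }

 public:
  // Account for one region that belongs to free partition p.
  void add_region(FreePartitionId p, std::size_t capacity_bytes, std::size_t used_bytes) {
    assert(used_bytes <= capacity_bytes);
    _capacity[index(p)] += capacity_bytes;
    _used[index(p)] += used_bytes;
  }

  std::size_t capacity_of(FreePartitionId p) const { return _capacity[index(p)]; }
  std::size_t used_by(FreePartitionId p) const { return _used[index(p)]; }
};
```

With this split, `totals.used_by(RegionTag::NotFree)` is rejected by the compiler. A caller can still defeat the check with an explicit `static_cast`, so this narrows the legal index set rather than making misuse impossible.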
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476926935 From duke at openjdk.org Sat Feb 3 07:59:12 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 3 Feb 2024 07:59:12 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' Message-ID: trivial ------------- Commit messages: - move '_soft_ref_policy' to 'CollectedHeap' Changes: https://git.openjdk.org/jdk/pull/17693/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17693&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325081 Stats: 49 lines in 13 files changed: 3 ins; 44 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17693/head:pull/17693 PR: https://git.openjdk.org/jdk/pull/17693 From ysr at openjdk.org Sat Feb 3 08:55:07 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 3 Feb 2024 08:55:07 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v4] In-Reply-To: References: Message-ID: <95jH1WMe6Vm3bgOl_bPPsOQLdwLml-YV6aT1Z0lktmw=.e52e5398-23a1-40dd-baf2-fa844bfa0244@github.com> On Wed, 31 Jan 2024 00:45:20 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 401: >> >>> 399: >>> 400: HeapWord* ShenandoahFreeSet::allocate_single(ShenandoahAllocRequest& req, bool& in_new_region) { >>> 401: shenandoah_assert_heaplocked(); >> >> In addition, another precondition for this method appears to be that req.size() <= humongous size threshold. Perhaps that check should also be disposed of here. (Based on the documentation at the previous review comment above.) > > Added similar documentation here. Thanks. 
I meant something like: assert(req.size() <= ShenandoahHeapRegion::humongous_threshold_words(), "Can't exceed humongous size"); Unless the precondition is that it shouldn't exceed a region's worth in size, in which case: `<= ShenandoahHeapRegion::region_size_words().` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476987025 From ysr at openjdk.org Sat Feb 3 08:55:06 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 3 Feb 2024 08:55:06 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 16:28:15 GMT, Kelvin Nilsen wrote: >> Several objectives: >> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions >> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB >> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations >> 4. Treat collector reserves as available for Mutator allocations after evacuation completes >> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah >> >> On internal performance pipelines, this change shows: >> >> 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. >> 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. >> 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) >> 4. 
2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) >> >> Shenandoah >> ------------------------------------------------------------------------------------------------------- >> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 >> Control: 819938.875 (+/-5724.56 ) 40 >> Test: 2185552.625 (+/-26378.64 ) 20 >> >> +166.16% scimark.sparse.large/rss_max p=0.00000 >> Control: 3285226.375 (+/-22812.93 ) 40 >> Test: 8743881.500 (+/-104906.69 ) 20 >> >> +164.78% sunflow/cpu_system p=0.00000 >> Control: 1.280s (+/- 0.10s ) 40 >> Test: 3.390s (+/- 0.13s ) 20 >> >> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 >> Control: 3.259 (+/- 1.46 ) 33 >> Test: 8.125 (+/- 2.05 ) 20 >> >> +143.75% pmd/major_page_fault_count p=0.03622 >> Control: 1.000 (+/- 0.00 ) 40 >> Test: 2.438 (+/- 2.59 ) 20 >> >> +80.22% lusearch/minor_page_fault_count p=0.00000 >> Control: 2043930.938 (+/-4777.14 ) 40 >> Test: 3683477.625 (+/-5650.29 ) 20 >> >> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 >> Control: 697899.156 (+/-3457.82 ) 40 >> Test: 1050363.812 (+/-175... > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comments for _capacity_of and _used_by I believe I am done with the last round and have for the most part completed most of the feedback. Sorry for the delay, and thanks for your patience! Can take another look once the feedback has been addressed. Promise to be more prompt on the re-review since I have been over this once already so it should be relatively quick. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 155: > 153: // Remove this region from its free partition, but leave its capacity and used as part of the original free partition's totals. 
> 154: // When retiring a region, add any remnant of available memory within the region to the used total for the original free partition. > 155: void ShenandoahRegionPartition::retire_within_partition(size_t idx, size_t used_bytes) { Why is the method called `retire_within_partition()` instead of `retire_from_partition()` ? (i.e. why _within_ partition, since it's leaving its free partition?) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 266: > 264: size_t ShenandoahRegionPartition::leftmost_empty(ShenandoahFreeSetPartitionId which_partition) { > 265: assert (which_partition > NotFree && which_partition < NumPartitions, "selected free partition must be valid"); > 266: for (size_t idx = _leftmosts_empty[which_partition]; idx < _max; idx++) { See next comment. Would it simplify matters to index all these loops with `ssize_t`. I am thinking it should work more cleanly than these multiple variations to deal with underflow in a few places(see comment below). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 279: > 277: inline size_t ShenandoahRegionPartition::rightmost_empty(ShenandoahFreeSetPartitionId which_partition) { > 278: assert (which_partition > NotFree && which_partition < NumPartitions, "selected free partition must be valid"); > 279: for (intptr_t idx = _rightmosts_empty[which_partition]; idx >= 0; idx--) { `ssize_t` instead of `intptr_t`. I'd change the signatures of all these to use `ssize_t` just to avoid these somewhat awkward constructs. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 287: > 285: _leftmosts_empty[which_partition] = _max; > 286: _rightmosts_empty[which_partition] = 0; > 287: return 0; To my earlier comment of using `ssize_t`, that would allow us to signal failure here by returning a -1. 
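The signed-index suggestion in the comments above can be sketched as follows. This is a minimal illustration, not HotSpot code: `rightmost_empty_sketch` and the `is_empty` vector are hypothetical, and `std::ptrdiff_t` is used as a portable stand-in for `ssize_t`, which is a POSIX type rather than standard C++. With a signed index the loop can count down past zero and terminate normally, and -1 is available as the "no empty region" result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scan [leftmost, rightmost] from right to left and
// return the index of the last empty region, or -1 if there is none.
std::ptrdiff_t rightmost_empty_sketch(const std::vector<bool>& is_empty,
                                      std::ptrdiff_t leftmost,
                                      std::ptrdiff_t rightmost) {
  assert(leftmost >= 0);
  assert(rightmost < static_cast<std::ptrdiff_t>(is_empty.size()));
  // idx is signed, so idx-- below zero yields -1 and the loop ends even
  // when leftmost == 0.
  for (std::ptrdiff_t idx = rightmost; idx >= leftmost; idx--) {
    if (is_empty[static_cast<std::size_t>(idx)]) {
      return idx;
    }
  }
  return -1;  // signals "no completely empty region in the interval"
}
```

An empty interval written in the canonical form discussed in the thread (leftmost equal to the region count, rightmost equal to 0) falls out naturally: the loop body never runs and the function returns -1.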
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 441: > 439: // size_t is unsigned, need to dodge underflow when _leftmost = 0 > 440: // Fast-path: try to allocate in the collector view first > 441: for (size_t c = _partitions.rightmost(Collector) + 1; c > _partitions.leftmost(Collector); c--) { Use `ssize_t` for c. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 442: > 440: // Fast-path: try to allocate in the collector view first > 441: for (size_t c = _partitions.rightmost(Collector) + 1; c > _partitions.leftmost(Collector); c--) { > 442: size_t idx = c - 1; Here and further below (lines 457-458), you start at rightmost + 1, check if it's greater than leftmost, then reduce the index by 1 (since you had started to the right of the rightmost) to check. Couldn't you simply start at rightmost, check if it was >= leftmost, and use it? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 457: > 455: > 456: // Try to steal an empty region from the mutator view. > 457: for (size_t c = _partitions.rightmost_empty(Mutator) + 1; c > _partitions.leftmost_empty(Mutator); c--) { `ssize_t` to keep all these loops uniform. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 467: > 465: log_debug(gc, free)("Flipped region " SIZE_FORMAT " to gc for request: " PTR_FORMAT, idx, p2i(&req)); > 466: return result; > 467: } It seems like this can cause potentially many (because of the loop) Mutator regions to flip to Collector (can we call the method `flip_to_collector`?) sometimes even when the request won't be satisfied. Why not flip to Collector only _after_ the allocation is successful? I assume the attempt to allocate would run afoul of assertion checks if it happened before the flip, but I worry about flipping a bunch of stuff unnecessarily and failing to allocate in them after all. Is that futile flipping cause for concern? Can it be avoided (e.g. 
by repositioning the assertion checks using a proxy variable to signal intent to flip following a successful allocation, then using it to ensure the post-allocation flip, or something similar)? Maybe such futile flips are uncommon and not a cause for concern? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 550: > 548: // allocate within. This was observed to result in large amounts of available memory being ignored > 549: // following a failed shared allocation request. TLAB requests will generally downsize to absorb all > 550: // memory available within the region even if this is less than the desired size. I don't understand this comment, since it seems to me that you are retiring the region below at line 553. (Also see the comment elsewhere on calling the method `retire_within_partition`, instead of the more natural (to me) `retire_from_partition`.) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 683: > 681: // move some of the mutator regions into the collector partition with the intent of packing collector memory into the > 682: // highest (rightmost) addresses of the heap, with mutator memory consuming the lowest addresses of the heap. > 683: void ShenandoahFreeSet::find_regions_with_alloc_capacity(size_t &cset_regions) { This method seems to belong to a public API of `ShenandoahRegionPartitions`. See also comment at the call site of this method. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 715: > 713: } > 714: > 715: // Move no more than max_xfer_regions from the existing Collector free partitions to the Mutator free partition. I'd avoid the somewhat redundant "Mutator free partition" or "Collector free partition", but merely say "Mutator partition" and "Collector partition". I'd reserve the term "free partition" for a partition that is not a NotFree partition. This allows terseness and precision at the same time.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 761: > 759: void ShenandoahFreeSet::prepare_to_rebuild(size_t &cset_regions) { > 760: shenandoah_assert_heaplocked(); > 761: // This resets all state information, removing all regions from all partitions. I thought it makes them all unavailable, placing them all into the NotFree partition. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 766: > 764: > 765: // This places regions that have alloc_capacity into the mutator partition. > 766: find_regions_with_alloc_capacity(cset_regions); In conjunction with `clear()` above, it looks like we are doing two walks of the _membership array in the implementation as a result of this. Why not just have a single API from `ShenandoahRegionPartitions` that walks over the regions and sorts them into the NotFree or the Mutator partition in one go, rather than one walk to clear and another to then move some into Mutator? Also the method should probably be renamed to `move_alloc_regions_to_mutator()`, which should be moved into `ShenandoahRegionPartitions` class as a public API for this class `ShenandoahFreeSet` to call. ------------- Changes requested by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1860712682 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477019821 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477011350 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477011145 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477012364 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477016777 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477008101 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477016903 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477010479 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477020531 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476915853 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476917127 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476914673 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1476916083 From ysr at openjdk.org Sat Feb 3 08:55:07 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 3 Feb 2024 08:55:07 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 07:51:14 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 467: > >> 465: log_debug(gc, free)("Flipped region " SIZE_FORMAT " to gc for request: " PTR_FORMAT, idx, p2i(&req)); >> 466: return result; >> 467: } > > It seems like this can cause potentially many (because of the loop) Mutator regions to flip to Collector (can we call the method `flip_to_collector`?) 
sometimes even when the request won't be satisfied. Why not flip to Collector only _after_ the allocation is successful? I assume the attempt to allocate would run afoul of assertion checks if it happened before the flip, but I worry about flipping a bunch of stuff unnecessarily and failing to allocate in them after all. Is that futile flipping cause for concern? Can it be avoided (e.g. by repositioning the assertion checks using a proxy variable to signal intent to flip following a successful allocation, then using it to ensure the post-allocation flip, or something similar)? May be such futile flips are uncommon and not a cause for concern? On further thought, when is it that the allocation may fail (lines 463,464)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477017088 From kdnilsen at openjdk.org Sat Feb 3 14:04:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:04:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v4] In-Reply-To: <7zz-C2deQM2LpSf1Jo5E4C5WYbCyUuNO7CQPMX6Q47s=.0d200023-1b71-462e-bcee-01bcf41b5702@github.com> References: <7zz-C2deQM2LpSf1Jo5E4C5WYbCyUuNO7CQPMX6Q47s=.0d200023-1b71-462e-bcee-01bcf41b5702@github.com> Message-ID: <8ILT34a4sce3LK0nqtsOPVa80WQ5xXr0Zrat2_TUlVM=.a0a2aa14-bae1-4d40-85bb-a3f18e6d637b@github.com> On Wed, 31 Jan 2024 17:33:17 GMT, Y. Srinivas Ramakrishna wrote: >> The problem happens when leftmost is zero. If we decrement idx beyond zero, we would get MAXINT rather than -1, so the test that idx >= leftmost is always true and loop never terminates... I think I discovered this the hard way... :( > > `ssize_t` is signed, unlike `size_t` which is unsigned. Thanks for clarifying. I did know ssize_t. I'm adjusting the code. 
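The non-terminating loop described in the quoted text comes from unsigned wraparound, and the `rightmost + 1` / `c - 1` idiom seen in the reviewed loops is the usual way to avoid it while staying with `size_t`. The sketch below is illustrative only; `visit_right_to_left` is a hypothetical function, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Decrementing an unsigned zero wraps to the maximum value, so a condition
// such as `idx >= leftmost` can never become false when leftmost == 0.
static_assert(static_cast<std::size_t>(0) - 1 == std::numeric_limits<std::size_t>::max(),
              "unsigned arithmetic wraps around");

// Hypothetical helper: return the indices visited when walking the inclusive
// interval [leftmost, rightmost] from right to left with unsigned indices.
std::vector<std::size_t> visit_right_to_left(std::size_t leftmost, std::size_t rightmost) {
  // rightmost + 1 must not itself wrap around.
  assert(rightmost < std::numeric_limits<std::size_t>::max());
  std::vector<std::size_t> order;
  // c runs one ahead of the real index, so the test `c > leftmost` is still
  // meaningful when leftmost == 0.
  for (std::size_t c = rightmost + 1; c > leftmost; c--) {
    order.push_back(c - 1);
  }
  return order;
}
```

The off-by-one bookkeeping is what the reviewer calls awkward; a signed index removes it at the cost of casts wherever the index meets a `size_t` API.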
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477069379 From kdnilsen at openjdk.org Sat Feb 3 14:07:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:07:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 08:47:48 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 155: > >> 153: // Remove this region from its free partition, but leave its capacity and used as part of the original free partition's totals. >> 154: // When retiring a region, add any remnant of available memory within the region to the used total for the original free partition. >> 155: void ShenandoahRegionPartition::retire_within_partition(size_t idx, size_t used_bytes) { > > Why is the method called `retire_within_partition()` instead of `retire_from_partition()` ? > > (i.e. why _within_ partition, since it's leaving its free partition?) It's a little subtle. Maybe it needs more documentation. We retire the region so it no longer is within the range searched when new allocations are made. However, its totals (capacity and used) are still counted toward the Mutator or Collector partition's total. We've probably created some of our own pain by calling this a partition instead of a "free set". Advice? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477070065 From kdnilsen at openjdk.org Sat Feb 3 14:24:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:24:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 07:12:58 GMT, Y. 
Srinivas Ramakrishna wrote: >> So the documentation here might be: >> >> // Move cset_regions number of regions from being available to the collector to >> // being available to the mutator. >> // >> // Typical usage is at the end of evacuation, when the collector no longer needs >> // the regions that were reserved for evacuation, and these can now be >> // made available for mutator allocation. >> >> BTW, why call the number of regions `cset_regions`? > > Also, the concept of partition is itself an internal implementation detail that you have carefully encapsulated in this class. There is no point in leaking that out in the naming of the method. > > The method can just be called `move_regions_from_collector_to_mutator(size_t num)` and be done? "Partition" here adds no value and can be confusing leakage of abstraction. Agree. Thanks for these improvements. Done. (Also, I have enhanced the comment to clarify the intent of the cset_regions argument, which represents the number of regions in the collection set.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477072830 From kdnilsen at openjdk.org Sat Feb 3 14:29:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:29:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 07:15:51 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1069: > >> 1067: heap->collection_set()->clear(); >> 1068: >> 1069: // Since Full GC directly manipulates top of certain regions, certain ShenandoahFreeSet abstractions may have been corrupted.
> > Instead of "may have been corrupted", which can be alarming and confusing, I'd state this as: > > // Full GC doesn't use or maintain the ShenandoahFreeSet abstractions, > // so we rebuild the free set from scratch following a Full GC. I'm just going to remove that comment. It raises concerns when none is necessary. "Obviously", we need to rebuild the free set following a full gc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477073747 From kdnilsen at openjdk.org Sat Feb 3 14:33:06 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:33:06 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 14:04:53 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 155: >> >>> 153: // Remove this region from its free partition, but leave its capacity and used as part of the original free partition's totals. >>> 154: // When retiring a region, add any remnant of available memory within the region to the used total for the original free partition. >>> 155: void ShenandoahRegionPartition::retire_within_partition(size_t idx, size_t used_bytes) { >> >> Why is the method called `retire_within_partition()` instead of `retire_from_partition()` ? >> >> (i.e. why _within_ partition, since it's leaving its free partition?) > > It's a little subtle. Maybe it needs more documentation. We retire the region so it no longer is within the range searched when new allocations are made. However, its totals (capacity and used) are still counted toward the Mutator or Collector partition's total. > > We've probably created some of our own pain by calling this a partition instead of a "free set". Advice? I'll change the name, as you suggest. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477074268 From kbarrett at openjdk.org Sat Feb 3 14:35:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 3 Feb 2024 14:35:00 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 07:54:54 GMT, Lei Zaakjyu wrote: > trivial In the constructor for CollectedHeap, I'd like the newly added `_soft_ref_policy` to be explicitly initialized, rather than relying on implicit member initialization. That is, add a mem-initializer to the mem-initializer-list of that constructor. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17693#pullrequestreview-1861039681 From kdnilsen at openjdk.org Sat Feb 3 14:41:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 14:41:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 01:44:29 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 766: > >> 764: >> 765: // This places regions that have alloc_capacity into the mutator partition. >> 766: find_regions_with_alloc_capacity(cset_regions); > > In conjunction with `clear()` above, it looks like we are doing two walks of the _membership array in the implementation as a result of this. Why not just have a single API from `ShenandoahRegionPartitions` that walks over the regions and sorts them into the NotFree or the Mutator partition in one go, rather than one walk to clear and another to then move some into Mutator? 
> > Also the method should probably be renamed to `move_alloc_regions_to_mutator()`, which should be moved into `ShenandoahRegionPartitions` class as a public API for this class `ShenandoahFreeSet` to call. We walk twice, first to figure out how much memory is available, and how many regions are completely empty. This information eventually feeds into GenShen's transfer of regions between young-gen and old-gen. There is less motivation for that distinction in single-generation Shenandoah because we do not need to make these informed transfers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477075161 From kdnilsen at openjdk.org Sat Feb 3 15:49:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 15:49:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 01:36:35 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 761: > >> 759: void ShenandoahFreeSet::prepare_to_rebuild(size_t &cset_regions) { >> 760: shenandoah_assert_heaplocked(); >> 761: // This resets all state information, removing all regions from all partitions. > > I thought it makes them all unavailable, placing them all into the NotFree partition. The wording has been a bit imprecise, possibly made even worse by some global search and replaces on free-set and partition. I've tried to clarify in most recent draft that we consider Collector and Mutator to be partitions, but the NotFree labels means "not in a partition". Maybe you can help me find the right wording here... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477084264 From kdnilsen at openjdk.org Sat Feb 3 15:49:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 15:49:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 14:38:44 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 766: >> >>> 764: >>> 765: // This places regions that have alloc_capacity into the mutator partition. >>> 766: find_regions_with_alloc_capacity(cset_regions); >> >> In conjunction with `clear()` above, it looks like we are doing two walks of the _membership array in the implementation as a result of this. Why not just have a single API from `ShenandoahRegionPartitions` that walks over the regions and sorts them into the NotFree or the Mutator partition in one go, rather than one walk to clear and another to then move some into Mutator? >> >> Also the method should probably be renamed to `move_alloc_regions_to_mutator()`, which should be moved into `ShenandoahRegionPartitions` class as a public API for this class `ShenandoahFreeSet` to call. > > We walk twice, first to figure out how much memory is available, and how many regions are completely empty. This information eventually feeds into GenShen's transfer of regions between young-gen and old-gen. There is less motivation for that distinction in single-generation Shenandoah because we do not need to make these informed transfers. But even before the GenShen changes, there were two walks through the regions. This is because the rebuild wants to "optimize" the organization of the mutator free set and the collector free set. Certain regions that may have been in the mutator set during the previous GC will be in the collector set during the next GC, and vice versa.
We strive to arrange that each free set is "tightly packed" over a subrange of the regions, with collector free set at the high end of memory and mutator set at the lower end of memory. With GenShen integration, we will place the old collector set above the collector set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477083826 From kdnilsen at openjdk.org Sat Feb 3 15:53:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 15:53:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 01:42:44 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 683: > >> 681: // move some of the mutator regions into the collector partition with the intent of packing collector memory into the >> 682: // highest (rightmost) addresses of the heap, with mutator memory consuming the lowest addresses of the heap. >> 683: void ShenandoahFreeSet::find_regions_with_alloc_capacity(size_t &cset_regions) { > > This method seems to belong to a public API of `ShenandoahRegionPartitions`. See also comment at the call site of this method. There is a public API for prepare_to_rebuild() followed by finish_rebuild(). This public API is exercised by GenShen, which adjusts the sizes of old-gen and young-gen between the two calls. Single-gen shenandoah does not distinguish between these two steps, because it has no notion of adjusting generation sizes. Single-gen shenandoah invokes the public api rebuild(), which simply delegates to these two functions. 
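For readers following the thread, the relationship between the three entry points described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the HotSpot code: `FreeSetSketch`, `Region`, `Partition`, `membership()` and the `collector_reserve_regions` parameter are hypothetical names, and the real `finish_rebuild()` signature does not appear in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model types; none of these are the real HotSpot classes.
enum class Partition { NotFree, Mutator, Collector };

struct Region {
  size_t free_bytes;  // memory still allocatable in this region
  bool in_cset;       // region belongs to the collection set
};

class FreeSetSketch {
public:
  explicit FreeSetSketch(std::vector<Region> regions)
    : _regions(std::move(regions)),
      _membership(_regions.size(), Partition::NotFree) {}

  // Step 1: reset all membership, then place every region that still has
  // allocation capacity into the Mutator partition and count cset regions.
  void prepare_to_rebuild(size_t& cset_regions) {
    cset_regions = 0;
    for (size_t i = 0; i < _regions.size(); i++) {
      _membership[i] = Partition::NotFree;
      if (_regions[i].in_cset) {
        cset_regions++;
      } else if (_regions[i].free_bytes > 0) {
        _membership[i] = Partition::Mutator;
      }
    }
  }

  // Step 2: move up to collector_reserve_regions Mutator regions into the
  // Collector partition, walking down from the highest region index so that
  // collector memory is packed at the high end of the heap.
  void finish_rebuild(size_t collector_reserve_regions) {
    size_t moved = 0;
    // i runs from size() down to 1, so the unsigned index never underflows.
    for (size_t i = _regions.size(); i > 0 && moved < collector_reserve_regions; i--) {
      if (_membership[i - 1] == Partition::Mutator) {
        _membership[i - 1] = Partition::Collector;
        moved++;
      }
    }
  }

  // Single-generation entry point: simply delegates to the two steps.
  // A generational caller would instead adjust generation sizes between them.
  void rebuild(size_t collector_reserve_regions) {
    size_t cset_regions;
    prepare_to_rebuild(cset_regions);
    finish_rebuild(collector_reserve_regions);
  }

  Partition membership(size_t idx) const { return _membership[idx]; }

private:
  std::vector<Region> _regions;
  std::vector<Partition> _membership;
};
```

In this sketch a caller that needs to act between the two steps calls `prepare_to_rebuild()` and `finish_rebuild()` separately, while a caller with nothing to adjust calls `rebuild()`.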
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477085137 From kdnilsen at openjdk.org Sat Feb 3 16:10:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 16:10:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <5hm7G2JJ9Kmcc7DHdRPNqeZKJcgQXkf-N1kK1VmDAyI=.7141723b-23bf-4705-9fa9-aeb77e38602f@github.com> On Sat, 3 Feb 2024 15:46:04 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 761: >> >>> 759: void ShenandoahFreeSet::prepare_to_rebuild(size_t &cset_regions) { >>> 760: shenandoah_assert_heaplocked(); >>> 761: // This resets all state information, removing all regions from all partitions. >> >> I thought it makes them all unavailable, placing them all into the NotFree partition. > > The wording has been a bit imprecise, possibly made even worse by some global search and replaces on free-set and partition. I've tried to clarify in most recent draft that we consider Collector and Mutator to be partitions, but the NotFree labels means "not in a partition". Maybe you can help me find the right wording here... I'm adjusting this comment in hopes of making the intent more clear. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477087368 From duke at openjdk.org Sat Feb 3 16:13:24 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 3 Feb 2024 16:13:24 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' [v2] In-Reply-To: References: Message-ID: > trivial Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: construct '_soft_ref_policy' explicitly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17693/files - new: https://git.openjdk.org/jdk/pull/17693/files/d1eff994..be6500d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17693&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17693&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17693/head:pull/17693 PR: https://git.openjdk.org/jdk/pull/17693 From kdnilsen at openjdk.org Sat Feb 3 16:23:05 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 16:23:05 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <_PQOYVsBqw0NBV7hjtMa9-iX7-KaM24DRZSQVgizphI=.f721790b-5192-4460-ad33-e065a54c4e35@github.com> On Sat, 3 Feb 2024 01:51:24 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 715: > >> 713: } >> 714: >> 715: // Move no more than max_xfer_regions from the existing Collector free partitions to the Mutator free partition. > > I'd avoid the somewhat redundant "Mutator free partition" or "Collector free partition", but merely say "Mutator partition" and "Collector partition". 
I'd reserve the term "free partition" for a partition that is not a NotFree partition. This allows terseness and precision at the same time. Agree. Thanks. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477088887 From kdnilsen at openjdk.org Sat Feb 3 16:31:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 16:31:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 08:08:34 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 287: > >> 285: _leftmosts_empty[which_partition] = _max; >> 286: _rightmosts_empty[which_partition] = 0; >> 287: return 0; > > To my earlier comment of using `ssize_t`, that would allow us to signal failure here by returning a -1. In the interest of stability, I'm inclined to leave this convention as is. Could be persuaded to make the change, but there are probably more than 5 touchpoints that also need to be changed (all invocations, existing documentation, etc.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477090051 From kdnilsen at openjdk.org Sat Feb 3 16:42:05 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 16:42:05 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 08:23:06 GMT, Y. 
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 441: > >> 439: // size_t is unsigned, need to dodge underflow when _leftmost = 0 >> 440: // Fast-path: try to allocate in the collector view first >> 441: for (size_t c = _partitions.rightmost(Collector) + 1; c > _partitions.leftmost(Collector); c--) { > > Use `ssize_t` for c. Thanks. Done. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 457: > >> 455: >> 456: // Try to steal an empty region from the mutator view. >> 457: for (size_t c = _partitions.rightmost_empty(Mutator) + 1; c > _partitions.leftmost_empty(Mutator); c--) { > > `ssize_t` to keep all these loops uniform. Agree. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477094507 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477094575 From kdnilsen at openjdk.org Sat Feb 3 16:52:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 16:52:03 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 08:49:52 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 550: > >> 548: // allocate within. This was observed to result in large amounts of available memory being ignored >> 549: // following a failed shared allocation request. TLAB requests will generally downsize to absorb all >> 550: // memory available within the region even if this is less than the desired size. > > I don't understand this comment, since you are it seems to me retiring the region below at line 553. 
(Also see comment elsewhere on calling the method `retire_within_partition`, instead of the more natural (to me) `retire_from_partition`. I'm fixing this comment to make more clear. In the current implementation, we only retire a region if the remaining capacity is less than PLAB::min_size(). The previous implementation was observed to retire some regions even when there was 50% of the region still available (in the case that a very large shared alloc failed). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1477097271 From kdnilsen at openjdk.org Sat Feb 3 21:06:50 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 3 Feb 2024 21:06:50 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v7] In-Reply-To: References: Message-ID: <2KNDqLswo1RO4cNekrDD_nwWxz9QsLNxdYlNyayuzfI=.67115614-d5d4-483a-9691-16f14f9be51a@github.com> > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. 
An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... Kelvin Nilsen has updated the pull request incrementally with 15 additional commits since the last revision: - Correct an invalid assertion - Remove extraneous assertion - Fix comment describing retirement of regions following failed allocation - Use ssize_t for iterating over partition regions - Fix description of move_regions_from_collector_to_mutator - Rename retire_within_partition New name is retire_from_partition - Remove unhelpful comment that might cause undue concern to maintainers - Rename move_regions_from_collector_to_mutator_partition New name is move_regions_from_collector_to_mutator. Hide the partition abstraction from public api. 
- Change loop iterator to ssize_t from int - Adjust enum ShenandoahFreeSetPartitionId to clarify NotFree is not a partition - ... and 5 more: https://git.openjdk.org/jdk/compare/fb1f5bfe...5e27a585 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/fb1f5bfe..5e27a585 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=05-06 Stats: 125 lines in 5 files changed: 12 ins; 13 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From tschatzl at openjdk.org Mon Feb 5 13:42:03 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 Feb 2024 13:42:03 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' [v2] In-Reply-To: References: Message-ID: <0hDALL2-YckvKJySdlXRA9_3WsnKgKfJhrydobHXW-A=.8b3d566f-3e51-4863-9456-6af61aa95ed0@github.com> On Sat, 3 Feb 2024 16:13:24 GMT, Lei Zaakjyu wrote: >> trivial > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > construct '_soft_ref_policy' explicitly lgtm (but not trivial) ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17693#pullrequestreview-1862827219 From ysr at openjdk.org Mon Feb 5 15:15:08 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 5 Feb 2024 15:15:08 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 16:28:43 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 287: >> >>> 285: _leftmosts_empty[which_partition] = _max; >>> 286: _rightmosts_empty[which_partition] = 0; >>> 287: return 0; >> >> To my earlier comment of using `ssize_t`, that would allow us to signal failure here by returning a -1. > > In the interest of stability, I'm inclined to leave this convention as is. Could be persuaded to make the change, but there are probably more than 5 touchpoints that also need to be changed (all invocations, existing documentation, etc.) That sounds reasonable; can be addressed later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1478408600 From ysr at openjdk.org Mon Feb 5 15:24:04 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 5 Feb 2024 15:24:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: <5hm7G2JJ9Kmcc7DHdRPNqeZKJcgQXkf-N1kK1VmDAyI=.7141723b-23bf-4705-9fa9-aeb77e38602f@github.com> References: <5hm7G2JJ9Kmcc7DHdRPNqeZKJcgQXkf-N1kK1VmDAyI=.7141723b-23bf-4705-9fa9-aeb77e38602f@github.com> Message-ID: On Sat, 3 Feb 2024 16:07:21 GMT, Kelvin Nilsen wrote: >> The wording has been a bit imprecise, possibly made even worse by some global search and replaces on free-set and partition. I've tried to clarify in most recent draft that we consider Collector and Mutator to be partitions, but the NotFree labels means "not in a partition". Maybe you can help me find the right wording here... > > I'm adjusting this comment in hopes of making the intent more clear. ah, ok. I'll reread the new understanding/documentation with this in mind; thanks! 
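As an aside on the `ssize_t` suggestion discussed in this exchange, here is a small sketch of why a signed index helps both with the downward loops and with signalling failure. It is an illustration only: `find_rightmost_fit` and its parameters are hypothetical and do not exist in the free set code, and `ssize_t` is assumed to come from the POSIX header `<sys/types.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX)
#include <vector>

// Hypothetical helper: scan [leftmost, rightmost] from the high end and
// return the index of the first region with at least `needed` free bytes,
// or -1 if no region qualifies.
inline ssize_t find_rightmost_fit(const std::vector<size_t>& free_bytes,
                                  ssize_t leftmost, ssize_t rightmost,
                                  size_t needed) {
  // With a signed index the loop can use `idx >= leftmost` directly, even
  // when leftmost is 0: idx becomes -1 and the loop terminates. With an
  // unsigned index, decrementing past 0 would wrap around instead.
  for (ssize_t idx = rightmost; idx >= leftmost; idx--) {
    if (free_bytes[idx] >= needed) {
      return idx;
    }
  }
  return -1;  // signed return type leaves room for a failure value
}
```

The existing code avoids the wrap-around by iterating over `rightmost + 1` down to `leftmost + 1`, which also works but is easier to get wrong.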
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1478426572 From ysr at openjdk.org Mon Feb 5 15:32:04 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 5 Feb 2024 15:32:04 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 15:44:03 GMT, Kelvin Nilsen wrote: >> We walk twice, first to figure out how much memory is available, and how many regions are completely empty. This information eventually feeds into GenShen's transfer of regions between young-gen and old-gen. There is less motivation for that distinction in single-generation Shenandoah because we do not need to make these informed transfers. > > But even before genshen changes, there were two walks through the regions. This is because the rebuild wants to "optimize" the organization of the mutator free set and the collector free set. Certain regions that may have been in the mutator set during previous GC will be in the collector set during the next gc, and vice versa. We strive to arrange that each free set is "tightly packed" over a subrange of the regions, with collector free set at the high end of memory and mutator set at the lower end of memory. With GenShen integration, we will place the old collector set above the collector set. I suppose I'll need to look through this more carefully. In the case of single gen, it still sounded to me like the "clear" really doesn't accomplish anything other than taking stuff out of the free partitions and then the `find_..` sorts them into the new free partitions, and it looked like that could be accomplished by a single walk. If GenShen then wants to break them into two sequences with some other step in between, may be one offers the three API's: one the single-gen optimized one that avoids two walks, and the two APIs `clear` and `find_...` separately for GenShen. 
But it sounds like you are saying that there is a _need_ for these two walks in the case of single gen as well. Let me discuss this with you offline so I understand this better as I am probably missing something crucial here. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1478437768 From aboldtch at openjdk.org Mon Feb 5 15:50:08 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 Feb 2024 15:50:08 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes All my comments have been addressed.
As previously mentioned as long as the performance is there, then this looks good. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17495#pullrequestreview-1863145854 From kbarrett at openjdk.org Mon Feb 5 15:59:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 5 Feb 2024 15:59:00 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' [v2] In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 16:13:24 GMT, Lei Zaakjyu wrote: >> trivial > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > construct '_soft_ref_policy' explicitly Looks good (and agree with @tschatzl about not trivial, but that doesn't matter now). Also, for future reference, I think the PR description ought to provide more information than was provided here. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17693#pullrequestreview-1863165729 From duke at openjdk.org Mon Feb 5 23:50:00 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Mon, 5 Feb 2024 23:50:00 GMT Subject: RFR: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' [v2] In-Reply-To: References: Message-ID: <2mZDTGNU_wbUwHR9TE2ji1rOwddFz7yXcEFBwzsGSs8=.31aaedbc-7d0d-4d82-ad37-a241e1305e31@github.com> On Sat, 3 Feb 2024 16:13:24 GMT, Lei Zaakjyu wrote: >> trivial > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > construct '_soft_ref_policy' explicitly thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17693#issuecomment-1927598071 From eosterlund at openjdk.org Mon Feb 5 23:51:34 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 5 Feb 2024 23:51:34 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Mon, 5 Feb 2024 15:47:29 GMT, Axel Boldt-Christmas wrote: > All my comments have been addressed. > > As previously mentioned as long as the performance is there, then this looks good. Thanks for the review, @xmas92. Any other takers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1928143097 From wkemper at openjdk.org Tue Feb 6 00:09:36 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Feb 2024 00:09:36 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-23+8 William Kemper has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 72 commits: - Merge tag 'jdk-23+8' into merge-jdk-23+8 Added tag jdk-23+8 for changeset 5b9b176c - 8324174: assert(m->is_entered(current)) failed: invariant Reviewed-by: epeter, dlong, thartmann - 8325042: remove unused JVMDITools test files Reviewed-by: coleenp - 8323621: JDK build should exclude snippet class in java.lang.foreign Reviewed-by: mcimadamore - 8324238: [macOS] java/awt/Frame/ShapeNotSetSometimes/ShapeNotSetSometimes.java fails with the shape has not been applied msg Reviewed-by: azvegint, dnguyen - 8320342: Use PassFailJFrame for TruncatedPopupMenuTest.java Reviewed-by: honkar, aivanov - 8324981: Shenandoah: Move commit and soft max heap changed methods into heap Reviewed-by: shade - 8303374: Implement JEP 455: Primitive Types in Patterns, instanceof, and switch (Preview) Co-authored-by: Jan Lahoda Co-authored-by: Maurizio Cimadamore Co-authored-by: Gavin Bierman Co-authored-by: Brian Goetz Co-authored-by: Raffaello Giulietti Co-authored-by: Aggelos Biboudis Reviewed-by: vromero, jlahoda - 8320712: Rewrite BadFactoryTest in pure Java Reviewed-by: jpai, sundar - 8324771: Obsolete RAMFraction related flags Reviewed-by: dholmes, mbaesken, tschatzl - ... 
and 62 more: https://git.openjdk.org/shenandoah/compare/1ecdc046...b775a88f ------------- Changes: https://git.openjdk.org/shenandoah/pull/389/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=389&range=01 Stats: 18823 lines in 1229 files changed: 7186 ins; 1717 del; 9920 mod Patch: https://git.openjdk.org/shenandoah/pull/389.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/389/head:pull/389 PR: https://git.openjdk.org/shenandoah/pull/389 From wkemper at openjdk.org Tue Feb 6 00:13:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Feb 2024 00:13:58 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+1 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 200 commits: - Merge remote-tracking branch 'shenandoah-jdk21u/master' into merge-jdk-21.0.3+1 - 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test Backport-of: 6997bfc68def7f80fbf6a7486a4b9f61225fc471 - 8320943: Files/probeContentType/Basic.java fails on latest Windows 11 - content type mismatch Backport-of: 87516e29dc5015c4cab2c07c5539ad30f2768667 - 8313507: Remove pkcs11/Cipher/TestKATForGCM.java from ProblemList Backport-of: e8471f6bbe692a0d1e293f9e09aaa4f32312eb6a - 8315600: Open source few more headless Swing misc tests Backport-of: b05198a4f354934bc344fe9cbc19d98fd8bc3977 - 8274122: java/io/File/createTempFile/SpecialTempFile.java fails in Windows 11 Backport-of: 4a142c3b0831d60b3d5540f58973e8ad3d1304bf - 8324280: RISC-V: Incorrect implementation in VM_Version::parse_satp_mode Backport-of: e7fdac9d5ce56d2f589df59a7fd2869e35ba2991 - 8324659: GHA: Generic jtreg errors are not reported Backport-of: c313d451a513eb08de0b295c1ce66d0d849d2374 - 8315761: Open source few swing JList and JMenuBar tests Backport-of: bb6b3f2486b07a6ccdeea18519453e6d9c05c2c3 - 8322142: JFR: Periodic tasks aren't orphaned 
between recordings Backport-of: 1551928502c8ed96350e7b4f1316ea35587407fe - ... and 190 more: https://git.openjdk.org/shenandoah-jdk21u/compare/66665613...4e4e70b0 ------------- Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=01 Stats: 30106 lines in 1346 files changed: 15608 ins; 5805 del; 8693 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/19.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/19/head:pull/19 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19 From duke at openjdk.org Tue Feb 6 01:08:57 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Tue, 6 Feb 2024 01:08:57 GMT Subject: Integrated: 8325081: Move '_soft_ref_policy' to 'CollectedHeap' In-Reply-To: References: Message-ID: On Sat, 3 Feb 2024 07:54:54 GMT, Lei Zaakjyu wrote: > trivial This pull request has now been integrated. Changeset: e0fd3f4d Author: Lei Zaakjyu Committer: Kim Barrett URL: https://git.openjdk.org/jdk/commit/e0fd3f4dababad7189b9e02b37a40ea1a3907554 Stats: 50 lines in 14 files changed: 4 ins; 44 del; 2 mod 8325081: Move '_soft_ref_policy' to 'CollectedHeap' Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/17693 From kdnilsen at openjdk.org Tue Feb 6 01:37:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Feb 2024 01:37:54 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment I dusted off the workloads that had originally motivated this change. 
Some of the context within which the code executes is different. At the time, we were doing 64 degens before a full gc. Here is the test I ran 5 times:

    for i in 48g 42g 36g 32g 31g 30g 29g 28g 27g 26g 25g 24g
    do
      echo Run TradiShen tip with memory size $i with 4s customer period >&2
      echo Run TradiShen tip with memory size $i with 4s customer period
      ~/github/jdk.2-1-2024/build/linux-x86_64-server-release/jdk/bin/java \
        -XX:+UnlockExperimentalVMOptions \
        -XX:+UseTransparentHugePages \
        -XX:-ShenandoahPacing \
        -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UseShenandoahGC \
        -Xlog:"gc*=info,ergo" \
        -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
        -XX:+UnlockDiagnosticVMOptions \
        -jar ~/github/heapothesys/Extremem/target/extremem-1.0-SNAPSHOT.jar \
        -dInitializationDelay=45s -dDictionarySize=16000000 -dNumCustomers=28000000 \
        -dNumProducts=64000 -dCustomerThreads=2000 -dCustomerPeriod=4s -dCustomerThinkTime=1s \
        -dKeywordSearchCount=4 -dServerThreads=5 -dServerPeriod=5s -dProductNameLength=10 \
        -dBrowsingHistoryQueueCount=5 \
        -dSalesTransactionQueueCount=5 \
        -dProductDescriptionLength=64 -dProductReplacementPeriod=25s -dProductReplacementCount=5 \
        -dCustomerReplacementPeriod=30s -dCustomerReplacementCount=1000 -dBrowsingExpiration=1m \
        -dPhasedUpdates=true \
        -dPhasedUpdateInterval=60s \
        -dSimulationDuration=20m -dResponseTimeMeasurements=100000

      echo Run Humongous Failure Handling with one degen and memory size $i with 4s customer period >&2
      echo Run Humongous Failure Handling with one degen and memory size $i with 4s customer period
      ~/gitfarm/shen.humongous-alloc-failure-handling/build/linux-x86_64-server-release/jdk/bin/java \
        -XX:+UnlockExperimentalVMOptions \
        -XX:+UseTransparentHugePages \
        -XX:-ShenandoahPacing \
        -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UseShenandoahGC \
        -Xlog:"gc*=info,ergo" \
        -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
        -XX:+UnlockDiagnosticVMOptions \
        -jar ~/github/heapothesys/Extremem/target/extremem-1.0-SNAPSHOT.jar \
        -dInitializationDelay=45s -dDictionarySize=16000000 -dNumCustomers=28000000 \
        -dNumProducts=64000 -dCustomerThreads=2000 -dCustomerPeriod=4s -dCustomerThinkTime=1s \
        -dKeywordSearchCount=4 -dServerThreads=5 -dServerPeriod=5s -dProductNameLength=10 \
        -dBrowsingHistoryQueueCount=5 \
        -dSalesTransactionQueueCount=5 \
        -dProductDescriptionLength=64 -dProductReplacementPeriod=25s -dProductReplacementCount=5 \
        -dCustomerReplacementPeriod=30s -dCustomerReplacementCount=1000 -dBrowsingExpiration=1m \
        -dPhasedUpdates=true \
        -dPhasedUpdateInterval=60s \
        -dSimulationDuration=20m -dResponseTimeMeasurements=100000
    done

My shen.humongous-alloc-failure-handling branch had an experimental delta from what is here. The idea was to do only one degen, and then upgrade to full on a humongous alloc failure:

```diff
diff --git a/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp b/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp
index a4b2adc8f5a..42721059b7e 100644
--- a/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp
+++ b/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp
@@ -108,7 +108,11 @@ void ShenandoahControlThread::run_service() {
       // If a humongous allocation has failed, then the heap is likely in need of compaction, so run
       // a full gc (which compacts regions) instead of a degenerated gc (which does not compact regions).
-      if (ShenandoahDegeneratedGC && heuristics->should_degenerate_cycle() && !humongous_alloc_failure_pending) {
+
+      // Experiment: If we had a humongous_alloc_failure, make sure we try at least one degen before going to full.
+ if (ShenandoahDegeneratedGC && + ((humongous_alloc_failure_pending && heap->shenandoah_policy()->consecutive_degenerated_gc_count() == 0) || + (!humongous_alloc_failure_pending && heuristics->should_degenerate_cycle()))) { heuristics->record_allocation_failure_gc(); policy->record_alloc_failure_to_degenerated(degen_point); mode = stw_degenerated; The results are summarized in the attached spreadsheet. [humongous-alloc-failure-handling.xlsx](https://github.com/openjdk/jdk/files/14173614/humongous-alloc-failure-handling.xlsx) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1928614499 From kdnilsen at openjdk.org Tue Feb 6 01:55:56 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Feb 2024 01:55:56 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Analysis of the spreadsheet results: 1. I repeated the experiment 5 times because the results were so "noisy" 2. Turns out, the results have almost nothing to do with how many degens we do before upgrading to full following a humongous allocation failure. The reaons is because humongous allocation failures are very rare (on these workloads). 3. The biggest problem is "upgrading degen to full gc". Whenever we see more than a few upgrades to full gc, the P50 latency explodes on us. 1. Often, we see the same exat workload configuration run without any upgrades to full gc and perform much better. Consider, for example 25g heap, Mainline. Three runs ran without upgrades to full gc. P50 latency was 1_147, 1_145, and 1_144 microseconds respectively. 
Two runs experienced upgrades to full gc. Those had p50 latencies of 223_236_560 and 86_091_599 microseconds respectively.
    2. Note that there's not much middle ground here. If the workload avoids upgrading to full, the results are good. If it experiences a single upgrade to full, it is likely to spiral into a lot more upgrades to full.
    3. My assessment: upgrade to full is generally counterproductive. We take a "long pause" to do degenerated work, and then we throw that work away and start all over with another even longer pause to do full gc. Meanwhile, client requests continue to accumulate.

Based on these measurements, I'm inclined to recommend the following:

1. If a humongous allocation request fails, we should ask ourselves "How much humongous memory was available at the most recent freeset rebuild?" If that amount of memory is greater than or equal to the size of the requested humongous allocation, we should degenerate (always). If it is smaller, we should do Full GC.
2. We should never upgrade to full from degen. The heuristic about "non-productive" degen is confusing us. What typically happens is we get into an unproductive cycle consisting of the following:
    1. Concurrent GC fails to allocate
    2. Degenerated GC takes multiple seconds to complete
    3. We resume execution with multiple seconds of pent-up demand for allocation
    4. Concurrent GC triggers immediately, but may degenerate again if we are "near the edge of the cliff" due to the burst of "catch-up" work that we are trying to perform
    5. All the memory just allocated during concurrent GC is "floating garbage". None of it can be reclaimed by degenerated GC.
    6. Degen is unproductive.
    7. We upgrade to Full GC.
    8. Full GC reclaims all the floating garbage.
    9. When we resume mutator work, the pent-up demand is even larger than in the previous scenario because we now have pent-up demand from both the STW degen phase and the STW full GC phase.
    10. So we repeat this cycle over and over and over again.
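
The two recommendations above can be sketched as a small decision function. This is only an illustration of the proposed policy, not code from the PR: `GCMode`, `AllocFailure`, `choose_gc_mode`, and the `humongous_words_at_last_rebuild` parameter are hypothetical names, and the sketch assumes the free set records, at rebuild time, how much memory could serve a humongous request.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration; none of these names exist in HotSpot.
enum class GCMode { Degenerated, Full };

struct AllocFailure {
  bool   is_humongous;     // true if the failed request was a humongous allocation
  size_t requested_words;  // size of the failed request, in heap words
};

// Assumption: humongous_words_at_last_rebuild is the amount of memory usable
// for humongous allocation, as recorded at the most recent free set rebuild.
inline GCMode choose_gc_mode(const AllocFailure& failure,
                             size_t humongous_words_at_last_rebuild) {
  assert(failure.requested_words > 0);
  // Only a humongous request that could not have fit even right after the
  // last rebuild goes to Full GC, since only compaction can help it.
  if (failure.is_humongous &&
      humongous_words_at_last_rebuild < failure.requested_words) {
    return GCMode::Full;
  }
  // Every other allocation failure degenerates; there is no later upgrade
  // from degenerated to full in this policy.
  return GCMode::Degenerated;
}
```

Under this sketch, the choice is made once, when the failure is handled, so the unproductive degen-then-full sequence in steps 6 and 7 cannot occur.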
I'm going to try an experiment where the only time we do full gc is if there is humongous allocation failure and humongous memory from most recently built free set is too small to satisfy the alloc request. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1928639977 From kdnilsen at openjdk.org Tue Feb 6 02:00:53 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Feb 2024 02:00:53 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Out of cycle alloc failure might as well do full GC. I would keep that behavior in my experiment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1928644014 From ysr at openjdk.org Tue Feb 6 03:33:58 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 6 Feb 2024 03:33:58 GMT Subject: RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v5] In-Reply-To: References: <6ciSyKdz9hA6RBOZeDicFetK_G4AUBpx40YX7yT1O1M=.870e1ba1-6f4b-48e9-8360-dab141a3041d@github.com> Message-ID: On Wed, 24 Jan 2024 17:53:39 GMT, Kelvin Nilsen wrote: >> The protocol for handling OOM during evacuation is subtle and critical for correct operation. This PR does NOT change behavior. It provides improved documentation of existing behavior. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix spelling error and mismatched parentheses. 
I like the longer block comment you wrote in the .hpp file describing the protocol because it provides fuller context and defines the intent of the protocol in greater detail. I am not sure which I would go with as both descriptions look good to me in their own way. The existing one has the benefit of being both concrete and concise. With that in mind, I also left a few comments on the original documentation on the left side. I am good with whatever you choose to use. The smaller individual documentation comments for each method look good to me, irrespective.

Sorry for the long delay in getting back on this. The protocol is subtle, and I like your ideas about potentially improving it in the future.

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 46:

> 44: void ShenandoahEvacOOMCounter::clear() {
> 45: assert(unmasked_count() == 0, "sanity");
> 46: Atomic::release_store_fence(&_bits, (jint)0);

Leaving a comment here but it applies to the comment at line 40 above, which reads:

// NOTE: It's ok to simply decrement, even with mask set, because unmasked value is positive.

Maybe leave a block comment at the start of the method at line 37 that states:

// Decrement the counter atomically, leaving the OOM bit unchanged at its original state.

Then the comment at current line 40 could read:

// The value is necessarily positive before we decrement, as we assert above, because
// this thread incremented it earlier. Since we atomically decrement a positive value,
// the state of the OOM bit is left unchanged at its original value.

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 52:

> 50: // associated with this counter. After all _num_counters OOM bits have been set, all threads newly attempting to enter_evacuation
> 51: // will be informed that they cannot allocate for evacation. Threads that entered evacuation before the OOM bit was set may
> 52: // continue to allocate for evacuation until they exit_evacuation.
This can simply state: // Set the OOM bit, and optionally decrement the counter I don't think you need to describe how this fits into the OOM protocol, at least not here. That confuses the documentation and the reader. That can be put in the caller or in a block comment describing the protocol. src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 62: > 60: jint other = Atomic::cmpxchg(&_bits, threads_in_evac, newval); > 61: if (other == threads_in_evac) { > 62: // Success: return so we can wait for other threads to stop allocating. I would simplify this comment to: // Successfully set the OOM bit (and optionally decremented the counter of threads_in_evac) The context of what happens after we return should be described in the caller, not here. src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp line 65: > 63: break; > 64: } else { > 65: // Failure: try again with updated new value. Adding comment here, but applies to `ShenandoahEvacOOMCounter::try_increment()` below, lines 71-89, as a block comment before line 71; one could document it as: // Unless OOM bit is set, increment the counter and return true. // If OOM bit is set, simply return false without incrementing the counter. The context of what the caller does, should be described in the caller, not in this method. src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 85: > 83: * OOM-during-evac-handler. The handler allows multiple threads to enter and exit > 84: * evacuation path, but on OOME it requires all threads that experienced OOME to wait > 85: * for current threads to leave, and blocks other threads from entering. The counter state After the period on line 85, I'd add one sentence: // The counter not only tracks the number of threads in the evacuation path, // but also whether any thread has encountered an OOM-during-evac. It thus // captures all of the state needed to track the execution of the protocol. 
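
As a reading aid for the comments suggested above, here is a minimal sketch of a counter that packs an OOM bit and a thread count into one atomic word. It is an assumption-laden illustration rather than the HotSpot implementation: it uses `std::atomic` instead of HotSpot's `Atomic`, a single unstriped counter, an assumed bit layout (OOM bit in the top bit), and the hypothetical class name `EvacOOMCounterSketch`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch only: one unstriped counter; the bit layout is an assumption.
class EvacOOMCounterSketch {
  static constexpr uint32_t OOM_BIT = 1u << 31;
  std::atomic<uint32_t> _bits{0};

public:
  uint32_t unmasked_count() const {
    return _bits.load(std::memory_order_acquire) & ~OOM_BIT;
  }

  bool oom_bit_set() const {
    return (_bits.load(std::memory_order_acquire) & OOM_BIT) != 0;
  }

  // Unless the OOM bit is set, increment the counter and return true.
  // If the OOM bit is set, return false without incrementing the counter.
  bool try_increment() {
    uint32_t cur = _bits.load(std::memory_order_acquire);
    while ((cur & OOM_BIT) == 0) {
      if (_bits.compare_exchange_weak(cur, cur + 1,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
        return true;
      }
      // A failed CAS reloaded cur; the loop condition re-checks the OOM bit.
    }
    return false;
  }

  // Decrement the counter atomically, leaving the OOM bit unchanged.
  // The count is positive here because this thread incremented it earlier,
  // so the subtraction cannot borrow from the OOM bit.
  void decrement() {
    uint32_t prev = _bits.fetch_sub(1, std::memory_order_acq_rel);
    assert((prev & ~OOM_BIT) > 0);
    (void)prev;
  }

  // Set the OOM bit, and optionally decrement the counter, in one atomic step.
  void set_oom_bit(bool decrement_count) {
    uint32_t cur = _bits.load(std::memory_order_acquire);
    for (;;) {
      assert(!decrement_count || (cur & ~OOM_BIT) > 0);
      uint32_t next = (decrement_count ? cur - 1 : cur) | OOM_BIT;
      if (_bits.compare_exchange_weak(cur, next,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
        return;
      }
      // Failure: cur now holds the latest value; try again.
    }
  }
};
```

Each method's comment says only what the method does to the word, which is the style the review asks for; what callers do after a failed `try_increment` or after `set_oom_bit` returns belongs with the callers.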
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 87:

> 85: * for current threads to leave, and blocks other threads from entering. The counter state
> 86: * is striped across multiple cache lines to reduce contention when many threads attempt
> 87: * to enter or leave the protocol at the same time.

After the period at line 87, I'd add:

// As a result, the protocol needs special steps, in the event of an OOM-during-evac,
// to ensure that all of the striped counters are zero before the protocol can terminate.
// Once the protocol terminates with the OOM bit set, no threads will attempt
// further allocations for evacuation, so any unresolved forwarding pointer uniquely resolves
// to either its new already-forwarded location or to its original to-space location.

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 130:

> 128: * safepoint. Marking by Full GC will finish updating references that might
> 129: * be inconsistent within the heap, and will then compact all live memory within
> 130: * the heap.

I like the longer comment you wrote because it provides fuller context and defines the intent of the protocol in greater detail. I am not sure which I would go with as both descriptions look good to me in their own way. With that in mind, I also left a few comments on the original documentation on the left side. I am good with whatever you choose to use.

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 136:

> 134: * Maintain a count of how many threads are on an evac-path (which is allocating for evacuation)
> 135: *
> 136: * Upon entry of the evac-path, entering thread will attempt to increase the counter,

"atomically increment the counter, if the OOM-bit isn't set."

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 147:

> 145: *
> 146: *
> 147: * Upon exit, exiting thread will decrease the counter using atomic dec.

"atomically decrement the counter" rather than "decrease the counter using atomic dec."
src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 172:

> 170: * make the protocol more efficient.
> 171: *
> 172: * TODO: make refinements to the OOM-during-evac protocol so that it is less disruptive and more efficient.

Maybe all of this, and the remainder of this comment on improvements (from line 162 above up to line 203 below), should instead go in a JBS ticket. Include here only a terse TODO with a pointer to the ticket for details:

// TODO: JDK-XXXX will investigate potential performance/efficiency improvements to this protocol.

src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 212:

> 210: _oom_not_evacuating
> 211: };
> 212: volatile ShenandoahEvacuationState _evacuation_state;

Leave a single line of documentation here stating that this is an auxiliary field introduced just for the sake of checking an invariant of the protocol.

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17385#pullrequestreview-1842713670
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479084937
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479059992
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479060963
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479063865
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479121544
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479122394
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479174040
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479158182
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479123287
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479102766
PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1479066055

From wkemper at openjdk.org Tue Feb 6 17:09:24 2024
From: wkemper at openjdk.org (William
Kemper) Date: Tue, 6 Feb 2024 17:09:24 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 2 Feb 2024 14:09:54 GMT, William Kemper wrote: > Merges tag jdk-23+8 This pull request has now been integrated. Changeset: 55068c2e Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/55068c2ee4a7d3189b930158780774a1bb36636d Stats: 18823 lines in 1229 files changed: 7186 ins; 1717 del; 9920 mod Merge ------------- PR: https://git.openjdk.org/shenandoah/pull/389 From wkemper at openjdk.org Tue Feb 6 22:02:14 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Feb 2024 22:02:14 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v3] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+1 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix wrong API usage in wrongly resolved conflict ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files/4e4e70b0..8dfe163f Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/19.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/19/head:pull/19 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19 From kdnilsen at openjdk.org Wed Feb 7 01:42:10 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 01:42:10 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v8] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. 
Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 
) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Combine first two passes over freeset during rebuild ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/5e27a585..07fe812a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=06-07 Stats: 87 lines in 3 files changed: 73 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From dlong at openjdk.org Wed Feb 7 02:59:58 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 7 Feb 2024 02:59:58 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData.
This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes I saw an earlier version, but I plan to look at your latest, soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1931181620 From eosterlund at openjdk.org Wed Feb 7 07:04:58 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 7 Feb 2024 07:04:58 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 02:56:54 GMT, Dean Long wrote: > I saw an earlier version, but I plan to look at your latest, soon. Thank you @dean-long I appreciate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1931405479 From kdnilsen at openjdk.org Wed Feb 7 16:26:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 16:26:54 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc.
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 490: > 488: bool is_humongous = words > ShenandoahHeapRegion::humongous_threshold_words(); > 489: > 490: if (try_set_alloc_failure_gc(is_humongous)) { I think we can replace this with assert(words <= ShenandoahHeapRegion::humongous_threshold_words()). We do not evacuate humongous objects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17638#discussion_r1481751566 From kdnilsen at openjdk.org Wed Feb 7 18:25:15 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:25:15 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v9] In-Reply-To: References: Message-ID: <-uub8b16AndOotBHCR4JnNe5kS8pR7qw1WYYVXa2nXY=.c4adf2dc-3d98-41f0-b9d5-9a3b36b82935@github.com> > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3.
An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/07fe812a..7d5c1fc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From kdnilsen at openjdk.org Wed Feb 7 18:33:11 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:33:11 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Mon, 5 Feb 2024 15:29:22 GMT, Y. Srinivas Ramakrishna wrote: >> But even before genshen changes, there were two walks through the regions. This is because the rebuild wants to "optimize" the organization of the mutator free set and the collector free set. Certain regions that may have been in the mutator set during previous GC will be in the collector set during the next gc, and vice versa. We strive to arrange that each free set is "tightly packed" over a subrange of the regions, with collector free set at the high end of memory and mutator set at the lower end of memory. With GenShen integration, we will place the old collector set above the collector set. > > I suppose I'll need to look through this more carefully. In the case of single gen, it still sounded to me like the "clear" really doesn't accomplish anything other than taking stuff out of the free partitions and then the `find_..` sorts them into the new free partitions, and it looked like that could be accomplished by a single walk. 
> > If GenShen then wants to break them into two sequences with some other step in between, may be one offers the three API's: one the single-gen optimized one that avoids two walks, and the two APIs `clear` and `find_...` separately for GenShen. But it sounds like you are saying that there is a _need_ for these two walks in the case of single gen as well. Let me discuss this with you offline so I understand this better as I am probably missing something crucial here. Thanks! In the most recent commit, I have merged the first two passes through the free set. We no longer have a separate clear pass that precedes moving regions with available capacity into the mutator free set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481913157 From kdnilsen at openjdk.org Wed Feb 7 18:33:11 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:33:11 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 05:20:56 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 36: > >> 34: enum ShenandoahFreeSetPartitionId : uint8_t { >> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory. >> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators. > > Just want to make sure: "available to mutators" -- is this both for object allocation as well as for possible evacuation as part of the mutator LRB? For personal clarification: when the mutator LRB needs to evacuate an object, it uses the collector set. Each mutator has three TLABS: one for mutator allocations, one for young-gen evacuations, and one for old-gen evacuations. 
Let me know if you think we need more documentation around this. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 37: > >> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory. >> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators. >> 37: Collector, // Region is in the Collector free set: available memory is reserved for evacuations. > > When mutators evacuate the target of an LRB, do they use `Mutator` or `Collector`. I assume the former? In that case, I'd say for Collector: `available memory is reserved for collector threads for evacuation`. actually, the collector reserve is for all evacuation, whether performed by collector worker threads or by mutator threads doing LRB handling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481915638 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481916799 From kdnilsen at openjdk.org Wed Feb 7 18:39:10 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:39:10 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 05:22:21 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 50: > >> 48: const size_t _max; // The maximum number of heap regions >> 49: const size_t _region_size_bytes; >> 50: const ShenandoahFreeSet* _free_set; > > Interesting: why does the partitioning need a reference to its containing free set? There are a few places where ShenandoahRegionPartitions invokes services provided by ShenandoahFreeSet, such as calling _free_set->alloc_capacity() when enforcing certain assertions and when adjusting interval boundaries. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481922483 From kdnilsen at openjdk.org Wed Feb 7 18:47:59 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:47:59 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 07:27:46 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 44: >> >>> 42: >>> 43: // This class implements partitioning of regions into distinct sets. Each ShenandoahHeapRegion is either in the Mutator free set, >>> 44: // the Collector free set, or in neither free set (NotFree). >> >> I noticed that you use the term "free partition" quite a lot later, I'd just start using that term early on when talking about these sets. You could, for example, say: >> >> // Whenever we say "free partition", we mean any partition other than the "NotFree" partition. > > Or: > > // Any partition that is not the "NotFree" partition is a "free partition". Thanks. I've made this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481929525 From kdnilsen at openjdk.org Wed Feb 7 18:47:58 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:47:58 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 18:29:41 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 36: >> >>> 34: enum ShenandoahFreeSetPartitionId : uint8_t { >>> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory. >>> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators. >> >> Just want to make sure: "available to mutators" -- is this both for object allocation as well as for possible evacuation as part of the mutator LRB? 
> > For personal clarification: when the mutator LRB needs to evacuate an object, it uses the collector set. Each mutator has three TLABS: one for mutator allocations, one for young-gen evacuations, and one for old-gen evacuations. Let me know if you think we need more documentation around this. (actually, the old-gen TLAB is not in single-generation Shenandoah, only in GenShen.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481928053 From kdnilsen at openjdk.org Wed Feb 7 18:48:00 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:48:00 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 05:51:14 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 74: > >> 72: // and _used[p], even though the region may have been removed from the free set. >> 73: size_t _capacity[NumPartitions]; >> 74: size_t _used[NumPartitions]; > > In light of your earlier documentation of leftmost/rightmost/empty/available etc. then, would it be fair to say that the following statement is always true: > > for p = NotFree: > 1. leftmosts[p] = leftmosts_empty[p] = _max > 2. rightmosts[p] = rightmosts_empty[p] = 0 > 3. capacity[p] = used[p] = region_size > > Are the "NotFree" entries for these arrays ever used? > > If not, is there any point in keeping them in a product build? Is there any point in keeping them in a non-product build? Does it have some other role that makes it important to keep it, anyway? In the most recent change, I shrunk the sizes of the arrays to not include an entry for NotFree. We only maintain entries for Mutator and Collector.
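
To make the array-shrinking change concrete, here is a minimal sketch in which per-partition bookkeeping exists only for the two free partitions and the NotFree population is derived rather than stored. It is not the code in the PR: `PartitionId`, `RegionPartitionsSketch`, and its methods are hypothetical names, the enumerator order (NotFree last) is an assumption made so the free partitions index the array densely, and capacity, usage, and interval bounds are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: NotFree is ordered last so Mutator and Collector index arrays densely.
enum class PartitionId : uint8_t { Mutator = 0, Collector = 1, NotFree = 2 };
constexpr size_t NumFreePartitions = 2;

class RegionPartitionsSketch {
  std::vector<PartitionId> _membership;               // one entry per heap region
  size_t _region_counts[NumFreePartitions] = {0, 0};  // no slot for NotFree

  static size_t index_of(PartitionId p) {
    assert(p != PartitionId::NotFree);
    return static_cast<size_t>(p);
  }

public:
  explicit RegionPartitionsSketch(size_t max_regions)
    : _membership(max_regions, PartitionId::NotFree) {}

  // Move region idx from NotFree into the free partition p.
  void make_free(size_t idx, PartitionId p) {
    assert(_membership.at(idx) == PartitionId::NotFree);
    _region_counts[index_of(p)]++;
    _membership[idx] = p;
  }

  // Retire region idx from whichever free partition currently holds it.
  void retire(size_t idx) {
    PartitionId p = _membership.at(idx);
    _region_counts[index_of(p)]--;
    _membership[idx] = PartitionId::NotFree;
  }

  PartitionId membership(size_t idx) const { return _membership.at(idx); }
  size_t count(PartitionId p) const { return _region_counts[index_of(p)]; }

  // Derived, not tracked: every region outside the free partitions is NotFree.
  size_t not_free_count() const {
    return _membership.size() - (_region_counts[0] + _region_counts[1]);
  }
};
```

With this shape, a NotFree entry in the arrays has nothing to hold, which is consistent with dropping it in both product and non-product builds.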
> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 75: > >> 73: size_t _capacity[NumPartitions]; >> 74: size_t _used[NumPartitions]; >> 75: size_t _region_counts[NumPartitions]; > > If tracked, is this an invariant of these fields? > > - region_counts[NotFree] == _max - (region_counts[Mutator] + region_counts[Collector]) > > (This would also make the region_counts[NotFree] unnecessary? See my previous comment.) This is not tracked, and no longer relevant because I removed region_counts[NotFree]. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481931675 PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481932447 From kdnilsen at openjdk.org Wed Feb 7 18:56:57 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 18:56:57 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 06:17:25 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 95: > >> 93: void make_free(size_t idx, ShenandoahFreeSetPartitionId which_partition, size_t region_capacity); >> 94: >> 95: // Place region idx into free partition new_partition. Requires that idx is currently not NotFree. > > Include semantics of region_capacity in comment, e.g.: > > > // Move region idx, with region_capacity bytes of available free space, > // from the NotFree partition to the free partition new_partition. Thanks. I've adjusted this comment to make the intent more clear. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1481941805 From wkemper at openjdk.org Wed Feb 7 19:16:59 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 7 Feb 2024 19:16:59 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v4] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+1 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Finally fix these funny tests ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files/8dfe163f..3b76ed75 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=03 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/19.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/19/head:pull/19 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19 From kdnilsen at openjdk.org Wed Feb 7 21:00:57 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 21:00:57 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <2-le3X33u0wR8EuyrQnG0rn2YtiMUDWydFCq0-R9U4s=.1ec26721-bb1c-4d50-894e-277aacc2170d@github.com> On Thu, 1 Feb 2024 03:01:08 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 171: > >> 169: }; >> 170: >> 171: class ShenandoahFreeSet : public CHeapObj { > > It would be good to have a block comment here motivating this class. > It seems (from looking at some of its public APIs) as if it publicly exports only the "mutator view", which I find interesting. 
> > The other partitions in `ShenandoahRegionPartition` appears to be for efficiency of the implementation in service of the public APIs for ShenandoahFreeSet. Thanks. I've added a block comment to describe ShenandoahFreeSet and have enhanced the comment that describes ShenandoahRegionPartition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1482087963 From kdnilsen at openjdk.org Wed Feb 7 21:18:58 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 21:18:58 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 02:39:33 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 187: > >> 185: // regions. >> 186: // >> 187: // Precondition: req.size() > ShenandoahHeapRegion::humongous_threshold_words(). > > `>` or `>=` ? >. See the only invocation from ShenandoahFreeSet::allocate(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1482106597 From kdnilsen at openjdk.org Wed Feb 7 21:25:12 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 21:25:12 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v10] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. 
Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/7d5c1fc6..b2ba4cf2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=08-09 Stats: 43 lines in 2 files changed: 27 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From kdnilsen at openjdk.org Wed Feb 7 21:25:12 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 21:25:12 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 02:55:49 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 184: >> >>> 182: HeapWord* allocate_single(ShenandoahAllocRequest& req, bool& in_new_region); >>> 183: >>> 184: // While holding the heap lock, allocate memory for a humongous object which will span multiple contiguous heap >> >> `which will` or `which may`? (Is a humongous object allowed to span just a single region as well?) >> >> Or are objects humongous only if they won't fit in a region? In which case the "will" is correct. >> >> I was confused by tests that use `ShenandoahHumongousThreshold=50` , `=90`, etc. >> >> May be in those cases, we go through the `allocate_single()` despite allocating an object (or block) bigger than `ShenandoahHeapRegion::humongous_threshold_words()` ? (That would make the pre-condition of the previous method suspect, though.) > > Same remark applies to the precondition comment below (which is correct, but could be made stronger to say `req.size() > ShenandoahHeapRegion::RegionSizeWords` or such? 
Thanks for prodding with these questions. My comment was not accurate. I've endeavored to fix the comment. A humongous object may span 1 or more regions. The extra memory within the region that is not used to represent the humongous object is wasted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1482111007 From kdnilsen at openjdk.org Wed Feb 7 21:25:12 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Feb 2024 21:25:12 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 02:58:51 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comments for _capacity_of and _used_by > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 231: > >> 229: inline size_t available() const { >> 230: assert(used() <= capacity(), "must use less than capacity"); >> 231: return capacity() - used(); > > So `ShenandoahFreeSet` publicly exports only the mutator view? I think of this as "public to the mutator" and "friendly public to the collector". I've tried to clarify with new comments. 
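On the humongous-allocation point raised earlier in this exchange (a humongous object may span one or more regions, and the unused tail is wasted), here is a minimal sketch of the arithmetic. The helper names `regions_needed` and `wasted_words` and the region size used in the note below are assumptions made for illustration; they are not functions or values taken from `shenandoahFreeSet.cpp`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of contiguous regions needed to hold an
// object of `words` words when each region holds `region_words` words.
inline std::size_t regions_needed(std::size_t words, std::size_t region_words) {
  assert(words > 0 && region_words > 0);
  return (words + region_words - 1) / region_words;  // round up
}

// Hypothetical helper: words left unused in the last region. The thread
// describes this remainder as wasted for a humongous object.
inline std::size_t wasted_words(std::size_t words, std::size_t region_words) {
  return regions_needed(words, region_words) * region_words - words;
}
```

With an assumed region of 1024 words and a humongous threshold at 50% of a region (512 words), a 600-word request is above the threshold yet fits in a single region, leaving 424 words unused. A 1500-word request needs two regions and leaves 548 words unused.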
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1482111723 From wkemper at openjdk.org Wed Feb 7 21:48:04 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 7 Feb 2024 21:48:04 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v5] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+1 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/19/files/3b76ed75..af5fd615 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=04 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=19&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/19.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/19/head:pull/19 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19 From dlong at openjdk.org Thu Feb 8 09:21:00 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 09:21:00 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful.
Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/aarch64/aarch64.ad line 2224: > 2222: // This is the unverified entry point. > 2223: C2_MacroAssembler _masm(&cbuf); > 2224: __ ic_check(CodeEntryAlignment); I'm not sure we want to increase the alignment to CodeEntryAlignment here. I believe C2 already aligns the root block to CodeEntryAlignment. @theRealAph, what do you think? src/hotspot/share/opto/output.cpp line 3416: > 3414: } else { > 3415: if (!target->is_static()) { > 3416: _code_offsets.set_value(CodeOffsets::Entry, _first_block_size - MacroAssembler::ic_check_size()); This looks tricky. I think it means CodeOffsets::Entry starts after the alignment padding NOPs. If that's true then the `ic_check` functions could use a comment explaining that alignment needs to come first, not last. A comment here wouldn't hurt either.
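The reading offered in the comment above (padding NOPs first, then the inline cache check, so that whatever follows the check is aligned) can be modeled with plain offset arithmetic. This is a sketch of that interpretation only, not HotSpot code: `FirstBlockLayout` and `layout_first_block` are hypothetical names, the block is assumed to start at offset 0, and the byte sizes in the note below are made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the first block: [padding NOPs][inline cache check].
struct FirstBlockLayout {
  std::size_t padding;       // NOP bytes emitted before the check
  std::size_t block_size;    // padding + check; the next instruction is aligned
  std::size_t entry_offset;  // where the check (the unverified entry) begins
};

// Pad *before* the check so that the end of the check lands on `alignment`.
// The entry offset is then block_size - check_size, which mirrors the
// "_first_block_size - ic_check_size()" expression quoted in the review.
inline FirstBlockLayout layout_first_block(std::size_t check_size,
                                           std::size_t alignment) {
  assert(alignment > 0);
  const std::size_t padding = (alignment - check_size % alignment) % alignment;
  const std::size_t block_size = padding + check_size;
  return FirstBlockLayout{padding, block_size, block_size - check_size};
}
```

For an assumed 12-byte check and 16-byte alignment, the model yields 4 bytes of padding, a 16-byte first block, and an entry offset of 4, so a caller entering at the unverified entry point skips the NOPs entirely.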
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482646992 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482643531 From dlong at openjdk.org Thu Feb 8 09:23:57 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 09:23:57 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 295: > 293: // inline cache check; done before the frame is built.
> 294: int LIR_Assembler::check_icache() { > 295: return __ ic_check(CodeEntryAlignment); Do we really want to remove the optimization that skips alignment for small methods? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482653192 From dlong at openjdk.org Thu Feb 8 09:28:01 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 09:28:01 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1016: > 1014: Register tmp1 = rscratch1; > 1015: Register tmp2 = r10; > 1016: It would be nice if we could still call `verify_oop(receiver)` here, but I see that would complicate `ic_check_size()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482659111 From tschatzl at openjdk.org Thu Feb 8 12:52:07 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 8 Feb 2024 12:52:07 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes First, I'm a bit hesitant to approve this because while it looks good to me, I'm no expert in that code and in particular how it interacts with other things all around. Also I only looked at x86 assembler code (and a bit aarch64), so this is not a complete review anyway. The only comments I can contribute are about some additional refactoring, so I'm keeping this review as Comment. src/hotspot/share/code/compiledIC.hpp line 57: > 55: }; > 56: > 57: // A CompiledICData* is a helper object for the inline cache implementation. Suggestion: // A CompiledICData is a helper object for the inline cache implementation. src/hotspot/share/code/compiledMethod.cpp line 457: > 455: if (clean_all || !cm->is_in_use() || cm->is_unloading() || cm->method()->code() != cm) { > 456: cdc->set_to_clean(); > 457: } Maybe a single `clean_if_nmethod_is_unloaded()` with the destination address as a parameter would avoid the code duplication with the other variant; otherwise another static helper function would be great. src/hotspot/share/code/nmethod.cpp line 2310: > 2308: } else { > 2309: CompiledDirectCall::at(call_site); > 2310: } Since `CompiledICLocker` does not lock anyway if it is safe (using the `is_safe` method), isn't the `if` superfluous here (and just keeping the `else` part does the same)? src/hotspot/share/oops/oopsHierarchy.hpp line 182: > 180: class Method; > 181: class ConstantPool; > 182: // class CHeapObj Not sure why this is here, but maybe also remove this line. src/hotspot/share/runtime/sharedRuntime.cpp line 1369: > 1367: } else { > 1368: // Callsite is a direct call - set it to the destination method > 1369: CompiledICLocker ml(caller_nm); Not sure if it makes sense to factor out the `CompiledICLocker ml(caller_nm);` call in both `if` branches.
src/hotspot/share/runtime/sharedRuntime.cpp line 1665: > 1663: > 1664: // Check relocations for the matching call to 1) avoid false positives, > 1665: // and 2) determine the type. To me the comment is confusing (probably pre-existing): There does not seem to be a result of the code checking the relocations containing the "type" of the following code. I.e. the only reason this seems to be done here is reason 1), reason 2) seems obsolete. src/hotspot/share/runtime/sharedRuntime.cpp line 1803: > 1801: RelocIterator iter(caller, callsite_addr, callsite_addr + 1); > 1802: if (!iter.next()) { > 1803: // No reloc entry found; not a static or opt virutal call Suggestion: // No reloc entry found; not a static or optimized virtual call (actually s/virutal/virtual is as fine. Or maybe use `opt_virtual` as in line 1798 above) src/hotspot/share/runtime/sharedRuntime.cpp line 1807: > 1805: } > 1806: > 1807: assert(iter.has_current(), "must have a reloc at java call site"); This assert seems to be superfluous after the `if (!iter.next())` bailout just above. src/hotspot/share/runtime/sharedRuntime.cpp line 1837: > 1835: CompiledMethod* caller_nm = cb->as_compiled_method(); > 1836: > 1837: for (;;) { I think the comment in line 1826 about // Transitioning IC caches may require transition stubs. If we run out // of transition stubs, we have to drop locks and perform a safepoint // that refills them. is out of date and should be removed. At least the code that generates the IC stubs below has been removed... 
------------- PR Review: https://git.openjdk.org/jdk/pull/17495#pullrequestreview-1869869238 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482897763 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482895770 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482870537 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482845871 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482825184 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482810188 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482820114 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482822437 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1482796779 From wkemper at openjdk.org Thu Feb 8 14:20:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 14:20:50 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master Message-ID: Merges tag jdk-21.0.3+2 ------------- Commit messages: - 8325096: Test java/security/cert/CertPathBuilder/akiExt/AKISerialNumber.java is failing - 8321410: Shenandoah: Remove ShenandoahSuspendibleWorkers flag - 8322957: Generational ZGC: Relocation selection must join the STS - 8313670: Simplify shared lib name handling code in some tests - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests - 8323637: Capture hotspot replay files in GHA - 8324937: GHA: Avoid multiple test suites per job - 8320052: Zero: Use __atomic built-ins for atomic RMW operations - 8319382: com/sun/jdi/JdwpAllowTest.java shows failures on AIX if prefixLen of mask is larger than 32 in IPv6 case - 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if - ... 
and 201 more: https://git.openjdk.org/shenandoah-jdk21u/compare/375769c6...57956950 The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=20&range=00.conflicts Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/20/files Stats: 31479 lines in 1396 files changed: 16522 ins; 6158 del; 8799 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/20.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/20/head:pull/20 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/20 From wkemper at openjdk.org Thu Feb 8 17:24:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 17:24:10 GMT Subject: Integrated: Merge openjdk/jdk21u-dev:master In-Reply-To: References: Message-ID: On Thu, 1 Feb 2024 14:16:49 GMT, William Kemper wrote: > Merges tag jdk-21.0.3+1 This pull request has now been integrated. Changeset: 64284052 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/64284052c4ed99eebd5a0c8d167e577c4465e13f Stats: 30110 lines in 1347 files changed: 15608 ins; 5805 del; 8697 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/19 From wkemper at openjdk.org Thu Feb 8 17:37:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 17:37:38 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+2 William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: - Merge branch 'master' into merge-jdk-21.0.3+2 - 8325096: Test java/security/cert/CertPathBuilder/akiExt/AKISerialNumber.java is failing Backport-of: ac1cd3194910793b02e86c2c0dedaa321f137d4e - 8321410: Shenandoah: Remove ShenandoahSuspendibleWorkers flag Backport-of: 2830dd2a7d3b933fbddca64ca0ac7a91e7ab0775 - 8322957: Generational ZGC: Relocation selection must join the STS Reviewed-by: eosterlund, stefank Backport-of: ba23025cd8a9c1af37afea6444ce5ea2ff41e5af - 8313670: Simplify shared lib name handling code in some tests Reviewed-by: mdoerr, lucy Backport-of: 6dba2026d72de6a67aa0209749ded8174b088904 - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests Reviewed-by: shade Backport-of: a03954e6c57369446ef77136966662780e4b1c4e - 8323637: Capture hotspot replay files in GHA Backport-of: c84c0ab52d5e08a693f7ad7d9a4772d8c1eeeaa8 - 8324937: GHA: Avoid multiple test suites per job Backport-of: 1aba78f2720b581f18fc2cec5e84deba6b2bcd41 - 8320052: Zero: Use __atomic built-ins for atomic RMW operations Backport-of: 020c9007f8e9cc4b46a58d7955284f43a6ac913b - 8319382: com/sun/jdi/JdwpAllowTest.java shows failures on AIX if prefixLen of mask is larger than 32 in IPv6 case Backport-of: 22642ff0aac71eceb71f6a9eebb2988a9bd5f091 - ... 
and 3 more: https://git.openjdk.org/shenandoah-jdk21u/compare/64284052...83cb3dc1 ------------- Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/20/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=20&range=01 Stats: 1350 lines in 66 files changed: 895 ins; 356 del; 99 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/20.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/20/head:pull/20 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/20 From wkemper at openjdk.org Thu Feb 8 18:00:26 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 18:00:26 GMT Subject: Withdrawn: 8324067: GenShen: Isolate regulator thread to generational mode In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:51:52 GMT, William Kemper wrote: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah/pull/388 From kdnilsen at openjdk.org Thu Feb 8 18:25:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 18:25:54 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment There is a separate issue going that has been uncovered by testing of this PR. We have observed that VM_HandshakeAllThreads::doit() very rarely iterates very slowly over threads. 
This has been observed with HandShakeForDeflation causing a pause of over 280s. I will file a JBS ticket for this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934700201 From wkemper at openjdk.org Thu Feb 8 18:28:24 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 18:28:24 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap Message-ID: Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. ------------- Commit messages: - Move heap changed tracking into ShenandoahHeap Changes: https://git.openjdk.org/jdk/pull/17777/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17777&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325516 Stats: 24 lines in 4 files changed: 11 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17777/head:pull/17777 PR: https://git.openjdk.org/jdk/pull/17777 From kdnilsen at openjdk.org Thu Feb 8 18:37:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 18:37:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Meanwhile, I've done some analysis of the attached GC log to better understand tradeoffs between doing degenerated GC vs. Full GC when mutator experiences out-of-memory during allocation request. 
The command used for this run is:

    echo Run TradiShen tip with memory size $i with 4s customer period >&2
    echo Run TradiShen tip with memory size $i with 4s customer period
    ~/github/jdk.2-1-2024/build/linux-x86_64-server-release/jdk/bin/java \
      -XX:+UnlockExperimentalVMOptions \
      -XX:+UseTransparentHugePages \
      -XX:-ShenandoahPacing \
      -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$i -Xmx$i \
      -XX:+UseShenandoahGC \
      -Xlog:"gc*=info,ergo" \
      -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
      -XX:+UnlockDiagnosticVMOptions \
      -jar ~/github/heapothesys/Extremem/target/extremem-1.0-SNAPSHOT.jar \
      -dInitializationDelay=45s -dDictionarySize=16000000 -dNumCustomers=28000000 \
      -dNumProducts=64000 -dCustomerThreads=2000 -dCustomerPeriod=4s -dCustomerThinkTime=1s \
      -dKeywordSearchCount=4 -dServerThreads=5 -dServerPeriod=5s -dProductNameLength=10 \
      -dBrowsingHistoryQueueCount=5 \
      -dSalesTransactionQueueCount=5 \
      -dProductDescriptionLength=64 -dProductReplacementPeriod=25s -dProductReplacementCount=5 \
      -dCustomerReplacementPeriod=30s -dCustomerReplacementCount=1000 -dBrowsingExpiration=1m \
      -dPhasedUpdates=true \
      -dPhasedUpdateInterval=60s \
      -dSimulationDuration=20m -dResponseTimeMeasurements=100000

with $i equal to 27g

[capture.27g.tradishen.log](https://github.com/openjdk/jdk/files/14213319/capture.27g.tradishen.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934715864 From wkemper at openjdk.org Thu Feb 8 18:37:24 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 18:37:24 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp Message-ID: The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier.
------------- Commit messages: - Oops, picked an extra cherry - Reduce unnecessary includes from shenandoahControlThread.cpp Changes: https://git.openjdk.org/jdk/pull/17778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325517 Stats: 21 lines in 5 files changed: 9 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17778/head:pull/17778 PR: https://git.openjdk.org/jdk/pull/17778 From shade at openjdk.org Thu Feb 8 18:41:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Feb 2024 18:41:02 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:24:09 GMT, William Kemper wrote: > Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. The move looks good, but I think it introduces a bug. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 837: > 835: > 836: // This is called from allocation path, and thus should be fast. > 837: if (_heap_changed.is_set()) { Um. The old code tests `is_unset`, then does `set`. This one introduces a bug: it would be set when already set. Should maybe be just `_heap_changed.try_set()`? 
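A minimal sketch of the three patterns discussed in this review comment, assuming a simplified stand-in for `ShenandoahSharedFlag` built on `std::atomic`. The class `SharedFlag` and the `notify_*` helpers are hypothetical names for illustration only, and the `try_set` semantics shown here (only the caller that flips the flag from unset to set gets `true`) are an assumption, not a copy of the HotSpot implementation.

```cpp
#include <atomic>

// Hypothetical stand-in for ShenandoahSharedFlag; not the HotSpot class.
class SharedFlag {
  std::atomic<bool> _value{false};
public:
  bool is_set() const   { return _value.load(std::memory_order_acquire); }
  bool is_unset() const { return !is_set(); }
  void set()            { _value.store(true, std::memory_order_release); }

  // Assumed semantics: returns true only for the caller that flips
  // the flag from unset to set.
  bool try_set() {
    if (is_set()) {
      return false;  // cheap read first, so the common case does no write
    }
    bool expected = false;
    return _value.compare_exchange_strong(expected, true,
                                          std::memory_order_acq_rel);
  }
};

// Pattern described for the old code: test is_unset(), then set().
inline void notify_old(SharedFlag& flag) {
  if (flag.is_unset()) {
    flag.set();
  }
}

// Pattern flagged in the review: the test is inverted, so a flag that
// starts out unset is never raised.
inline void notify_inverted(SharedFlag& flag) {
  if (flag.is_set()) {
    flag.set();
  }
}

// Suggested replacement: a single try_set() call.
inline void notify_try_set(SharedFlag& flag) {
  flag.try_set();
}
```

With a freshly constructed flag, `notify_old` and `notify_try_set` both leave it set, while `notify_inverted` leaves it unset, which is the behavior the review comment points out.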
------------- PR Review: https://git.openjdk.org/jdk/pull/17777#pullrequestreview-1870974321 PR Review Comment: https://git.openjdk.org/jdk/pull/17777#discussion_r1483424226 From shade at openjdk.org Thu Feb 8 18:48:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Feb 2024 18:48:55 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: References: Message-ID: <2JsPO3kyP8J1DVDCa6f0Qa0OPKaQpltEucPuIh3oXH4=.1be99f87-87dd-4eee-a131-a7661a253e92@github.com> On Thu, 8 Feb 2024 18:32:46 GMT, William Kemper wrote: > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. Looks generally okay. The usual rule of thumb is to see if any symbols from the include are actually used in the compilation unit. If used, keep the explicit header. Check this? At least one case is obvious from reading the code, see below. Note that build might still succeed without the explicit header due to transitive header dependencies, but it is fragile and would yield build failures later. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 47: > 45: #include "memory/metaspaceStats.hpp" > 46: #include "memory/universe.hpp" > 47: #include "runtime/atomic.hpp" `Atomic` is directly used, so the include should be left here. src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 46: > 44: > 45: > 46: enum StringDedupMode { I think the only use is in `ShenandoahMark`, right? Therefore, this declaration should go to `shenandoahMark.hpp`, so we do not have to include `shenandoahUtils.hpp` there? 
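The rule of thumb in this review (keep an explicit include for every symbol the compilation unit uses directly) can be illustrated outside HotSpot with standard headers. This is only a sketch: `RegionCounter` is a hypothetical class, and `<atomic>` merely plays the role that `runtime/atomic.hpp` plays in the comment above.

```cpp
#include <atomic>   // std::atomic is named directly below, so include it explicitly
#include <cstddef>  // std::size_t is named directly below as well

// Hypothetical example class; it is not part of HotSpot.
class RegionCounter {
  std::atomic<std::size_t> _retired{0};
public:
  // Record one retired region.
  void record_retired() { _retired.fetch_add(1, std::memory_order_relaxed); }

  // Number of regions recorded so far.
  std::size_t retired() const { return _retired.load(std::memory_order_relaxed); }
};
```

Dropping `#include <atomic>` here might still compile if some other header happens to pull it in, but the build would then depend on that other header never changing, which is the fragility the review warns about.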
------------- PR Review: https://git.openjdk.org/jdk/pull/17778#pullrequestreview-1870981172 PR Review Comment: https://git.openjdk.org/jdk/pull/17778#discussion_r1483430902 PR Review Comment: https://git.openjdk.org/jdk/pull/17778#discussion_r1483428174 From wkemper at openjdk.org Thu Feb 8 19:35:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 19:35:03 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: <2JsPO3kyP8J1DVDCa6f0Qa0OPKaQpltEucPuIh3oXH4=.1be99f87-87dd-4eee-a131-a7661a253e92@github.com> References: <2JsPO3kyP8J1DVDCa6f0Qa0OPKaQpltEucPuIh3oXH4=.1be99f87-87dd-4eee-a131-a7661a253e92@github.com> Message-ID: On Thu, 8 Feb 2024 18:43:08 GMT, Aleksey Shipilev wrote: >> The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 47: > >> 45: #include "memory/metaspaceStats.hpp" >> 46: #include "memory/universe.hpp" >> 47: #include "runtime/atomic.hpp" > > `Atomic` is directly used, so the include should be left here. Yes, sorry. The branch I cherry-picked from no longer uses `Atomic` directly. I'll put it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17778#discussion_r1483501194 From wkemper at openjdk.org Thu Feb 8 19:39:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 19:39:03 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:36:59 GMT, Aleksey Shipilev wrote: >> Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. 
> > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 837: > >> 835: >> 836: // This is called from allocation path, and thus should be fast. >> 837: if (_heap_changed.is_set()) { > > Um. The old code tests `is_unset`, then does `set`. This one introduces a bug: it would be set when already set. Should maybe be just `_heap_changed.try_set()`? Yikes. Good catch. I originally had `try_set` here, but then wanted to make it the same as the original code and wrote a typo! I'll go with `try_set`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17777#discussion_r1483504596 From wkemper at openjdk.org Thu Feb 8 19:42:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 19:42:15 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap [v2] In-Reply-To: References: Message-ID: <46dwarW2KJ0uuYqALOLjkO8uxC5zUfd9kIwabaGcXmU=.dbac851b-f6a8-4230-b030-774bebb86a5a@github.com> > Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Use try_set for updating heap changed flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17777/files - new: https://git.openjdk.org/jdk/pull/17777/files/c2654552..e4b5ed74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17777&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17777&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17777/head:pull/17777 PR: https://git.openjdk.org/jdk/pull/17777 From kdnilsen at openjdk.org Thu Feb 8 19:53:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 19:53:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Some key observations, which are highlighted in the attached excel spreadsheet: 1. Concurrent GCs take roughly twice as long as degenerated GCs, and use half as many worker threads. (out-of-cycle degen: 3.105s, fully conc gcs: 6.44s and 6.341s) 2. The sum of degen time plus 1/2 conc gc time that precedes degen is nearly constant: 3.368s, with std dev: 0.291 3. Note that a longer concurrent GC effort is accompanied by a shorter degen effort because degen GC leverages progress already made by concurrent GC. 4. Full GC requires 50% more time than a degen GC (5.026s avg vs 3.368s avg). 
This is presumably because full GC compacts everything, whereas degen GC uses garbage-first heuristics to only evacuate regions that are "convenient". 5. Full GC typically reclaims 88% more garbage than conc gc and degen. This is presumably because full GC reclaims all the floating garbage for allocations that occur following the start of a concurrent GC effort. 6. When Full GC is preceded by a failed concurrent effort and then a failed degen effort, the total duration of the GC cycle is much longer than a typical degen cycle. The time for a conc/degen cycle ranges from 3.105s (for out-of-cycle degen with no concurrent phase) to 6.44s for concurrent phase without any degen. In contrast, duration of a full GC cycle ranges from 7.431s (when 2.393s of concurrent gc effort transitions directly to Full) to 10.329s (when 2.483s of conc gc is followed by 2.636s of degen gc, which experiences bad progress and upgrades to full gc). 7. Under heavy allocation load, full GC tends to self-perpetuate. This is caused by multiple factors: 1. The very long GC duration means we're reclaiming memory less efficiently, even though each full GC yields greater available memory. 2. Full GCs, on average, are yielding 1,043 MB/s of wall-clock time spent in GC compared to 931 MB/s for concurrent and degen GC. However, degen offers a higher peak yield of 1,770 MB/s compared to the peak yield of 1,181 MB/s for full GC. 3. In terms of CPU time dedicated to GC, Full GC averages 1,239 MB/s vs. 1,529 MB/s for degen. Peak performance also favors degen, with max of 2,029 MB/s vs. full GC max of 1,460 MB/s. 4. A final strike against Full GC is that it leaves the heap in a state that is very difficult to recover from. Specifically, following Full GC, there is no garbage in the heap.
If we immediately trigger concurrent GC, it will be unproductive because there is no garbage to be found in the heap, and any floating garbage created following the start of concurrent GC will not be found until the next concurrent GC cycle. This is observed in the log, with GC(85) through GC(91) each upgrading to Full GC. [full-vs-degen.xlsx](https://github.com/openjdk/jdk/files/14213994/full-vs-degen.xlsx) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934831931 From wkemper at openjdk.org Thu Feb 8 19:53:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 19:53:03 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:32:46 GMT, William Kemper wrote: > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. I only removed a couple of headers that were obviously not needed directly in the control thread (I think there are a couple more I can trim). I didn't try to find and declare all the direct/first-level dependencies (`log.hpp`, for example). ------------- PR Comment: https://git.openjdk.org/jdk/pull/17778#issuecomment-1934831924 From wkemper at openjdk.org Thu Feb 8 19:53:04 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 19:53:04 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: <2JsPO3kyP8J1DVDCa6f0Qa0OPKaQpltEucPuIh3oXH4=.1be99f87-87dd-4eee-a131-a7661a253e92@github.com> References: <2JsPO3kyP8J1DVDCa6f0Qa0OPKaQpltEucPuIh3oXH4=.1be99f87-87dd-4eee-a131-a7661a253e92@github.com> Message-ID: On Thu, 8 Feb 2024 18:40:31 GMT, Aleksey Shipilev wrote: >> The control thread used to run much more of the cycle directly.
This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. > > src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 46: > >> 44: >> 45: >> 46: enum StringDedupMode { > > I think the only use is in `ShenandoahMark`, right? Therefore, this declaration should go to `shenandoahMark.hpp`, so we do not have to include `shenandoahUtils.hpp` there? Yes, this is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17778#discussion_r1483517109 From kdnilsen at openjdk.org Thu Feb 8 20:35:55 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 20:35:55 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap [v2] In-Reply-To: <46dwarW2KJ0uuYqALOLjkO8uxC5zUfd9kIwabaGcXmU=.dbac851b-f6a8-4230-b030-774bebb86a5a@github.com> References: <46dwarW2KJ0uuYqALOLjkO8uxC5zUfd9kIwabaGcXmU=.dbac851b-f6a8-4230-b030-774bebb86a5a@github.com> Message-ID: On Thu, 8 Feb 2024 19:42:15 GMT, William Kemper wrote: >> Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Use try_set for updating heap changed flag Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/17777#pullrequestreview-1871209272 From kdnilsen at openjdk.org Thu Feb 8 20:38:53 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 20:38:53 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:32:46 GMT, William Kemper wrote: > The control thread used to run much more of the cycle directly. 
This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/17778#pullrequestreview-1871214656 From ysr at openjdk.org Thu Feb 8 21:14:53 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Feb 2024 21:14:53 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: References: Message-ID: <9vtrT2eeWQ_jnPRrVthQyPI-slrSeTTkpVG3JkNzxQo=.5c3b428a-0881-4357-9a4b-58eb8a2b4582@github.com> On Thu, 8 Feb 2024 18:32:46 GMT, William Kemper wrote: > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. Looks good modulo Aleksey's suggestion on checking use of header file's content in compilation unit rather than relying on serendipitous (and fragile) transitive dependencies. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17778#pullrequestreview-1871265523 From ysr at openjdk.org Thu Feb 8 21:22:56 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Feb 2024 21:22:56 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap [v2] In-Reply-To: <46dwarW2KJ0uuYqALOLjkO8uxC5zUfd9kIwabaGcXmU=.dbac851b-f6a8-4230-b030-774bebb86a5a@github.com> References: <46dwarW2KJ0uuYqALOLjkO8uxC5zUfd9kIwabaGcXmU=.dbac851b-f6a8-4230-b030-774bebb86a5a@github.com> Message-ID: On Thu, 8 Feb 2024 19:42:15 GMT, William Kemper wrote: >> Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes.
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Use try_set for updating heap changed flag LGTM; good cleanup! Minor doc suggestion. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 298: > 296: bool _gc_state_changed; > 297: ShenandoahSharedBitmap _gc_state; > 298: ShenandoahSharedFlag _heap_changed; The documentation at `has_changed()` is good, but maybe a single small documentation comment here, perhaps something like: ShenandoahSharedFlag _heap_changed; // set if heap has changed since last check or maybe more elaborate: // tracks if new regions have been allocated or retired since last check ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17777#pullrequestreview-1871273842 PR Review Comment: https://git.openjdk.org/jdk/pull/17777#discussion_r1483600601 From dlong at openjdk.org Thu Feb 8 21:27:07 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:27:07 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: <8HWf9TJDmMH3OM8EfOn2joR2-CnOtzaawAfuOQigB6w=.9c0489a9-79e8-43e1-b274-2c7f48b8c95b@github.com> On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData.
This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 770: > 768: // Method might have been compiled since the call site was patched to > 769: // interpreted; if that is the case treat it as a miss so we can get > 770: // the call site corrected. This comment is still relevant, isn't it? src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1555: > 1553: __ far_jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub())); > 1554: > 1555: // Verified entry point must be aligned Keep this comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483603879 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483607156 From dlong at openjdk.org Thu Feb 8 21:30:59 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:30:59 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.
>> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/arm/arm.ad line 880: > 878: void MachUEPNode::emit(CodeBuffer &cbuf, PhaseRegAlloc *ra_) const { > 879: C2_MacroAssembler _masm(&cbuf); > 880: __ ic_check(CodeEntryAlignment); Do we care about CodeEntryAlignment here if the old code didn't? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483610763 From wkemper at openjdk.org Thu Feb 8 21:32:29 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 21:32:29 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp [v2] In-Reply-To: References: Message-ID: > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier.
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Even more include clean up - Move StringDedupeMode into the only header that needs it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17778/files - new: https://git.openjdk.org/jdk/pull/17778/files/fd24ad00..5212e221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17778&range=00-01 Stats: 26 lines in 5 files changed: 10 ins; 13 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17778/head:pull/17778 PR: https://git.openjdk.org/jdk/pull/17778 From dlong at openjdk.org Thu Feb 8 21:35:01 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:35:01 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: <04ntORhQBIEdsks9kJSNyR2iqdsTfwmAFWG36z5ilOo=.6b91fe2f-3573-4e31-97fb-62ead3720487@github.com> On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
>> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 831: > 829: __ jump(SharedRuntime::get_ic_miss_stub(), relocInfo::runtime_call_type, Rtemp); > 830: __ align(CodeEntryAlignment); > 831: Keep the CodeEntryAlignment and VEP comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483614619 From wkemper at openjdk.org Thu Feb 8 21:36:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 21:36:08 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap [v3] In-Reply-To: References: Message-ID: > Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes.
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve comment for heap_changed field ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17777/files - new: https://git.openjdk.org/jdk/pull/17777/files/e4b5ed74..9208bbba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17777&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17777&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17777/head:pull/17777 PR: https://git.openjdk.org/jdk/pull/17777 From dlong at openjdk.org Thu Feb 8 21:41:00 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:41:00 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: <0fQX2S37e-tHVt-IwRk8X95k0JYUmsBYr37oobhIagc=.92e45b75-b673-4022-b752-6359ee672976@github.com> On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
>> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1360: > 1358: Register data = rax; > 1359: Register temp = LP64_ONLY(rscratch1) NOT_LP64(rbx); > 1360: It would be nice if VerifyOops check could be added back, maybe as a follow-up RFE? src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1463: > 1461: __ jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub())); > 1462: > 1463: // verified entry must be aligned for code patching. This comment still seems relevant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483619612 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483621209 From dlong at openjdk.org Thu Feb 8 21:44:05 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:44:05 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful.
Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/x86/x86_32.ad line 1386: > 1384: void MachUEPNode::emit(CodeBuffer &cbuf, PhaseRegAlloc *ra_) const { > 1385: MacroAssembler masm(&cbuf); > 1386: masm.ic_check(CodeEntryAlignment); I think there may still be a problem with OptoBreakpoint messing up the alignment. I'll try to reproduce it on x64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483623693 From dlong at openjdk.org Thu Feb 8 21:47:08 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:47:08 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: <4Xcj3NXRRxEvGn3fIVpGYlZ5vCwyx7mdxaAEQ5myKKo=.aa1309ef-5289-476b-a285-306b307e1b85@github.com> On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second.
This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/cpu/x86/x86_64.ad line 1487: > 1485: { > 1486: MacroAssembler masm(&cbuf); > 1487: masm.ic_check(CodeEntryAlignment); I'm concerned about OptoBreakpoint and friends messing up the alignment: https://github.com/openjdk/jdk/blob/10beb3184e14e2714cd836029680a8b2d3fd5011/src/hotspot/share/opto/output.cpp#L317 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483626164 From wkemper at openjdk.org Thu Feb 8 21:56:06 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 21:56:06 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. 
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment So, it sounds like there is not much value in going straight to a full GC? I don't think we can really know ahead of time whether or not a degenerated cycle will free up enough contiguous regions to satisfy the humongous allocation request. We could make an educated guess based on heap fragmentation and occupancy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934992101 From dlong at openjdk.org Thu Feb 8 21:57:09 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 8 Feb 2024 21:57:09 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/share/c1/c1_LIRAssembler.cpp line 612: > 610: // init offsets > 611: offsets()->set_value(CodeOffsets::OSR_Entry, _masm->offset()); > 612: _masm->align(CodeEntryAlignment); So we used to unconditionally align here, but never set CodeOffsets::Entry, meaning we got the default offset of 0 and the entry had to execute the alignment NOPs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483634924 From kdnilsen at openjdk.org Thu Feb 8 22:03:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 22:03:03 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment I ran an experiment in which I discourage full GC. Attached is the diff from this branch, and a spreadsheet summarizing the performance results. Highlights: 1. I overrode the default ShenandoahFullGCThreshold with 64 on the command line. 2. should_degenerate_cycle() only returns false if a humongous alloc failed and the cumulative requested humongous size > the amount of humongous memory available at the most recent free-set rebuild operation, or if consecutive_degen_count > ShenandoahFullGCThreshold. (This mostly preserves existing behavior, but gives users the option of increasing ShenandoahFullGCThreshold as I did.) 3. Across almost all latency metrics, the new code performs better than original code. 4. 
Full GCs decreased to a maximum of 4 on any run, whereas the original max full gc was 104. Average number of full GCs decreased to ~3 from typically more than 50. 5. We do quite a few more degens and fewer successful concurrent GCs. 6. More testing should be done with more diverse workloads, and after we fix the existing problem with VM_HandshakeAllThreads::do_it(). See attached diff file and Excel spreadsheet for more detail. [discourage-full-experiment.xlsx](https://github.com/openjdk/jdk/files/14215043/discourage-full-experiment.xlsx) [discourage-full-experiment.diff.txt](https://github.com/openjdk/jdk/files/14215058/discourage-full-experiment.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1934998244 PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935000164 From wkemper at openjdk.org Thu Feb 8 22:16:04 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 22:16:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Not sure I understand: > cumulative requested humongous size > the amount of humongous memory available at the most recent free-set rebuild operation Don't we already know that there isn't enough contiguous memory for _this_ request? Wouldn't the value of the accumulated failed humongous request sizes be greater than the size for the _current_ request? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935013516 From kdnilsen at openjdk.org Thu Feb 8 22:16:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 22:16:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: <0WwNFVurpYG9bIC9l7TGad9s6EYD9LyYr1tERSWuMUw=.d421f22e-4979-44f9-a028-0f3c5d867cde@github.com> On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment There isn't enough memory right now. But there may have been at the end of the most recent GC. The question: Is a normal GC likely to reclaim enough contiguous memory to satisfy the humongous allocation request. If the previous normal GC was successful, then the new one is also likely to be successful. 
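The decision rule described for the experiment can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code in the attached diff: the struct `DegenDecisionInputs` and all of its field names are hypothetical, `full_gc_threshold` stands in for the ShenandoahFullGCThreshold flag, and the point at which the cumulative humongous request counter is reset is assumed to be the most recent free-set rebuild.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the state the heuristic consults.
// None of these names are taken from the HotSpot sources.
struct DegenDecisionInputs {
  bool        humongous_alloc_failed;          // did a humongous request trigger the failure?
  std::size_t cumulative_humongous_requested;  // bytes requested by humongous allocations (assumed: since last rebuild)
  std::size_t humongous_available_at_rebuild;  // bytes usable for humongous allocations at the last free-set rebuild
  std::size_t consecutive_degen_count;         // degenerated cycles in a row so far
  std::size_t full_gc_threshold;               // stands in for ShenandoahFullGCThreshold
};

// Returns true to attempt a degenerated cycle, false to go straight to full GC.
inline bool should_degenerate_cycle(const DegenDecisionInputs& in) {
  assert(in.full_gc_threshold > 0);
  // Too many degenerated cycles in a row: give up and compact.
  if (in.consecutive_degen_count > in.full_gc_threshold) {
    return false;
  }
  // Humongous requests have outgrown what the previous GC made available,
  // so another non-compacting cycle is unlikely to help.
  if (in.humongous_alloc_failed &&
      in.cumulative_humongous_requested > in.humongous_available_at_rebuild) {
    return false;
  }
  return true;
}
```

Under this reading, a humongous failure alone does not force a full GC; it does so only when the previous cycle did not leave enough humongous-capable memory to cover the requests made since then.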
------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935015624 From wkemper at openjdk.org Thu Feb 8 22:31:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 22:31:39 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v3] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.3+2 William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove uses of removed flag from generational code ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/20/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/20/files/83cb3dc1..19626567 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=20&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=20&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/20.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/20/head:pull/20 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/20 From ysr at openjdk.org Thu Feb 8 22:32:03 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Feb 2024 22:32:03 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: <0WwNFVurpYG9bIC9l7TGad9s6EYD9LyYr1tERSWuMUw=.d421f22e-4979-44f9-a028-0f3c5d867cde@github.com> References: <0WwNFVurpYG9bIC9l7TGad9s6EYD9LyYr1tERSWuMUw=.d421f22e-4979-44f9-a028-0f3c5d867cde@github.com> Message-ID: On Thu, 8 Feb 2024 22:13:09 GMT, Kelvin Nilsen wrote: > There isn't enough memory right now. But there may have been at the end of the most recent GC. The question: Is a normal GC likely to reclaim enough contiguous memory to satisfy the humongous allocation request. If the previous normal GC was successful, then the new one is also likely to be successful. 
In other words, you are saying here that if this test passes then there is a high likelihood that the failure to allocate the humongous request at this time is because non-humongous allocations led to this space getting fragmented (or reduced), and that the space will reappear in contiguous form as soon as the ongoing concurrent gc (albeit degenerated now?) completes without taking recourse to a full gc. Does this then lead to any policy parameter change in terms of the maintenance & preservation of contiguous regions in the next epoch between GCs? In other words, asking if this has any interactions with your changes in https://github.com/openjdk/jdk/pull/17561, potentially changing the performance equation in specific directions, and if those have been considered in the data generated above for this PR. Also wondering if the decision of full vs degenerate might also want to be driven by recent history of length of full vs degenerated, both successful & unsuccessful (especially if a map from degeneration points is maintained, although that might be overkill)? As you can tell, I am just waving my hands here, but that could also conceivably constitute a signal that informs the decision, perhaps... ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935034421 From ysr at openjdk.org Thu Feb 8 22:35:05 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Feb 2024 22:35:05 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment In regards to: > ... 
that the space will reappear in contiguous form as soon as the ongoing concurrent gc (albeit degenerated now?) completes without taking recourse to a full gc. I may be misunderstanding here, but what if the failure to honor the request doesn't cause degeneration but is held up while the concurrent cycle completes, after which said space may become available, albeit penalizing this allocation request with a much longer stall, while not subjecting other threads to the pause. Maybe that isn't worthwhile, or would require a bigger change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935038610 From kdnilsen at openjdk.org Thu Feb 8 22:35:05 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 22:35:05 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Thanks Ramki. I agree with your "rewording" of "what I am saying". If anything, the changes proposed in https://github.com/openjdk/jdk/pull/17561 will reduce the likelihood that we experience a humongous alloc failure, and in the rare case that we do, will increase the likelihood that normal concurrent GC (with degen if necessary) will recover sufficient contiguous memory to satisfy the humongous allocation request. I do agree that there's further room for smart adaptive behavior as you suggest.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935039077 From kdnilsen at openjdk.org Thu Feb 8 22:39:02 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 22:39:02 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Ramki: what you are suggesting is what has been described as ShenandoahPacing and/or Throttling. That can avoid the need for degeneration, especially when we surge the number of worker threads, for example. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935042994 From ysr at openjdk.org Thu Feb 8 22:43:04 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Feb 2024 22:43:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment > If anything, the changes proposed in #17561 will reduce the likelihood that we experience a humongous alloc failure, and in the rare case that we do, will increase the likelihood that normal concurrent GC (with degen if necessary) will recover sufficient contiguous memory to satisfy the humongous allocation request. 
Does it seem worthwhile then to run perf numbers with these changes along with the changes in https://github.com/openjdk/jdk/pull/17561 so as to get sharper performance numbers that include both changes (as well as each individually as you did here and in the other one). Each seems worthwhile on its own, and I am curious of the composition of the two... ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935046203 From kdnilsen at openjdk.org Thu Feb 8 22:43:04 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Feb 2024 22:43:04 GMT Subject: RFR: 8324995: Shenandoah: Skip to full gc for humongous allocation failures [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:50:06 GMT, William Kemper wrote: >> Shenandoah degenerated cycles do not compact regions. When a humongous allocation fails, it is likely due to fragmentation which is better addressed by a full gc. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in comment Agree with testing combination of capabilities. Also want to get the handshake issue out of the picture. I'm looking at that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17638#issuecomment-1935047344 From wkemper at openjdk.org Thu Feb 8 22:59:20 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Feb 2024 22:59:20 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp [v3] In-Reply-To: References: Message-ID: <76dpaDUNzMPYSTHpgL-1S-BLUzB6ajgbEpkmCG7or7U=.790f031a-a904-497c-9933-870e5888705e@github.com> > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Try to fix zero build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17778/files - new: https://git.openjdk.org/jdk/pull/17778/files/5212e221..52272b90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17778&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17778&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17778/head:pull/17778 PR: https://git.openjdk.org/jdk/pull/17778 From dlong at openjdk.org Fri Feb 9 01:34:00 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Feb 2024 01:34:00 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: <4Xcj3NXRRxEvGn3fIVpGYlZ5vCwyx7mdxaAEQ5myKKo=.aa1309ef-5289-476b-a285-306b307e1b85@github.com> References: <4Xcj3NXRRxEvGn3fIVpGYlZ5vCwyx7mdxaAEQ5myKKo=.aa1309ef-5289-476b-a285-306b307e1b85@github.com> Message-ID: On Thu, 8 Feb 2024 21:44:14 GMT, Dean Long wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/cpu/x86/x86_64.ad line 1487: > >> 1485: { >> 1486: MacroAssembler masm(&cbuf); >> 1487: masm.ic_check(CodeEntryAlignment); > > I'm concerned about OptoBreakpoint and friends messing up the alignment: > https://github.com/openjdk/jdk/blob/10beb3184e14e2714cd836029680a8b2d3fd5011/src/hotspot/share/opto/output.cpp#L317 It's fine, the breakpoint is inserted *after* the prologue.
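For readers following the alignment concern, the arithmetic behind "pad first, then check" can be sketched as follows. This is a minimal illustration and not the MacroAssembler code: it assumes, as suggested elsewhere in this review, that the padding NOPs are placed before the inline cache check so that the code following the check ends up aligned and the entry point can skip the NOPs. The struct `EntryLayout`, the function `layout_ic_check`, and the byte sizes in the usage note are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of where the entry points land in the code buffer.
struct EntryLayout {
  std::size_t padding;           // bytes of NOPs emitted before the check
  std::size_t unverified_entry;  // offset where the inline cache check starts
  std::size_t verified_entry;    // offset of the first instruction after the check
};

// Compute padding so that the END of the check is aligned to end_alignment.
// Callers entering at unverified_entry never execute the padding.
inline EntryLayout layout_ic_check(std::size_t current_offset,
                                   std::size_t ic_check_size,
                                   std::size_t end_alignment) {
  assert(end_alignment > 0);
  const std::size_t unpadded_end = current_offset + ic_check_size;
  const std::size_t padding =
      (end_alignment - unpadded_end % end_alignment) % end_alignment;
  EntryLayout layout;
  layout.padding = padding;
  layout.unverified_entry = current_offset + padding;
  layout.verified_entry = layout.unverified_entry + ic_check_size;
  return layout;
}
```

For example, with a made-up 11-byte check starting at offset 0 and 16-byte alignment, the sketch emits 5 bytes of padding, starts the check at offset 5, and the code after the check begins at offset 16. Anything inserted after the check would then not disturb where the check itself sits.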
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483776166 From dlong at openjdk.org Fri Feb 9 01:41:06 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Feb 2024 01:41:06 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 09:08:01 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > ARM32 fixes src/hotspot/share/runtime/vmStructs.cpp line 217: > 215: volatile_nonstatic_field(CompiledICData, _speculated_klass, uintptr_t) \ > 216: nonstatic_field(CompiledICData, _itable_defc_klass, Klass*) \ > 217: nonstatic_field(CompiledICData, _itable_refc_klass, Klass*) \ I don't think it makes sense to export these fields and types until SA is changed to make use of them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1483780505 From shade at openjdk.org Fri Feb 9 07:46:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 Feb 2024 07:46:06 GMT Subject: RFR: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap [v3] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 21:36:08 GMT, William Kemper wrote: >> Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment for heap_changed field Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17777#pullrequestreview-1871811738 From wkemper at openjdk.org Fri Feb 9 07:46:06 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 07:46:06 GMT Subject: Integrated: 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:24:09 GMT, William Kemper wrote: > Shenandoah sets a flag when new regions are allocated or retired. This flag currently resides in the control thread. Moving it into the heap reduces code duplication with upcoming generational mode changes. This pull request has now been integrated.
Changeset: cc276ff0 Author: William Kemper Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cc276ff0dfa0a568ebf0a66e0762a6de19fa6a49 Stats: 24 lines in 4 files changed: 11 ins; 11 del; 2 mod 8325516: Shenandoah: Move heap change tracking into ShenandoahHeap Reviewed-by: shade, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/17777 From shade at openjdk.org Fri Feb 9 07:49:56 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 Feb 2024 07:49:56 GMT Subject: RFR: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp [v3] In-Reply-To: <76dpaDUNzMPYSTHpgL-1S-BLUzB6ajgbEpkmCG7or7U=.790f031a-a904-497c-9933-870e5888705e@github.com> References: <76dpaDUNzMPYSTHpgL-1S-BLUzB6ajgbEpkmCG7or7U=.790f031a-a904-497c-9933-870e5888705e@github.com> Message-ID: On Thu, 8 Feb 2024 22:59:20 GMT, William Kemper wrote: >> The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Try to fix zero build Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17778#pullrequestreview-1871818451 From eosterlund at openjdk.org Fri Feb 9 08:43:21 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 08:43:21 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v7] In-Reply-To: References: Message-ID: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second.
This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. > > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/runtime/sharedRuntime.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17495/files - new: https://git.openjdk.org/jdk/pull/17495/files/6dd64b50..01733b85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495 PR: https://git.openjdk.org/jdk/pull/17495 From eosterlund at openjdk.org Fri Feb 9 08:43:22 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 08:43:22 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 11:52:51 GMT, Thomas Schatzl wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last
revision: >> >> ARM32 fixes > > src/hotspot/share/oops/oopsHierarchy.hpp line 182: > >> 180: class Method; >> 181: class ConstantPool; >> 182: // class CHeapObj > > Not sure why this is here, but maybe also remove this line. Okay, removing. > src/hotspot/share/runtime/sharedRuntime.cpp line 1369: > >> 1367: } else { >> 1368: // Callsite is a direct call - set it to the destination method >> 1369: CompiledICLocker ml(caller_nm); > > Not sure if it makes sense to factor out the `CompiledICLocker ml(caller_nm); call in both if branches. Sure, I can do that. > src/hotspot/share/runtime/sharedRuntime.cpp line 1665: > >> 1663: >> 1664: // Check relocations for the matching call to 1) avoid false positives, >> 1665: // and 2) determine the type. > > To me the comment is confusing (probably pre-existing): There does not seem to be a result of the code checking the relocations containing the "type" of the following code. I.e. the only reason this seems to be done here is reason 1), reason 2) seems obsolete. Yeah I don't really understand what the comment is trying to say at all. I'll remove it. > src/hotspot/share/runtime/sharedRuntime.cpp line 1807: > >> 1805: } >> 1806: >> 1807: assert(iter.has_current(), "must have a reloc at java call site"); > > This assert seems to be superfluous after the `if (!iter.next())` bailout just above. Good point. Removing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484017440 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484016694 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484015431 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484015615 From eosterlund at openjdk.org Fri Feb 9 08:43:22 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 08:43:22 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v7] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 11:09:30 GMT, Thomas Schatzl wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/runtime/sharedRuntime.cpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > > src/hotspot/share/runtime/sharedRuntime.cpp line 1837: > >> 1835: CompiledMethod* caller_nm = cb->as_compiled_method(); >> 1836: >> 1837: for (;;) { > > I think the comment in line 1826 about > > > // Transitioning IC caches may require transition stubs. If we run out > // of transition stubs, we have to drop locks and perform a safepoint > // that refills them. > > is out of date and should be removed. At least the code that generates the IC stubs below has been removed... Good catch. I'll remove the comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484014998 From eosterlund at openjdk.org Fri Feb 9 08:48:05 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 08:48:05 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 12:10:45 GMT, Thomas Schatzl wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/share/code/nmethod.cpp line 2310: > >> 2308: } else { >> 2309: CompiledDirectCall::at(call_site); >> 2310: } > > Since `CompiledICLocker` does not lock anyway if it is safe (using the `is_safe` method), isn't the `if` superfluous here (and just keeping the `else` part does the same)? That is true for DefaultICProtectionBehaviour::lock, but not ZGC and Shenandoah. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484022364 From eosterlund at openjdk.org Fri Feb 9 09:05:20 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 09:05:20 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v8] In-Reply-To: References: Message-ID: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData.
This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. > > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/code/compiledIC.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17495/files - new: https://git.openjdk.org/jdk/pull/17495/files/01733b85..dfdcdcc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495 PR: https://git.openjdk.org/jdk/pull/17495 From eosterlund at openjdk.org Fri Feb 9 09:05:21 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 9 Feb 2024 09:05:21 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 12:27:26 GMT, Thomas Schatzl wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/share/code/compiledMethod.cpp line 457: > >> 455: if (clean_all || !cm->is_in_use() || cm->is_unloading() || cm->method()->code() != cm) { >> 456: cdc->set_to_clean(); >> 457: } > > Maybe a single `clean_if_nmethod_is_unloaded()` with the destination address as a parameter would avoid the code duplication with the other variant; otherwise another static
helper function would be great. Good point. I'll try on a template thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484040893 From eosterlund at openjdk.org Fri Feb 9 09:10:05 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:10:05 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 09:16:36 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/share/opto/output.cpp line 3416: > >> 3414: } else { >> 3415: if (!target->is_static()) { >> 3416: _code_offsets.set_value(CodeOffsets::Entry, _first_block_size - MacroAssembler::ic_check_size()); > > This looks tricky. I think it means CodeOffsets::Entry starts after the alignment padding NOPs. If that's true then the `ic_check` functions could use a comment explaining that alignment needs to come first, not last. A comment here wouldn't hurt either. Yes that's exactly it. I found that I got a less fortunate nop encoding that actually did show up as a tiny regression. It was fixed by not running the nops. I'll write a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484046000 From ysr at openjdk.org Fri Feb 9 09:13:06 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 9 Feb 2024 09:13:06 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <0CleNsMawP-3GdzvatbX1uI1QbCerZjcFeNkBz8L75Y=.43ef158e-b3e9-4e3a-87c1-7c380c309694@github.com> On Wed, 7 Feb 2024 18:41:23 GMT, Kelvin Nilsen wrote: >> For personal clarification: when the mutator LRB needs to evacuate an object, it uses the collector set. Each mutator has three TLABS: one for mutator allocations, one for young-gen evacuations, and one for old-gen evacuations. 
Let me know if you think we need more documentation around this. > > (actually, the old-gen TLAB is not in single-generation Shenandoah, only in GenShen.) Thanks, I suppose when the mutator is doing evacuation as part of LRB, it's actually helping the collector. To that extent, using the collector partition for that makes total sense. I just wanted to know what the situation was. This makes sense, and existing documentation is sufficient now that this aspect of terminology is clear. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1484048898 From ysr at openjdk.org Fri Feb 9 09:16:06 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 9 Feb 2024 09:16:06 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6] In-Reply-To: References: Message-ID: <0gptMILpctS90_4UgjMDtdgpxpHh4_SIzD5CmFsWmZY=.6e64c51e-afe4-4176-b8d9-d4dd432a49c6@github.com> On Wed, 7 Feb 2024 18:30:48 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 37: >> >>> 35: NotFree, // Region has been retired and is not in any free set: there is no available memory. >>> 36: Mutator, // Region is in the Mutator free set: available memory is available to mutators. >>> 37: Collector, // Region is in the Collector free set: available memory is reserved for evacuations. >> >> When mutators evacuate the target of an LRB, do they use `Mutator` or `Collector`. I assume the former? In that case, I'd say for Collector: `available memory is reserved for collector threads for evacuation`. > > actually, the collector reserve is for all evacuation, whether performed by collector worker threads or by mutator threads doing LRB handling. Thanks, makes sense! Might be worthwhile if possible to mention this here somewhere may be. 
For example, maybe:

Mutator, // available for object allocation by mutator
Collector, // available for evacuations by collector (including mutator LRB)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1484052864

From ysr at openjdk.org Fri Feb 9 09:21:05 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Feb 2024 09:21:05 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v6]
In-Reply-To: <2-le3X33u0wR8EuyrQnG0rn2YtiMUDWydFCq0-R9U4s=.1ec26721-bb1c-4d50-894e-277aacc2170d@github.com>
References: <2-le3X33u0wR8EuyrQnG0rn2YtiMUDWydFCq0-R9U4s=.1ec26721-bb1c-4d50-894e-277aacc2170d@github.com>
Message-ID:

On Wed, 7 Feb 2024 20:58:35 GMT, Kelvin Nilsen wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 171:
>>
>>> 169: };
>>> 170:
>>> 171: class ShenandoahFreeSet : public CHeapObj {
>>
>> It would be good to have a block comment here motivating this class.
>> It seems (from looking at some of its public APIs) as if it publicly exports only the "mutator view", which I find interesting.
>>
>> The other partitions in `ShenandoahRegionPartition` appear to be for efficiency of the implementation in service of the public APIs for ShenandoahFreeSet.
>
> Thanks. I've added a block comment to describe ShenandoahFreeSet and have enhanced the comment that describes ShenandoahRegionPartition.

This is great, thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1484058472 From eosterlund at openjdk.org Fri Feb 9 09:32:01 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:32:01 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 09:25:33 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1016: > >> 1014: Register tmp1 = rscratch1; >> 1015: Register tmp2 = r10; >> 1016: > > It would be nice if we could still call `verify_oop(receiver)` here, but I see that would complicate `ic_check_size()`. Yeah I thought the same. As it's important for correctness that ic_check_size is accurate, I was hoping to have as few different modes in it as possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484070401 From eosterlund at openjdk.org Fri Feb 9 09:32:02 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:32:02 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v8] In-Reply-To: <8HWf9TJDmMH3OM8EfOn2joR2-CnOtzaawAfuOQigB6w=.9c0489a9-79e8-43e1-b274-2c7f48b8c95b@github.com> References: <8HWf9TJDmMH3OM8EfOn2joR2-CnOtzaawAfuOQigB6w=.9c0489a9-79e8-43e1-b274-2c7f48b8c95b@github.com> Message-ID: <6x7p-prxRRkFH8IRVsD3Ie5pp_Vvwk5R175c3tdE02s=.b7cbc61c-e4bb-4a32-a30f-1f2589f3027a@github.com> On Thu, 8 Feb 2024 21:21:59 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/code/compiledIC.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 770: > >> 768: // Method might have been 
compiled since the call site was patched to >> 769: // interpreted; if that is the case treat it as a miss so we can get >> 770: // the call site corrected. > > This comment is still relevant, isn't it? Unfortunately it is. I think it should be done in the subsequent patch_callers_callsite that has this exact same purpose for code dispatching through the VEP. But I think it's outside of scope for this PR to change that. I'm putting the comment back. > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1555: > >> 1553: __ far_jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub())); >> 1554: >> 1555: // Verified entry point must be aligned > > Keep this comment? Yeah, will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484067761 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484070144 From eosterlund at openjdk.org Fri Feb 9 09:37:10 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:37:10 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: <0fQX2S37e-tHVt-IwRk8X95k0JYUmsBYr37oobhIagc=.92e45b75-b673-4022-b752-6359ee672976@github.com> References: <0fQX2S37e-tHVt-IwRk8X95k0JYUmsBYr37oobhIagc=.92e45b75-b673-4022-b752-6359ee672976@github.com> Message-ID: On Thu, 8 Feb 2024 21:37:17 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1360: > >> 1358: Register data = rax; >> 1359: Register temp = LP64_ONLY(rscratch1) NOT_LP64(rbx); >> 1360: > > It would be nice if VerifyOops check could be added back, maybe as a follow-up RFE? Good idea. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484075750

From eosterlund at openjdk.org Fri Feb 9 09:43:09 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 9 Feb 2024 09:43:09 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v8]
In-Reply-To: <0fQX2S37e-tHVt-IwRk8X95k0JYUmsBYr37oobhIagc=.92e45b75-b673-4022-b752-6359ee672976@github.com>
References: <0fQX2S37e-tHVt-IwRk8X95k0JYUmsBYr37oobhIagc=.92e45b75-b673-4022-b752-6359ee672976@github.com>
Message-ID:

On Thu, 8 Feb 2024 21:38:37 GMT, Dean Long wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update src/hotspot/share/code/compiledIC.hpp
>>
>> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com>
>
> src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1463:
>
>> 1461: __ jump(RuntimeAddress(SharedRuntime::get_ic_miss_stub()));
>> 1462:
>> 1463: // verified entry must be aligned for code patching.
>
> This comment still seems relevant.

Good point, resurrecting the comment.

> src/hotspot/share/c1/c1_LIRAssembler.cpp line 612:
>
>> 610: // init offsets
>> 611: offsets()->set_value(CodeOffsets::OSR_Entry, _masm->offset());
>> 612: _masm->align(CodeEntryAlignment);
>
> So we used to unconditionally align here, but never set CodeOffsets::Entry, meaning we got the default offset of 0 and the entry had to execute the alignment NOPs.

Yes. I found those nops to sometimes be expensive. So in the new model, it's a bug for the VEP to not already be aligned, from the UEP, so we can skip executing the padding nops.
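The layout discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the HotSpot code: it assumes the inline cache check has a fixed byte size and that the padding is emitted before the check, so that the verified entry point (VEP) lands on an aligned boundary and the unverified entry point (UEP) sits after the padding. The names `padding_before_check`, `EntryLayout` and `layout_entries` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes of padding to emit *before* a check of
// `check_size` bytes so that the code following the check starts on an
// `alignment`-byte boundary. `alignment` must be a power of two.
inline std::size_t padding_before_check(std::size_t offset,
                                        std::size_t check_size,
                                        std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Where the code after the check would start without any padding.
  std::size_t end = offset + check_size;
  // Round that position up to the next aligned boundary.
  std::size_t aligned_end = (end + alignment - 1) & ~(alignment - 1);
  return aligned_end - end;
}

// Hypothetical result type: byte offsets of the two entry points.
struct EntryLayout {
  std::size_t unverified_entry;  // first byte of the check, after the padding
  std::size_t verified_entry;    // first byte after the check
};

inline EntryLayout layout_entries(std::size_t offset,
                                  std::size_t check_size,
                                  std::size_t alignment) {
  std::size_t pad = padding_before_check(offset, check_size, alignment);
  EntryLayout layout;
  layout.unverified_entry = offset + pad;
  layout.verified_entry = layout.unverified_entry + check_size;
  return layout;
}
```

With a start offset of 0, a 20-byte check and 16-byte alignment, the sketch emits 12 bytes of padding, places the UEP at offset 12 and the VEP at offset 32. The UEP offset equals the size of the first block minus the check size (32 - 20), so a caller entering at the UEP never executes the padding.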
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484078846
PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484081379

From eosterlund at openjdk.org Fri Feb 9 09:43:09 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 9 Feb 2024 09:43:09 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6]
In-Reply-To:
References: <4Xcj3NXRRxEvGn3fIVpGYlZ5vCwyx7mdxaAEQ5myKKo=.aa1309ef-5289-476b-a285-306b307e1b85@github.com>
Message-ID: <9TObmpq_n39x7PFzb_jJYVvnAfGTcgiZio-GjlsVR7k=.4f4059b2-fe0e-4165-99b6-9604ae9611cc@github.com>

On Fri, 9 Feb 2024 01:31:25 GMT, Dean Long wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 1487:
>>
>>> 1485: {
>>> 1486: MacroAssembler masm(&cbuf);
>>> 1487: masm.ic_check(CodeEntryAlignment);
>>
>> I'm concerned about OptoBreakpoint and friends messing up the alignment:
>> https://github.com/openjdk/jdk/blob/10beb3184e14e2714cd836029680a8b2d3fd5011/src/hotspot/share/opto/output.cpp#L317
>
> It's fine, the breakpoint is inserted *after* the prologue.

Yeah, that did have me confused as well.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484079876 From eosterlund at openjdk.org Fri Feb 9 09:43:10 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:43:10 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 01:38:00 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/share/runtime/vmStructs.cpp line 217: > >> 215: volatile_nonstatic_field(CompiledICData, _speculated_klass, uintptr_t) \ >> 216: nonstatic_field(CompiledICData, _itable_defc_klass, Klass*) \ >> 217: nonstatic_field(CompiledICData, _itable_refc_klass, Klass*) \ > > I don't think it makes sense to export these fields and types until SA is changed to make use of them. Good point. I thought at least Graal needs them, but they have their own VM structs file where they are exposed. I'll remove this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484082651 From eosterlund at openjdk.org Fri Feb 9 09:56:10 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 09:56:10 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 09:18:20 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/cpu/aarch64/aarch64.ad line 2224: > >> 2222: // This is the unverified entry point. >> 2223: C2_MacroAssembler _masm(&cbuf); >> 2224: __ ic_check(CodeEntryAlignment); > > I'm not sure we want to increase the alignement to CodeEntryAlignment here. I believe C2 already aligns the root block to CodeEntryAlignment. @theRealAph, what do you think? 
I can change it to InteriorEntryAlignment which is what the VEP block is aligned to. I think it's still good that the root block is aligned to CodeEntryAlignment though. I heard some HW sheds some tears when their instruction cache lines also contain data that doesn't correspond to any actual instructions. But we can definitely align to InteriorEntryAlignment here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484096298

From rehn at openjdk.org Fri Feb 9 13:32:06 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 9 Feb 2024 13:32:06 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6]
In-Reply-To:
References:
Message-ID: <2digqyA8oO2r8CnmyFKAjlz6oYo_ReR1zQR-1TfkcOE=.b82a146a-4705-4afc-a53c-7b74448d3e11@github.com>

On Fri, 9 Feb 2024 09:53:29 GMT, Erik Österlund wrote:

>> src/hotspot/cpu/aarch64/aarch64.ad line 2224:
>>
>>> 2222: // This is the unverified entry point.
>>> 2223: C2_MacroAssembler _masm(&cbuf);
>>> 2224: __ ic_check(CodeEntryAlignment);
>>
>> I'm not sure we want to increase the alignment to CodeEntryAlignment here. I believe C2 already aligns the root block to CodeEntryAlignment. @theRealAph, what do you think?
>
> I can change it to InteriorEntryAlignment which is what the VEP block is aligned to. I think it's still good that the root block is aligned to CodeEntryAlignment though. I heard some HW sheds some tears when their instruction cache lines also contain data that doesn't correspond to any actual instructions. But we can definitely align to InteriorEntryAlignment here.

To add to what Erik is saying here, we have noticed that some CPUs have exclusive L1I and L1D caches: a cache line can be in only one of them. So you can end up with it bouncing between them. Secondly, if an I-fetch gets data not intended to be decoded, the CPU can get upset in some cases.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484319136 From wkemper at openjdk.org Fri Feb 9 14:15:21 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 14:15:21 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-23+9 ------------- Commit messages: - 8325304: Several classes in java.util.jar and java.util.zip don't specify the behaviour for null arguments - 8325367: Rename nsk_list.h - 8323681: SA PointerFinder code should support G1 - 8325189: Enable this-escape javac warning in java.base - 8325302: Files.move(REPLACE_EXISTING) throws NoSuchFileException on deleted target - 8325268: Add policy statement to langtools makefiles concerning warnings - 8325109: Sort method modifiers in canonical order - 8324881: ObjectSynchronizer::inflate(Thread* current...) is invoked for non-current thread - 8325416: Parallel: Refactor CheckForUnmarkedOops - 8325306: Rename static huge pages to explicit huge pages - ... and 77 more: https://git.openjdk.org/shenandoah/compare/5b9b176c...1fb9e3d6 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.org/shenandoah/pull/390/files Stats: 8811 lines in 464 files changed: 5233 ins; 1660 del; 1918 mod Patch: https://git.openjdk.org/shenandoah/pull/390.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/390/head:pull/390 PR: https://git.openjdk.org/shenandoah/pull/390 From eosterlund at openjdk.org Fri Feb 9 14:24:09 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 14:24:09 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 09:21:00 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> ARM32 fixes > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 295: > >> 293: // inline cache check; done before the frame is built. >> 294: int LIR_Assembler::check_icache() { >> 295: return __ ic_check(CodeEntryAlignment); > > Do we really want to remove the optimization that skips alignment for small methods? It's not obvious to me how likely this optimization is to kick in, and if it does, how many bytes it really saves. It optimizes accessor methods, when the UEP can be squeezed down to 4 instructions. The previous inline_cache_check was >= 2 instructions, there is a jump to skip the far jump, and the far_jump is >= 1 instructions. So it seems like it would kick in when far jumps can just branch and the compressed class encoding is simple enough. However, with the new ic_check we always have at least 5 instructions, sometimes 7. So it seemed to me like the intended optimization wouldn't apply any longer anyway. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484387445

From eosterlund at openjdk.org Fri Feb 9 14:30:11 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 9 Feb 2024 14:30:11 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6]
In-Reply-To:
References:
Message-ID: <0sYzdHTXEglk5fD3YdSK_GOT27U-hcs5ybjFONloiLQ=.6bb975f7-411c-4cbf-9dd9-db5b8a4e7fcb@github.com>

On Thu, 8 Feb 2024 21:28:26 GMT, Dean Long wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> ARM32 fixes
>
> src/hotspot/cpu/arm/arm.ad line 880:
>
>> 878: void MachUEPNode::emit(CodeBuffer &cbuf, PhaseRegAlloc *ra_) const {
>> 879: C2_MacroAssembler _masm(&cbuf);
>> 880: __ ic_check(CodeEntryAlignment);
>
> Do we care about CodeEntryAlignment here if the old code didn't?

Perhaps not, but it seems weird not to align. But maybe I should use InteriorEntryAlignment here, so it's symmetric to AArch64.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484390940 From eosterlund at openjdk.org Fri Feb 9 14:30:12 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 14:30:12 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v8] In-Reply-To: <04ntORhQBIEdsks9kJSNyR2iqdsTfwmAFWG36z5ilOo=.6b91fe2f-3573-4e31-97fb-62ead3720487@github.com> References: <04ntORhQBIEdsks9kJSNyR2iqdsTfwmAFWG36z5ilOo=.6b91fe2f-3573-4e31-97fb-62ead3720487@github.com> Message-ID: <9qAWSeejCa2vE1t1zu-cegZbEGOuZKNWod0wgXGMh-A=.74bed6a1-30b8-464e-a779-2bd73b7e7ee1@github.com> On Thu, 8 Feb 2024 21:32:22 GMT, Dean Long wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/code/compiledIC.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > > src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 831: > >> 829: __ jump(SharedRuntime::get_ic_miss_stub(), relocInfo::runtime_call_type, Rtemp); >> 830: __ align(CodeEntryAlignment); >> 831: > > Keep the CodeEntryAlignment and VEP comment? Okay, I will fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484394428 From eosterlund at openjdk.org Fri Feb 9 14:35:43 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 14:35:43 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints In-Reply-To: References: <9GolX3m7SkG4Fs0KTN5qMRxVK47eAhPLdmlaO3oGSKc=.736c3053-abc5-45ce-bd54-85c4f70c3fc9@github.com> Message-ID: On Mon, 29 Jan 2024 09:38:52 GMT, Thomas Schatzl wrote: >> On linux, the time for "Purge Unlinked NMethods" goes down when I comment out `delete ic->data();` and ignore the memory leak. (MacOS seems to be ok with it.) >> Adding trace code to `purge_ic_callsites` shows that we often have 0 or 2 ICData instances, sometimes up to 30 ones. 
>> It would be good to think a bit about the allocation scheme. Some ideas would be >> >> - Allocate ICData in an array per nmethod instead of individually. That should help to some degree and also improve data locality (and hence cache efficiency). Would also save iterating over the relocations when purging unlinked NMethods. It's not very complex. >> - Instead of freeing ICData instances, we could enqueue them and either reuse or free them during a concurrent phase. This may be a bit complicated. Not sure if it's worth it. >> - Allocate in Metaspace? > >> On linux, the time for "Purge Unlinked NMethods" goes down when I comment out delete ic->data(); and ignore the memory leak. (MacOS seems to be ok with it.) >>Adding trace code to purge_ic_callsites shows that we often have 0 or 2 ICData instances, sometimes up to 30 ones. >>It would be good to think a bit about the allocation scheme. Some ideas would be > >> Allocate ICData in an array per nmethod instead of individually. That should help to some degree and also improve data locality (and hence cache efficiency). Would also save iterating over the relocations when purging unlinked NMethods. It's not very complex. >> Instead of freeing ICData instances, we could enqueue them and either reuse or free them during a concurrent phase. This may be a bit complicated. Not sure if it's worth it. >> Allocate in Metaspace? > > Sorry for being unresponsive for a bit. > > Yes, the issue is the new `delete ic->data()`; but also the iteration over the relocinfo here is almost as expensive in my tests. > > So the idea to allocate ICData in a per nmethod basis (and actually some other existing C heap allocations that are also `delete`d in this phase) seems the most promising to me. > > Also came up with the other suggestions, but I think that first one seems best to me at first glance. I did not really like the second because enqueuing adds another indirection for first gathering all of them and then separately free them. 
> > Metaspace is something I do not know that well to comment on that option. > > I am open to moving this improvement, if it is not easy to do, into a separate CR. Thanks for the reviews @tschatzl and @dean-long. I have pushed changes reflecting your feedback. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1936040466 From eosterlund at openjdk.org Fri Feb 9 14:35:43 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 9 Feb 2024 14:35:43 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. > > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests. 
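The "less stateful" scheme in the PR description above can be sketched roughly as follows. This is a simplified illustration, not the actual CompiledICData or CompiledIC code: it assumes the per-call-site data is written once at resolution time and never modified afterwards, so a state transition only has to swap a single destination word atomically. The type names `CallSiteData` and `InlineCacheSketch` are hypothetical, and the field names only loosely follow the ones mentioned in the vmStructs review comment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using address = const void*;

// Hypothetical stand-in for the per-call-site data: everything any shape of
// call might need, filled in once when the call site is resolved.
struct CallSiteData {
  std::uintptr_t speculated_klass;
  const void* itable_defc_klass;
  const void* itable_refc_klass;
};

class InlineCacheSketch {
  const CallSiteData _data;           // never changes after construction
  std::atomic<address> _destination;  // the only part that changes

 public:
  InlineCacheSketch(const CallSiteData& data, address resolve_stub)
      : _data(data), _destination(resolve_stub) {}

  // A state transition is one atomic store. Because the data is immutable,
  // there is no window in which destination and data disagree, which is the
  // atomicity problem the description says ICStubs were solving.
  void set_destination(address target) {
    _destination.store(target, std::memory_order_release);
  }

  address destination() const {
    return _destination.load(std::memory_order_acquire);
  }

  const CallSiteData& data() const { return _data; }
};
```

In this sketch it does not matter which state the cache moves from or to: switching to a monomorphic target, a megamorphic stub, or back to the resolve stub is the same single store, and the data seen by the callee is the same in every state.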
Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: Feedback from Dean and Thomas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17495/files - new: https://git.openjdk.org/jdk/pull/17495/files/dfdcdcc3..08c146b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=07-08 Stats: 71 lines in 17 files changed: 37 ins; 26 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495 PR: https://git.openjdk.org/jdk/pull/17495 From wkemper at openjdk.org Fri Feb 9 16:56:05 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 16:56:05 GMT Subject: Integrated: 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 18:32:46 GMT, William Kemper wrote: > The control thread used to run much more of the cycle directly. This code was all factored out into different classes, but many of the vestigial headers remained. Removing these improves compilation times and makes maintenance easier. This pull request has now been integrated. Changeset: 4a3a38d1 Author: William Kemper Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/4a3a38d1b71a4acc780a6d9802c076d750541714 Stats: 28 lines in 6 files changed: 10 ins; 15 del; 3 mod 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp Reviewed-by: shade, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/17778 From wkemper at openjdk.org Fri Feb 9 18:48:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 18:48:34 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-23+9 William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/390/files - new: https://git.openjdk.org/shenandoah/pull/390/files/1fb9e3d6..1fb9e3d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=390&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=390&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/390.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/390/head:pull/390 PR: https://git.openjdk.org/shenandoah/pull/390 From wkemper at openjdk.org Fri Feb 9 18:48:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 18:48:35 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:09:22 GMT, William Kemper wrote: > Merges tag jdk-23+9 Test failure looks like an infrastructure problem. ------------- PR Comment: https://git.openjdk.org/shenandoah/pull/390#issuecomment-1936431251 From wkemper at openjdk.org Fri Feb 9 18:48:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 18:48:35 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:09:22 GMT, William Kemper wrote: > Merges tag jdk-23+9 This pull request has now been integrated. Changeset: 710fde73 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/710fde73606e2e2f009a04d95d4c92411baaf559 Stats: 8811 lines in 464 files changed: 5233 ins; 1660 del; 1918 mod Merge ------------- PR: https://git.openjdk.org/shenandoah/pull/390 From wkemper at openjdk.org Fri Feb 9 19:48:14 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 19:48:14 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs Message-ID: Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. 
The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. ------------- Commit messages: - Simplify handling/reporting of implicit/explicit requested GCs Changes: https://git.openjdk.org/jdk/pull/17795/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325574 Stats: 109 lines in 5 files changed: 44 ins; 42 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/17795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17795/head:pull/17795 PR: https://git.openjdk.org/jdk/pull/17795 From wkemper at openjdk.org Fri Feb 9 19:51:43 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 19:51:43 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode Message-ID: Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. ------------- Commit messages: - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Clean up includes - Remove unnecessary mode checks - Remove non-generational functionality from regulator thread - Move heap changed tracking from control thread to heap - Clean up includes and headers - Factor allocation failure handling methods into common base class - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Merge remote-tracking branch 'shenandoah/master' into isolate-control-thread - Add missing precompiled header - ... 
and 11 more: https://git.openjdk.org/shenandoah/compare/710fde73...d1ddca0e Changes: https://git.openjdk.org/shenandoah/pull/391/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324067 Stats: 2352 lines in 20 files changed: 1464 ins; 754 del; 134 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From dlong at openjdk.org Fri Feb 9 21:31:08 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Feb 2024 21:31:08 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:35:43 GMT, Erik ?sterlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. 
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1554: > 1552: __ far_jump(RuntimeAddress(SharedRuntime::get_handle_wrong_method_stub())); > 1553: > 1554: // Verified entry point must be aligned I think this comment should come around line 1539. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484819672 From dlong at openjdk.org Fri Feb 9 21:36:16 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Feb 2024 21:36:16 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:35:43 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 823: > 821: __ ic_check(CodeEntryAlignment /* end_alignment */); > 822: > 823: int vep_offset = __ pc() - start; Maybe add back `// Verified entry point` above this line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484823504 From ysr at openjdk.org Fri Feb 9 21:44:09 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 9 Feb 2024 21:44:09 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v10] In-Reply-To: References: Message-ID: <4Dz1zs0mjWUXLLB_vF5qEXlgpsj-dMxsTy00dwYwiDs=.6115ebe5-984d-4650-8569-5fb17887468e@github.com> On Wed, 7 Feb 2024 21:25:12 GMT, Kelvin Nilsen wrote: >> Several objectives: >> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions >> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB >> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations >> 4. Treat collector reserves as available for Mutator allocations after evacuation completes >> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah >> >> On internal performance pipelines, this change shows: >> >> 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. >> 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. >> 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) >> 4.
2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) >> >> Shenandoah >> ------------------------------------------------------------------------------------------------------- >> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 >> Control: 819938.875 (+/-5724.56 ) 40 >> Test: 2185552.625 (+/-26378.64 ) 20 >> >> +166.16% scimark.sparse.large/rss_max p=0.00000 >> Control: 3285226.375 (+/-22812.93 ) 40 >> Test: 8743881.500 (+/-104906.69 ) 20 >> >> +164.78% sunflow/cpu_system p=0.00000 >> Control: 1.280s (+/- 0.10s ) 40 >> Test: 3.390s (+/- 0.13s ) 20 >> >> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 >> Control: 3.259 (+/- 1.46 ) 33 >> Test: 8.125 (+/- 2.05 ) 20 >> >> +143.75% pmd/major_page_fault_count p=0.03622 >> Control: 1.000 (+/- 0.00 ) 40 >> Test: 2.438 (+/- 2.59 ) 20 >> >> +80.22% lusearch/minor_page_fault_count p=0.00000 >> Control: 2043930.938 (+/-4777.14 ) 40 >> Test: 3683477.625 (+/-5650.29 ) 20 >> >> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 >> Control: 697899.156 (+/-3457.82 ) 40 >> Test: 1050363.812 (+/-175... > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to review feedback Thanks; looking forward to any perf improvements from this! Would be great to get some feedback also from either @shipilev or @rkennke . Thanks! ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1873239068 PR Comment: https://git.openjdk.org/jdk/pull/17561#issuecomment-1936642061 From dlong at openjdk.org Fri Feb 9 22:05:13 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Feb 2024 22:05:13 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: <2digqyA8oO2r8CnmyFKAjlz6oYo_ReR1zQR-1TfkcOE=.b82a146a-4705-4afc-a53c-7b74448d3e11@github.com> References: <2digqyA8oO2r8CnmyFKAjlz6oYo_ReR1zQR-1TfkcOE=.b82a146a-4705-4afc-a53c-7b74448d3e11@github.com> Message-ID: On Fri, 9 Feb 2024 13:29:41 GMT, Robbin Ehn wrote: > Secondly, if an I-fetch gets data not intended to be decoded, the cpu can get upset in some cases. I've heard that in the past, and always wondered, does it matter if the data is "reachable"? What if it's on the other side of an unconditional branch? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484846150 From ysr at openjdk.org Fri Feb 9 22:17:08 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 9 Feb 2024 22:17:08 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 19:43:33 GMT, William Kemper wrote: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. LGTM. src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 104: > 102: > 103: bool is_implicit_gc(GCCause::Cause cause) { > 104: return !is_explicit_gc(cause) && Since it's less frequent, you can move this first clause last. 
Maybe it doesn't matter :-) src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 51: > 49: size_t _alloc_failure_degenerated_upgrade_to_full; > 50: size_t _alloc_failure_full; > 51: size_t _collection_causes[GCCause::_last_gc_cause]; `_collection_cause_counts[]` ? src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 83: > 81: bool alloc_failure_pending = _alloc_failure_gc.is_set(); > 82: bool is_gc_requested = _gc_requested.is_set(); > 83: GCCause::Cause requested_gc_cause = _requested_gc_cause; Let's const this variable, as well as variables default_mode and default_cause above? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17795#pullrequestreview-1873247396 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1484843832 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1484852331 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1484835633 From wkemper at openjdk.org Fri Feb 9 22:26:33 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 22:26:33 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Pull in recent changes to control thread from upstream. ------------- Commit messages: - Merge openjdk/jdk:master - 8319578: Few java/lang/instrument ignore test.java.opts and accept test.vm.opts only - 8316460: 4 javax/management tests ignore VM flags - 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root - 8325038: runtime/cds/appcds/ProhibitedPackage.java can fail with UseLargePages - 8325203: System.exit(0) kills the launched 3rd party application - 8325264: two compiler/intrinsics/float16 tests fail after JDK-8324724 - 8325517: Shenandoah: Reduce unnecessary includes from shenandoahControlThread.cpp - 8325563: Remove unused Space::is_in - 8325551: Remove unused obj_is_alive and block_start in Space - ...
and 23 more: https://git.openjdk.org/shenandoah/compare/710fde73...3e8868a4 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=393&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=393&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/393/files Stats: 3012 lines in 118 files changed: 2096 ins; 367 del; 549 mod Patch: https://git.openjdk.org/shenandoah/pull/393.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/393/head:pull/393 PR: https://git.openjdk.org/shenandoah/pull/393 From wkemper at openjdk.org Fri Feb 9 22:54:18 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Feb 2024 22:54:18 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 22:20:56 GMT, William Kemper wrote: > Pull in recent changes to control thread from upstream. This pull request has now been integrated. Changeset: 8f4e6e22 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/8f4e6e226de7cb08f60bfd8dbbede466463d5b9d Stats: 3012 lines in 118 files changed: 2096 ins; 367 del; 549 mod Merge ------------- PR: https://git.openjdk.org/shenandoah/pull/393 From kdnilsen at openjdk.org Fri Feb 9 23:35:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 9 Feb 2024 23:35:03 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 19:43:33 GMT, William Kemper wrote: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. Marked as reviewed by kdnilsen (no project role). 
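The implicit/explicit distinction and the per-cause tally described in the quoted summary can be sketched roughly as below. This is a standalone toy model, not the patch itself: `ToyCause`, `ToyFlags`, `ToyPolicy`, `is_explicit` and this `should_run_full_gc` signature are hypothetical names, and which causes count as explicit is an assumption made only for illustration. The two booleans are modeled on the flags `ExplicitGCInvokesConcurrent` and `ShenandoahImplicitGCInvokesConcurrent`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, heavily reduced set of GC causes (the real GCCause enum is larger).
enum class ToyCause : std::size_t { SystemGC, DiagnosticCommand, MetadataThreshold, Count };

// Stand-ins for the two VM flags; plain fields here, not real flag lookups.
struct ToyFlags {
  bool explicit_gc_invokes_concurrent;
  bool implicit_gc_invokes_concurrent;
};

// Assumption for illustration: user- and tool-requested causes are "explicit".
inline bool is_explicit(ToyCause cause) {
  return cause == ToyCause::SystemGC || cause == ToyCause::DiagnosticCommand;
}

// A requested GC runs as a full GC unless the matching flag asks for a concurrent cycle.
inline bool should_run_full_gc(ToyCause cause, const ToyFlags& flags) {
  return is_explicit(cause) ? !flags.explicit_gc_invokes_concurrent
                            : !flags.implicit_gc_invokes_concurrent;
}

// Tally of requested GCs by cause, as could feed an end-of-process report.
class ToyPolicy {
  std::array<std::size_t, static_cast<std::size_t>(ToyCause::Count)> _cause_counts{};
public:
  void record(ToyCause cause) { ++_cause_counts[static_cast<std::size_t>(cause)]; }
  std::size_t count(ToyCause cause) const { return _cause_counts[static_cast<std::size_t>(cause)]; }
};
```

Keeping the classification, the full-versus-concurrent decision, and the counters in one policy object is only one possible arrangement; the sketch does not claim to match where the patch places them.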
------------- PR Review: https://git.openjdk.org/jdk/pull/17795#pullrequestreview-1873329669 From wkemper at openjdk.org Sat Feb 10 00:16:22 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 10 Feb 2024 00:16:22 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v2] In-Reply-To: References: Message-ID: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'shenandoah-master' into isolate-regulator-thread - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Clean up includes - Remove unnecessary mode checks - Remove non-generational functionality from regulator thread - Move heap changed tracking from control thread to heap - Clean up includes and headers - Factor allocation failure handling methods into common base class - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Merge remote-tracking branch 'shenandoah/master' into isolate-control-thread - ... 
and 12 more: https://git.openjdk.org/shenandoah/compare/8f4e6e22...6915cd07 ------------- Changes: https://git.openjdk.org/shenandoah/pull/391/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=01 Stats: 2311 lines in 17 files changed: 1449 ins; 731 del; 131 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From wkemper at openjdk.org Sat Feb 10 00:16:22 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 10 Feb 2024 00:16:22 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v2] In-Reply-To: References: Message-ID: On Sat, 10 Feb 2024 00:13:45 GMT, William Kemper wrote: >> Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'shenandoah-master' into isolate-regulator-thread > - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread > - Clean up includes > - Remove unnecessary mode checks > - Remove non-generational functionality from regulator thread > - Move heap changed tracking from control thread to heap > - Clean up includes and headers > - Factor allocation failure handling methods into common base class > - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread > - Merge remote-tracking branch 'shenandoah/master' into isolate-control-thread > - ... and 12 more: https://git.openjdk.org/shenandoah/compare/8f4e6e22...6915cd07 src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 1: > 1: /* This file is largely reverted to the upstream's version. 
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 1: > 1: /* The genshen version of shenandoahControlThread.cpp has been renamed to shenandoahGenerationalControlThread.cpp. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1484904943 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1484905584 From dlong at openjdk.org Sat Feb 10 00:38:15 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 Feb 2024 00:38:15 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: <2zDHeeRc4pWQetiIzFMR0JwjZ5ExX7_OV8Yx9G4o_mg=.066b11a8-6a49-455e-a985-3fb5de974480@github.com> On Fri, 9 Feb 2024 14:35:43 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up. Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code. Also, why don't we have to check for method->is_old() anymore? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1936773523 From wkemper at openjdk.org Sat Feb 10 00:39:21 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 10 Feb 2024 00:39:21 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v3] In-Reply-To: References: Message-ID: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle.
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove unused code (design changed in upstream patch) ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/391/files - new: https://git.openjdk.org/shenandoah/pull/391/files/6915cd07..ce189e7b Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=01-02 Stats: 121 lines in 3 files changed: 0 ins; 121 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From dlong at openjdk.org Sat Feb 10 00:51:09 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 Feb 2024 00:51:09 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:35:43 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice.
It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas src/hotspot/share/code/compiledIC.cpp line 347: > 345: void CompiledDirectCall::set(const methodHandle& callee_method) { > 346: CompiledMethod* code = callee_method->code(); > 347: CompiledMethod* caller = CodeCache::find_compiled(instruction_address()); Instead of doing the slow `find_compiled` unconditionally, I think it would be better to check `is_interpreted_call` first. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484915701 From dlong at openjdk.org Sat Feb 10 01:02:07 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 Feb 2024 01:02:07 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: <52h64zKHV_1xUjCJ_7cKHvTRwIFudtKdAWo6bv0T5_U=.ac193da2-837f-489b-8ec7-ea883b5c13c8@github.com> On Fri, 9 Feb 2024 14:35:43 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData.
This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas src/hotspot/share/code/nmethod.cpp line 1248: > 1246: if (nm != nullptr) { > 1247: // Verify that inline caches pointing to bad nmethods are clean > 1248: if (!nm->is_in_use() || nm->is_unloading()) { Please explain this change. It's not obvious. src/hotspot/share/gc/x/xUnload.cpp line 104: > 102: > 103: virtual bool is_safe(CompiledMethod* method) { > 104: if (SafepointSynchronize::is_at_safepoint() || method->is_unloading()) { Please explain why is_unloading() is needed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484917912 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484918466 From dlong at openjdk.org Sat Feb 10 01:11:03 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 Feb 2024 01:11:03 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 14:35:43 GMT, Erik Österlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.
>> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean and Thomas src/hotspot/share/runtime/sharedRuntime.cpp line 1774: > 1772: > 1773: CodeBlob* cb = CodeCache::find_blob(caller_pc); > 1774: if (cb == nullptr || !cb->is_compiled() || !callee->is_in_use() || callee->is_unloading()) { Checking for is_in_use() is just an optimization, right? src/hotspot/share/runtime/sharedRuntime.cpp line 1808: > 1806: > 1807: CompiledDirectCall* callsite = CompiledDirectCall::before(return_pc); > 1808: callsite->set_to_clean(); Clever way to get rid of should_fixup_call_destination(), something I've wanted to do! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484920202 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1484920628 From duke at openjdk.org Sat Feb 10 01:55:18 2024 From: duke at openjdk.org (duke) Date: Sat, 10 Feb 2024 01:55:18 GMT Subject: Withdrawn: Make use of nanoseconds for GC times In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 23:34:10 GMT, David Alvarez wrote: > In multiple places for hotspot management the resolution used for times is milliseconds.
With new collectors getting into sub-millisecond pause times, this resolution is not enough. > > This change moves internal values in LastGcStat to use nanoseconds. GcInfo is still reporting the values in milliseconds for compatibility reasons. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah/pull/102 From duke at openjdk.org Sat Feb 10 01:55:16 2024 From: duke at openjdk.org (duke) Date: Sat, 10 Feb 2024 01:55:16 GMT Subject: Withdrawn: Unconditional conditional card marking In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 15:32:43 GMT, Aleksey Shipilev wrote: > For the overwhelming majority of current systems, it makes little sense to run without conditional card marks enabled. G1, for example, makes its card marks unconditional. In other words, G1 does not respond to `UseCondCardMark`. This also simplifies code, eliminates one additional testing configuration, and provides safety for the cases where `UseCondCardMark` is accidentally disabled. > > Additional testing: > - [x] macos-aarch64-server-fastdebug, `hotspot_gc_shenandoah` This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah/pull/264 From rehn at openjdk.org Mon Feb 12 09:55:08 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 12 Feb 2024 09:55:08 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v6] In-Reply-To: References: <2digqyA8oO2r8CnmyFKAjlz6oYo_ReR1zQR-1TfkcOE=.b82a146a-4705-4afc-a53c-7b74448d3e11@github.com> Message-ID: On Fri, 9 Feb 2024 22:02:47 GMT, Dean Long wrote: > > Secondly, if an I-fetch gets data not intended to be decoded, the cpu can get upset in some cases. > > I've heard that in the past, and always wondered, does it matter if the data is "reachable"? What if it's on the other side of an unconditional branch? I don't think it's a guarantee. E.g.
an uncond branch may be encoded with multiple instructions, so it might be seen as an indirect jump. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1485944023 From shade at openjdk.org Mon Feb 12 10:36:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Feb 2024 10:36:04 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs In-Reply-To: References: Message-ID: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com> On Fri, 9 Feb 2024 19:43:33 GMT, William Kemper wrote: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. The simplification in `ShenandoahControlThread::run_service` looks very nice. I think we can make the code more straight-forward: src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 98: > 96: } > 97: > 98: bool is_explicit_gc(GCCause::Cause cause) { There is `ShenandoahControlThread::is_explicit_gc` too. I think we should have a shared definition for this somewhere. Maybe `ShenandoahControlThread` should delegate to `ShenandoahCollectorPolicy` then? src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 138: > 136: out->cr(); > 137: out->print_cr(SIZE_FORMAT_W(5) " Successful Concurrent GCs (%.2f%%)", _success_concurrent_gcs, percent_of(_success_concurrent_gcs, completed_gcs)); > 138: if (ExplicitGCInvokesConcurrent) { Instead of relying on flags here, should we explicitly (pun intended) record `_explicit_collection_causes` and `_implicit_collection_causes`?
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 385: > 383: bool ShenandoahControlThread::should_run_full_gc(GCCause::Cause cause) { > 384: return is_explicit_gc(cause) ? !ExplicitGCInvokesConcurrent : !ShenandoahImplicitGCInvokesConcurrent; > 385: } This sounds like a policy decision, which means a good fit for it is in `ShenandoahCollectorPolicy`? ------------- PR Review: https://git.openjdk.org/jdk/pull/17795#pullrequestreview-1874932086 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1485989218 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1485991956 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1485993917 From eosterlund at openjdk.org Mon Feb 12 10:38:03 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 12 Feb 2024 10:38:03 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 21:27:58 GMT, Dean Long wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback from Dean and Thomas > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1554: > >> 1552: __ far_jump(RuntimeAddress(SharedRuntime::get_handle_wrong_method_stub())); >> 1553: >> 1554: // Verified entry point must be aligned > > I think this comment should come around line 1539. Okay, I will fix it. > src/hotspot/cpu/arm/sharedRuntime_arm.cpp line 823: > >> 821: __ ic_check(CodeEntryAlignment /* end_alignment */); >> 822: >> 823: int vep_offset = __ pc() - start; > > Maybe add back > `// Verified entry point` > above this line. Sure.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1485997296 PR Review Comment: https://git.openjdk.org/jdk/pull/17495#discussion_r1485997474 From eosterlund at openjdk.org Mon Feb 12 11:04:23 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 12 Feb 2024 11:04:23 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v10] In-Reply-To: References: Message-ID: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. > > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests.
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

Some comments

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/17495/files
- new: https://git.openjdk.org/jdk/pull/17495/files/08c146b7..29790afe

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=09
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=08-09

Stats: 4 lines in 2 files changed: 2 ins; 2 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/17495.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495

PR: https://git.openjdk.org/jdk/pull/17495

From eosterlund at openjdk.org Mon Feb 12 11:04:23 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 12 Feb 2024 11:04:23 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9]
In-Reply-To: <2zDHeeRc4pWQetiIzFMR0JwjZ5ExX7_OV8Yx9G4o_mg=.066b11a8-6a49-455e-a985-3fb5de974480@github.com>
References: <2zDHeeRc4pWQetiIzFMR0JwjZ5ExX7_OV8Yx9G4o_mg=.066b11a8-6a49-455e-a985-3fb5de974480@github.com>
Message-ID:

On Sat, 10 Feb 2024 00:35:47 GMT, Dean Long wrote:

> If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up. :)

> Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code.

Sure! So I think at some point we had static calls and virtual calls, with virtual calls using CompiledIC and static calls using CompiledStaticCall. I think there must have been some confusion at some point when introducing opt_virtual. They are virtual calls, so it might have seemed "natural" to jam them into CompiledIC, and have CompiledIC deal with them. But since they in their code shapes are a lot more similar to static calls, with a direct call and a stub for interpreter entry, it never was a great fit.
To deal with it we had to 1) ensure the data is always null as there is no data, because it isn't really an inline cache, 2) check for it everywhere to not spin up ICStubs when transitioning, because it isn't really an inline cache, and then when manipulating the callsite, have some native call wrapper abstraction with virtual calls that would convert requests to update the inline cache to update the corresponding direct call and stub instead for optimized virtual calls, and get out of the inline cache world. To me this always seemed a bit backwards.

So in my new implementation I instead accept that opt_virtual calls really are more like the static calls (generated code is identical), and have pretty much nothing to do with inline caches, despite being virtual calls, and made them both use a common DirectCall abstraction instead, that fit both of them, as they are both direct calls.

Could we have cleaned that up earlier? Yes, probably. But it was pretty ingrained and "interesting" to change with incremental changes. I figured this was my chance to do this right since I'm rewriting the CompiledIC file pretty much from scratch as you noticed. I hope you agree with this decision.

> Also, why don't we have to check for method->is_old() anymore?

The checks for is_old were there because when performing an inline cache transition, we had situations where we would need an ICStub, and after running out of ICStubs we would have to request a safepoint to refill ICStubs. That safepoint could sneak in a class redefinition operation, rendering the methods invalid, or at least is_old(). Since I removed ICStubs and don't need any ICStubs to transition inline caches, we also don't get any safepoints in this code. And therefore we can't get any class redefinition, and hence can remove all the is_old() checks, which are now effectively dead code.

Thanks for the review @dean-long! I updated the comments as requested.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1938453986 PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1938457516 From kdnilsen at openjdk.org Mon Feb 12 17:41:24 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 Feb 2024 17:41:24 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC Message-ID: At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC. This PR reverts a change that accidentally prevents old-gen from being enlarged by this action. The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations. ------------- Commit messages: - Allow old-gen to expand when mutator memory is available Changes: https://git.openjdk.org/shenandoah/pull/394/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=394&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325670 Stats: 7 lines in 1 file changed: 4 ins; 3 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/394.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/394/head:pull/394 PR: https://git.openjdk.org/shenandoah/pull/394 From kdnilsen at openjdk.org Mon Feb 12 18:27:23 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 Feb 2024 18:27:23 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector Message-ID: Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. 
-------------

Commit messages:
- Share reserves between Young Collector and Old Collector

Changes: https://git.openjdk.org/shenandoah/pull/395/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8325673
Stats: 28 lines in 3 files changed: 15 ins; 0 del; 13 mod
Patch: https://git.openjdk.org/shenandoah/pull/395.diff
Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395

PR: https://git.openjdk.org/shenandoah/pull/395

From dlong at openjdk.org Mon Feb 12 21:17:22 2024
From: dlong at openjdk.org (Dean Long)
Date: Mon, 12 Feb 2024 21:17:22 GMT
Subject: RFR: 8322630: Remove ICStubs and related safepoints [v10]
In-Reply-To:
References:
Message-ID:

On Mon, 12 Feb 2024 11:04:23 GMT, Erik Österlund wrote:

>> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs.
>>
>> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
>>
>> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler.
>>
>> I have tested the changes from tier1-7, and run through full aurora performance tests.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
> Some comments

Marked as reviewed by dlong (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/17495#pullrequestreview-1876321235

From ysr at openjdk.org Mon Feb 12 21:23:07 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 12 Feb 2024 21:23:07 GMT
Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC
In-Reply-To:
References:
Message-ID:

On Mon, 12 Feb 2024 17:36:45 GMT, Kelvin Nilsen wrote:

> At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC.
>
> This PR reverts a change that accidentally prevents old-gen from being enlarged by this action. The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1275:

> 1273: // In the case that ShenandoahOldEvacRatioPercent equals 100, max_old_reserve is limited only by xfer_limit.
> 1274: const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ?
> 1275: old_available + xfer_limit: (young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent);

I guess I don't understand two things here:

1. Why do we special-case ShenandoahOldEvacRatioPercent == 100 here? When it's less than 100, we consider xfer_limit only in the deficit calculations below. Should we be adding xfer_limit to the result of the above calculation irrespective of the setting of ShenandoahOldEvacRatioPercent?
2. Where was this adjustment being made in the code before the changes of https://github.com/openjdk/shenandoah/pull/369?
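For reference, the expression quoted above can be isolated as a standalone function. This is a sketch, not the code in shenandoahHeap.cpp: the function name and the plain `size_t` parameters are assumptions, and the flag is passed as an argument (`old_evac_ratio_percent`) instead of being read from the global `ShenandoahOldEvacRatioPercent`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone version of the quoted max_old_reserve expression.
// old_evac_ratio_percent stands in for ShenandoahOldEvacRatioPercent.
std::size_t max_old_reserve(std::size_t old_available,
                            std::size_t xfer_limit,
                            std::size_t young_reserve,
                            std::size_t old_evac_ratio_percent) {
  assert(old_evac_ratio_percent <= 100);
  if (old_evac_ratio_percent == 100) {
    // The general formula would divide by (100 - 100) == 0 here,
    // so this branch uses old_available + xfer_limit instead.
    return old_available + xfer_limit;
  }
  // General case: scale the young reserve by percent / (100 - percent).
  return (young_reserve * old_evac_ratio_percent) / (100 - old_evac_ratio_percent);
}
```

With a young reserve of 100 units, a ratio of 75 gives 100 * 75 / 25 = 300 units, while a ratio of 100 ignores the young reserve entirely and returns `old_available + xfer_limit`.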
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1486787363 From kdnilsen at openjdk.org Mon Feb 12 21:34:14 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 Feb 2024 21:34:14 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v11] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 
2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Experiment with proposed ShenandoahPackEvacTightly option ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/b2ba4cf2..655e30f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=09-10 Stats: 19 lines in 2 files changed: 15 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From wkemper at openjdk.org Mon Feb 12 21:40:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Feb 2024 21:40:15 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs In-Reply-To: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com> References: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com> Message-ID: <2jNjDmDDo_1giROYkjIjFYMF_fiuBZbfXTYuTNf6fcg=.80155237-7ad0-49be-8447-fb3ed285a815@github.com> On Mon, 12 Feb 2024 10:27:33 GMT, Aleksey Shipilev wrote: >> Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. > > src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 98: > >> 96: } >> 97: >> 98: bool is_explicit_gc(GCCause::Cause cause) { > > There is `ShenandoahControlThread::is_explicit_gc` too. I think we should have a shared definition for this somewhere. Maybe `ShenandoahControlThread` should delegate to `ShenandoahCollectorPolicy` then? 
Yes. I'll factor its usage into `ShenandoahCollectorPolicy`.

> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 385:
>
>> 383: bool ShenandoahControlThread::should_run_full_gc(GCCause::Cause cause) {
>> 384: return is_explicit_gc(cause) ? !ExplicitGCInvokesConcurrent : !ShenandoahImplicitGCInvokesConcurrent;
>> 385: }
>
> This sounds like a policy decision, which means a good fit for it is in `ShenandoahCollectorPolicy`?

Agree. Moved it in there.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1486803499
PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1486804033

From wkemper at openjdk.org Mon Feb 12 21:44:02 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 12 Feb 2024 21:44:02 GMT
Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs
In-Reply-To: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com>
References: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com>
Message-ID:

On Mon, 12 Feb 2024 10:29:56 GMT, Aleksey Shipilev wrote:

>> Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report.
>
> src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 138:
>
>> 136: out->cr();
>> 137: out->print_cr(SIZE_FORMAT_W(5) " Successful Concurrent GCs (%.2f%%)", _success_concurrent_gcs, percent_of(_success_concurrent_gcs, completed_gcs));
>> 138: if (ExplicitGCInvokesConcurrent) {
>
> Instead of relying on flags here, should we explicitly (pun intended) record `_explicit_collection_causes` and `_implicit_collection_causes`?
That is what used to happen in `ShenandoahControlThread`. It feels like duplicating a result that can be derived from recording the gc causes. The `ExplicitGCInvokesConcurrent` and `ShenandoahImplicitGCInvokesConcurrent` flags aren't manageable, so I don't expect them to be different when the report is generated. I suppose that could change in the future... I'd be more inclined to not tally them at all and just leave the report to show the individual causes without classifying them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1486808915 From wkemper at openjdk.org Mon Feb 12 21:51:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Feb 2024 21:51:35 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs [v2] In-Reply-To: References: Message-ID: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. 
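The policy discussed in this thread can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch itself: the class name `CollectorPolicySketch`, the three-value `Cause` enum, and the `record_requested_gc`/`requested` helpers are hypothetical, and the two constructor arguments stand in for the `ExplicitGCInvokesConcurrent` and `ShenandoahImplicitGCInvokesConcurrent` flags. Only the shape of `should_run_full_gc` is taken from the quoted code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical subset of GC causes; the real list is GCCause::Cause.
enum class Cause : std::size_t { SystemGC, DiagnosticCommand, Other, Count };

class CollectorPolicySketch {
  bool _explicit_gc_invokes_concurrent;  // stands in for ExplicitGCInvokesConcurrent
  bool _implicit_gc_invokes_concurrent;  // stands in for ShenandoahImplicitGCInvokesConcurrent
  std::array<std::size_t, static_cast<std::size_t>(Cause::Count)> _cause_counts{};

public:
  CollectorPolicySketch(bool explicit_concurrent, bool implicit_concurrent)
    : _explicit_gc_invokes_concurrent(explicit_concurrent),
      _implicit_gc_invokes_concurrent(implicit_concurrent) {}

  // Assumption: user-initiated causes are explicit, everything else is implicit.
  static bool is_explicit_gc(Cause cause) {
    return cause == Cause::SystemGC || cause == Cause::DiagnosticCommand;
  }

  // Same shape as the quoted should_run_full_gc: the flag for the matching
  // request class decides between a concurrent cycle and a full GC.
  bool should_run_full_gc(Cause cause) const {
    return is_explicit_gc(cause) ? !_explicit_gc_invokes_concurrent
                                 : !_implicit_gc_invokes_concurrent;
  }

  // Tally the underlying cause; explicit/implicit totals can be derived later.
  void record_requested_gc(Cause cause) {
    assert(cause != Cause::Count);
    ++_cause_counts[static_cast<std::size_t>(cause)];
  }

  std::size_t requested(Cause cause) const {
    return _cause_counts[static_cast<std::size_t>(cause)];
  }
};
```

Keeping only per-cause counts matches the point made above: the explicit and implicit totals are derivable from the causes, so they do not need to be stored separately.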
William Kemper has updated the pull request incrementally with three additional commits since the last revision: - Move more gc cause classification and policy decisions to shenandoahCollectorPolicy - Improve names for accumulating arrays - Const all the things ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17795/files - new: https://git.openjdk.org/jdk/pull/17795/files/ce06e434..9115c470 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=00-01 Stats: 86 lines in 4 files changed: 30 ins; 28 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/17795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17795/head:pull/17795 PR: https://git.openjdk.org/jdk/pull/17795 From ysr at openjdk.org Mon Feb 12 22:03:26 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Feb 2024 22:03:26 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Message-ID: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it ------------- Commit messages: - Merge branch 'master' into generation_type - Introduce ShenandoahGenerationType and templatize most closures with it. 
Changes: https://git.openjdk.org/jdk/pull/17815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325671 Stats: 119 lines in 9 files changed: 68 ins; 0 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From wkemper at openjdk.org Mon Feb 12 22:12:24 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Feb 2024 22:12:24 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v4] In-Reply-To: References: Message-ID: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: > * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. > * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. 
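The class structure summarized above could look roughly like the following. This is only an illustration: the message does not name the single virtual method, so `request_gc` is a placeholder, and the class names `Controller`, `ClassicControlThread`, and `GenerationalControlThread`, the `_requests` counter, and the returned strings are hypothetical as well.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: shared bookkeeping lives here and only one
// operation is virtual. The method name is a placeholder.
class Controller {
  int _requests = 0;

public:
  virtual ~Controller() = default;

  // Common, non-virtual code pulled up from both control threads.
  std::string handle_request() {
    ++_requests;
    return request_gc();
  }

  int requests() const { return _requests; }

protected:
  // The single virtual hook; it is not on a fast path.
  virtual std::string request_gc() = 0;
};

// Stand-in for the non-generational control thread: no mode checks needed.
class ClassicControlThread : public Controller {
protected:
  std::string request_gc() override { return "non-generational"; }
};

// Stand-in for the generational control thread, created only by the
// generational heap in this sketch's reading of the description.
class GenerationalControlThread : public Controller {
protected:
  std::string request_gc() override { return "generational"; }
};
```

Each subclass hardwires its own behavior, so neither has to ask which mode the heap is running in; callers hold a `Controller*` and go through the shared `handle_request` path.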
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix Mac and Zero builds ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/391/files - new: https://git.openjdk.org/shenandoah/pull/391/files/ce189e7b..1e845ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=03 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=02-03 Stats: 4 lines in 3 files changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From ysr at openjdk.org Mon Feb 12 22:22:03 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Feb 2024 22:22:03 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 18:31:30 GMT, Y. Srinivas Ramakrishna wrote: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 36: > 34: > 35: class ShenandoahConcurrentMark: public ShenandoahMark { > 36: template friend class ShenandoahConcurrentMarkingTask; TBD: Needs #include of ShenandoahGenerationType. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17815#discussion_r1486851675 From ysr at openjdk.org Mon Feb 12 22:22:03 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Feb 2024 22:22:03 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 22:17:32 GMT, Y. 
Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 36: > >> 34: >> 35: class ShenandoahConcurrentMark: public ShenandoahMark { >> 36: template friend class ShenandoahConcurrentMarkingTask; > > TBD: Needs #include of ShenandoahGenerationType. done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17815#discussion_r1486857634 From ysr at openjdk.org Mon Feb 12 22:25:27 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Feb 2024 22:25:27 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v2] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17815/files - new: https://git.openjdk.org/jdk/pull/17815/files/25397d03..04c32336 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From wkemper at openjdk.org Mon Feb 12 22:27:13 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Feb 2024 22:27:13 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs [v3] In-Reply-To: References: Message-ID: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. 
The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report.

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

Add comments to new APIs

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/17795/files
- new: https://git.openjdk.org/jdk/pull/17795/files/9115c470..8d174e81

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=01-02

Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/17795.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17795/head:pull/17795

PR: https://git.openjdk.org/jdk/pull/17795

From ysr at openjdk.org Tue Feb 13 00:04:06 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 13 Feb 2024 00:04:06 GMT
Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 12 Feb 2024 22:25:27 GMT, Y. Srinivas Ramakrishna wrote:

>> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it
>
> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include).

Did some spot checks and overall sanity checks to compare the libjvm.so contents before and after the change, and don't see any great difference (although the libjvm.so size did increase a tad).
Here's one example of the new templatized vs old non-templatized versions (second column is the size from nm):

* Before templating:

0000000015044416 0000000000000947 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015037600 0000000000000930 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015029888 0000000000000883 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015033552 0000000000000882 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015029040 0000000000000836 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015032704 0000000000000835 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015038544 0000000000000564 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015045376 0000000000000564 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015037104 0000000000000489 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015043920 0000000000000489 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015036912 0000000000000182 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015043728 0000000000000182 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000022615392 0000000000000056 b OopOopIterateDispatch::_table
0000000015028784 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028832 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028736 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028688 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028928 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028880 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015028640 0000000000000037 t void OopOopIterateDispatch::Table::init(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000022615368 0000000000000008 b guard variable for OopOopIterateDispatch::_table
0000000015029024 0000000000000001 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)
0000000015029008 0000000000000001 t void OopOopIterateDispatch::Table::oop_oop_iterate(ShenandoahMarkRefsClosure*, oopDesc*, Klass*)

* After templating:

0000000015038032 0000000000000947 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015044480 0000000000000930 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015029696 0000000000000883 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015033360 0000000000000882 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028848 0000000000000836 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015032512 0000000000000835 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015058048 0000000000000564 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015038992 0000000000000564 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015043984 0000000000000489 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015037536 0000000000000489 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015043792 0000000000000182 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015037344 0000000000000182 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000022615392 0000000000000056 b OopOopIterateDispatch >::_table
0000000015028592 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028640 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028544 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028496 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028736 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028688 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028448 0000000000000037 t void OopOopIterateDispatch >::Table::init(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000022615368 0000000000000008 b guard variable for OopOopIterateDispatch >::_table
0000000015028832 0000000000000001 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)
0000000015028816 0000000000000001 t void OopOopIterateDispatch >::Table::oop_oop_iterate(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)0>*, oopDesc*, Klass*)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1939816220

From ysr at openjdk.org Tue Feb 13 00:10:03 2024
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 13 Feb 2024 00:10:03 GMT
Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 12 Feb 2024 22:25:27 GMT, Y. Srinivas Ramakrishna wrote:

>> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it
>
> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include).
Difference in (release) libjvm.so sizes (second column is size, a difference of ~2.4KB, less than 0.01%):
```
4095288896 26968936 generation_type.libjvm.so
2020291223 26966488 master.libjvm.so
```
------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1939823072 From kdnilsen at openjdk.org Tue Feb 13 01:04:08 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 Feb 2024 01:04:08 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 21:20:21 GMT, Y. Srinivas Ramakrishna wrote: >> At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC. >> >> This PR reverts a change that accidentally prevents old-gen from being enlarged by this action. The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1275: > >> 1273: // In the case that ShenandoahOldEvacRatioPercent equals 100, max_old_reserve is limited only by xfer_limit. >> 1274: const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? >> 1275: old_available + xfer_limit: (young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent); > > I guess I don't understand two things here: > > 1. Why do we special-case ShenandoahOldEvacRatioPercent == 100 here? When it's less than 100, we consider xfer_limit only in the deficit calculations below. Should we be adding xfer_limit to the result of the above calculation irrespective of the setting of ShenandoahOldEvacRatioPercent? > 2. Where was this adjustment being made in the code before the changes of https://github.com/openjdk/shenandoah/pull/369 ? We special-case ShenandoahOldEvacRatioPercent==100 because the "other case" has divide by (100 - ShenandoahOldEvacRatioPercent), which becomes divide by zero.
To generalize the form of the other expression, if ShenandoahOldEvacRatioPercent is 100, then there is no bound on maximum_old_evacuation_reserve. Or in other words, the bound is infinity times maximum_young_evacuation_reserve. In the original code, before the referenced change, if we can get past the divide-by-zero issue, we would find expansion of old to be limited by the xfer_limit at line 1265: if (old_region_deficit > max_old_region_xfer) { old_region_deficit = max_old_region_xfer; } We still ultimately limit expansion by xfer_limit. I may have misunderstood your questions. Please let me know if I missed the mark. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1487009240 From wkemper at openjdk.org Tue Feb 13 01:23:25 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Feb 2024 01:23:25 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v5] In-Reply-To: References: Message-ID: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: > * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. > * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. 
The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix zero build some more ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/391/files - new: https://git.openjdk.org/shenandoah/pull/391/files/1e845ff7..fabeb853 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=04 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=03-04 Stats: 9 lines in 2 files changed: 3 ins; 2 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From ysr at openjdk.org Tue Feb 13 02:57:17 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 13 Feb 2024 02:57:17 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: Message-ID: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> On Tue, 13 Feb 2024 01:01:17 GMT, Kelvin Nilsen wrote: > We special case ShenandoahOldEvacRatioPercent==100 because the "other case" has divide by (100 - ShenandoahOldEvacRatioPercent), which becomes divide by zero. Yes, that I realize. I was asking about the addition of xfer_limit in just this case and not otherwise. > > To generalize the form of the other expression, if ShenandoahOldEvacRatioPercent is 100, then there is no bound on maximum_old_evacuation_reserve. Or in other words, the bound is infinity times maximum_young_evacuation_reserve. Correct. So I bounded it by max available. You corrected it to max_available + xfer_limit. It seems as if you want to bound everything by (max_available + xfer_limit). 
> > In the original code, before the referenced change, if we can get past the divide-by-zero issue, we would find expansion of old to be limited by the xfer_limit at line 1265: if (old_region_deficit > max_old_region_xfer) { old_region_deficit = max_old_region_xfer; } > That's still the case with old_region_deficit without your current change. > We still ultimately limit expansion by xfer_limit. I think that happened before as well, except when now because of your change we treat SOERP=100 specially (but nothing else). > > I may have misunderstood your questions. Please let me know if I missed the mark. What I am suggesting is that where we used to do: const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? old_available : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), old_available); instead of doing what you suggest above, viz.: // In the case that ShenandoahOldEvacRatioPercent equals 100, max_old_reserve is limited only by xfer_limit. const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? old_available + xfer_limit: (young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent); that we do: const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? (old_available + xfer_limit) : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), old_available + xfer_limit); Effectively, you are using `old_available + xfer_limit` for what we can ever have for the maximum size of old_reserve. Otherwise, for suitably large values of ShenandoahOldEvacRatioPercent, you'll use a larger value of max_old_reserve than you have available even after using the transfer from young. 
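To make the three variants above concrete, here is a minimal standalone sketch (not HotSpot code). The function names `reserve_before`, `reserve_proposed`, and `reserve_suggested` are hypothetical, the flag is passed as a plain `ratio_percent` parameter instead of being read from `ShenandoahOldEvacRatioPercent`, and `std::min` stands in for `MIN2`. It assumes all quantities are simple byte counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the three expressions quoted above.
// ratio_percent plays the role of ShenandoahOldEvacRatioPercent (SOERP).

// young_reserve * ratio / (100 - ratio); only valid for ratio < 100.
static std::size_t scaled_reserve(std::size_t young_reserve, std::size_t ratio_percent) {
  assert(ratio_percent < 100 && "callers must special-case 100 to avoid dividing by zero");
  return (young_reserve * ratio_percent) / (100 - ratio_percent);
}

// Variant quoted as "where we used to do": always bounded by old_available.
std::size_t reserve_before(std::size_t young_reserve, std::size_t old_available,
                           std::size_t /*xfer_limit*/, std::size_t ratio_percent) {
  return (ratio_percent == 100)
      ? old_available
      : std::min(scaled_reserve(young_reserve, ratio_percent), old_available);
}

// Variant under review: xfer_limit is used only when ratio == 100;
// for ratio < 100 the scaled value is not bounded at all.
std::size_t reserve_proposed(std::size_t young_reserve, std::size_t old_available,
                             std::size_t xfer_limit, std::size_t ratio_percent) {
  return (ratio_percent == 100)
      ? old_available + xfer_limit
      : scaled_reserve(young_reserve, ratio_percent);
}

// Suggested variant: every case is bounded by old_available + xfer_limit.
std::size_t reserve_suggested(std::size_t young_reserve, std::size_t old_available,
                              std::size_t xfer_limit, std::size_t ratio_percent) {
  const std::size_t bound = old_available + xfer_limit;
  return (ratio_percent == 100)
      ? bound
      : std::min(scaled_reserve(young_reserve, ratio_percent), bound);
}
```

With the made-up inputs young_reserve = 100, old_available = 50, xfer_limit = 30 and a ratio of 90, the scaled value is 100 * 90 / 10 = 900, so the three variants yield 50, 900, and 80 respectively. Only the last one stays within what old has plus what young is willing to transfer.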
I guess I am not understanding enough of the subsequent bounds; I am just looking at the equivalence of the old code (before my change), the current code (after my previous change), and your proposed change, which basically appears to say that we must augment whatever is available in old with whatever young is willing to transfer to old. That should happen irrespective of what the combination of young_reserve and SOERP happens to be, not just by special-casing the extremal case that the previous fix handled. (Think about what happens in the usual cases where this value is left at the default: your proposed change would have no effect as far as I can see.) ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1487081469 From eosterlund at openjdk.org Tue Feb 13 09:15:28 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 13 Feb 2024 09:15:28 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v11] In-Reply-To: References: Message-ID: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state.
> > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Merge branch 'master' into 8310823_object_streaming - Some comments - Feedback from Dean and Thomas - Update src/hotspot/share/code/compiledIC.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/runtime/sharedRuntime.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - ARM32 fixes - Add comment - Deal with short far branches on AArch64 - Batch allocate and free CompiledICData - JVMCI support - ... and 9 more: https://git.openjdk.org/jdk/compare/62a4be03...29994286 ------------- Changes: https://git.openjdk.org/jdk/pull/17495/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17495&range=10 Stats: 4253 lines in 142 files changed: 501 ins; 3226 del; 526 mod Patch: https://git.openjdk.org/jdk/pull/17495.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17495/head:pull/17495 PR: https://git.openjdk.org/jdk/pull/17495 From eosterlund at openjdk.org Tue Feb 13 09:19:17 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 13 Feb 2024 09:19:17 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v9] In-Reply-To: <2zDHeeRc4pWQetiIzFMR0JwjZ5ExX7_OV8Yx9G4o_mg=.066b11a8-6a49-455e-a985-3fb5de974480@github.com> References: <2zDHeeRc4pWQetiIzFMR0JwjZ5ExX7_OV8Yx9G4o_mg=.066b11a8-6a49-455e-a985-3fb5de974480@github.com> Message-ID: On Sat, 10 Feb 2024 00:35:47 GMT, Dean Long wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback from Dean and Thomas > >
If I just look at the new code, everything looks very reasonable. I tried to compare the new code to the old code, but quickly gave up. > > Could you explain why opt_virtual can now be a direct call and CompiledIC is now only for virtual? It seems like we could have done that even with the old code. > > Also, why don't we have to check for method->is_old() anymore? Thanks for the review, @dean-long! I pushed a trivial merge conflict with the next round of NULL -> nullptr changes. Running final round of tests now ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1940860792 From kdnilsen at openjdk.org Wed Feb 14 00:44:08 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Feb 2024 00:44:08 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v5] In-Reply-To: References: Message-ID: <31EVgABcCxolZ6bGz8hKSxazbpbhIJLqSNRFdyRCv8s=.7adf5e7a-a93b-463d-b8bf-89a99cbcfa6c@github.com> On Tue, 13 Feb 2024 01:23:25 GMT, William Kemper wrote: >> Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. >> >> The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: >> * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. >> * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. 
The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build some more Thanks for separating this out. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/391#pullrequestreview-1879187203 From kdnilsen at openjdk.org Wed Feb 14 00:49:02 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Feb 2024 00:49:02 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v2] In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 22:25:27 GMT, Y. Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). Thanks for these refinements. Look good to me. ------------- Marked as reviewed by kdnilsen (no project role). PR Review: https://git.openjdk.org/jdk/pull/17815#pullrequestreview-1879190942 From ysr at openjdk.org Wed Feb 14 01:46:02 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 14 Feb 2024 01:46:02 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v2] In-Reply-To: References: Message-ID: On Wed, 14 Feb 2024 00:46:44 GMT, Kelvin Nilsen wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). > > Thanks for these refinements. Look good to me. Thanks @kdnilsen ! 
Can I please get a review from @rkennke , @shipilev , or other _jdk Reviewer_ familiar with Shenandoah. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1942959206 From eosterlund at openjdk.org Wed Feb 14 10:04:11 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 14 Feb 2024 10:04:11 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v11] In-Reply-To: References: Message-ID: On Tue, 13 Feb 2024 09:15:28 GMT, Erik ?sterlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. >> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik ?sterlund has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: > > - Merge branch 'master' into 8310823_object_streaming > - Some comments > - Feedback from Dean and Thomas > - Update src/hotspot/share/code/compiledIC.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/runtime/sharedRuntime.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - ARM32 fixes > - Add comment > - Deal with short far branches on AArch64 > - Batch allocate and free CompiledICData > - JVMCI support > - ... and 9 more: https://git.openjdk.org/jdk/compare/62a4be03...29994286 Tier1-7 test results look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1943430726 From tschatzl at openjdk.org Wed Feb 14 11:33:10 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Feb 2024 11:33:10 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints [v11] In-Reply-To: References: Message-ID: On Tue, 13 Feb 2024 09:15:28 GMT, Erik ?sterlund wrote: >> ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. >> >> The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. 
>> >> With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. >> >> I have tested the changes from tier1-7, and run through full aurora performance tests. > > Erik ?sterlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Merge branch 'master' into 8310823_object_streaming > - Some comments > - Feedback from Dean and Thomas > - Update src/hotspot/share/code/compiledIC.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/runtime/sharedRuntime.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - ARM32 fixes > - Add comment > - Deal with short far branches on AArch64 > - Batch allocate and free CompiledICData > - JVMCI support > - ... and 9 more: https://git.openjdk.org/jdk/compare/62a4be03...29994286 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17495#pullrequestreview-1880048712 From eosterlund at openjdk.org Wed Feb 14 11:48:11 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 14 Feb 2024 11:48:11 GMT Subject: RFR: 8322630: Remove ICStubs and related safepoints In-Reply-To: References: <9GolX3m7SkG4Fs0KTN5qMRxVK47eAhPLdmlaO3oGSKc=.736c3053-abc5-45ce-bd54-85c4f70c3fc9@github.com> Message-ID: On Mon, 29 Jan 2024 09:38:52 GMT, Thomas Schatzl wrote: >> On linux, the time for "Purge Unlinked NMethods" goes down when I comment out `delete ic->data();` and ignore the memory leak. (MacOS seems to be ok with it.) >> Adding trace code to `purge_ic_callsites` shows that we often have 0 or 2 ICData instances, sometimes up to 30 ones. >> It would be good to think a bit about the allocation scheme. Some ideas would be >> >> - Allocate ICData in an array per nmethod instead of individually. 
That should help to some degree and also improve data locality (and hence cache efficiency). Would also save iterating over the relocations when purging unlinked NMethods. It's not very complex. >> - Instead of freeing ICData instances, we could enqueue them and either reuse or free them during a concurrent phase. This may be a bit complicated. Not sure if it's worth it. >> - Allocate in Metaspace? > >> On linux, the time for "Purge Unlinked NMethods" goes down when I comment out delete ic->data(); and ignore the memory leak. (MacOS seems to be ok with it.) >>Adding trace code to purge_ic_callsites shows that we often have 0 or 2 ICData instances, sometimes up to 30 ones. >>It would be good to think a bit about the allocation scheme. Some ideas would be > >> Allocate ICData in an array per nmethod instead of individually. That should help to some degree and also improve data locality (and hence cache efficiency). Would also save iterating over the relocations when purging unlinked NMethods. It's not very complex. >> Instead of freeing ICData instances, we could enqueue them and either reuse or free them during a concurrent phase. This may be a bit complicated. Not sure if it's worth it. >> Allocate in Metaspace? > > Sorry for being unresponsive for a bit. > > Yes, the issue is the new `delete ic->data()`; but also the iteration over the relocinfo here is almost as expensive in my tests. > > So the idea to allocate ICData in a per nmethod basis (and actually some other existing C heap allocations that are also `delete`d in this phase) seems the most promising to me. > > Also came up with the other suggestions, but I think that first one seems best to me at first glance. I did not really like the second because enqueuing adds another indirection for first gathering all of them and then separately free them. > > Metaspace is something I do not know that well to comment on that option. 
> > I am open to moving this improvement, if it is not easy to do, into a separate CR. Thanks for the review @tschatzl! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17495#issuecomment-1943609432 From eosterlund at openjdk.org Wed Feb 14 11:48:12 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 14 Feb 2024 11:48:12 GMT Subject: Integrated: 8322630: Remove ICStubs and related safepoints In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 06:25:20 GMT, Erik ?sterlund wrote: > ICStubs solve an atomicity problem when setting both the destination and data of an inline cache. Unfortunately, it also leads to occasional safepoint carpets when multiple threads need to ICRefill the stubs at the same time, and spurious GuaranteedSafepointInterval "Cleanup" safepoints every second. This patch changes inline caches to not change the data part at all during the nmethod life cycle, hence removing the need for ICStubs. > > The new scheme is less stateful. Instead of adding and removing callsite metadata back and forth when transitioning inline cache states, it installs all state any shape of call will ever need at resolution time in a struct that I call CompiledICData. This reduces inline cache state changes to simply changing the destination of the call, and it doesn't really matter what state transitions to what other state. > > With this patch, we get rid of ICStub and ICBuffer classes and the related ICRefill and almost all Cleanup safepoints in practice. It also makes the inline cache code much simpler. > > I have tested the changes from tier1-7, and run through full aurora performance tests. This pull request has now been integrated. 
Changeset: 84965ea1 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/84965ea1a86703818410f11c8d284e4b824817dd Stats: 4253 lines in 142 files changed: 501 ins; 3226 del; 526 mod 8322630: Remove ICStubs and related safepoints Co-authored-by: Martin Doerr Co-authored-by: Aleksey Shipilev Co-authored-by: Amit Kumar Co-authored-by: Robbin Ehn Co-authored-by: Aleksei Voitylov Reviewed-by: tschatzl, aboldtch, dlong ------------- PR: https://git.openjdk.org/jdk/pull/17495 From shade at openjdk.org Wed Feb 14 15:02:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Feb 2024 15:02:06 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs [v3] In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 22:27:13 GMT, William Kemper wrote: >> Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Add comments to new APIs Looks good, with minor nits. src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 93: > 91: > 92: // Returns true if this is an 'explicit' gc request and ExplicitGCInvokesConcurrent is disabled, > 93: // or if this is an 'implicit' gc request and ShenandoahImplicitGCInvokesConcurrent is disabled. These comments are not really necessary, as they describe the implementation rather than the contract. There is a risk the real implementation would get out of sync with comments later. I think the name conveys the contract well already.
src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 96: > 94: static bool should_run_full_gc(GCCause::Cause cause); > 95: > 96: // Returns false if this is an `explicit` gc request and DisableExplicitGC is active, otherwise Same. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17795#pullrequestreview-1880524122 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1489617367 PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1489617455 From shade at openjdk.org Wed Feb 14 15:02:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Feb 2024 15:02:07 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs [v3] In-Reply-To: References: <55EE8zv7lqKgRb7-FVnPrCOh_TlW81q9jZz5Khqldmc=.5fc1f072-39bf-4c87-957c-ca400bb9ac2c@github.com> Message-ID: On Mon, 12 Feb 2024 21:41:36 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 138: >> >>> 136: out->cr(); >>> 137: out->print_cr(SIZE_FORMAT_W(5) " Successful Concurrent GCs (%.2f%%)", _success_concurrent_gcs, percent_of(_success_concurrent_gcs, completed_gcs)); >>> 138: if (ExplicitGCInvokesConcurrent) { >> >> Instead of relying on flags here, should we explicitly (pun intended) record `_explicit_collection_causes` and `_implicit_collection_causes`? > > That is what used to happen in `ShenandoahControlThread`. It feels like duplicating a result that can be derived from recording the gc causes. The `ExplicitGCInvokesConcurrent` and `ShenandoahImplicitGCInvokesConcurrent` flags aren't manageable, so I don't expect them to be different when the report is generated. I suppose that could change in the future... I'd be more inclined to not tally them at all and just leave the report to show the individual causes without classifying them. All right, fine, if this would be a problem later, we would reconsider. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17795#discussion_r1489620291 From shade at openjdk.org Wed Feb 14 15:28:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Feb 2024 15:28:57 GMT Subject: RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v5] In-Reply-To: References: <6ciSyKdz9hA6RBOZeDicFetK_G4AUBpx40YX7yT1O1M=.870e1ba1-6f4b-48e9-8360-dab141a3041d@github.com> Message-ID: On Wed, 24 Jan 2024 17:53:39 GMT, Kelvin Nilsen wrote: >> The protocol for handling OOM during evacuation is subtle and critical for correct operation. This PR does NOT change behavior. It provides improved documentation of existing behavior. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix spelling error and mismatched parentheses. Hm, I expected that "Document behavior" would only change comments, but I also see code additions. Let me look at the whole thing. I think I have a few editorial comments as well, I'll just give a patch on top of this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17385#issuecomment-1944059758 From wkemper at openjdk.org Wed Feb 14 16:46:18 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Feb 2024 16:46:18 GMT Subject: RFR: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs [v4] In-Reply-To: References: Message-ID: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17795/files - new: https://git.openjdk.org/jdk/pull/17795/files/8d174e81..dfa31c98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17795&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17795/head:pull/17795 PR: https://git.openjdk.org/jdk/pull/17795 From wkemper at openjdk.org Wed Feb 14 16:57:09 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Feb 2024 16:57:09 GMT Subject: Integrated: 8325574: Shenandoah: Simplify and enhance reporting of requested GCs In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 19:43:33 GMT, William Kemper wrote: > Shenandoah distinguishes between 'implicit' and 'explicit' GC requests. The distinction is used to decide whether or not a request is serviced concurrently or with a full GC. The data collected is also included in the end-of-process report. This change simplifies handling of these requests and adds a tally of the underlying GC causes to the end-of-process report. This pull request has now been integrated. Changeset: b823fa44 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/b823fa44508901a6bf39795ab18991d055a71b4e Stats: 173 lines in 5 files changed: 69 ins; 65 del; 39 mod 8325574: Shenandoah: Simplify and enhance reporting of requested GCs Reviewed-by: ysr, kdnilsen, shade ------------- PR: https://git.openjdk.org/jdk/pull/17795 From ysr at openjdk.org Wed Feb 14 17:42:14 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 14 Feb 2024 17:42:14 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v3] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into generation_type - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). - Merge branch 'master' into generation_type - Introduce ShenandoahGenerationType and templatize most closures with it. The template expands for only the NON_GEN type for the non-generational version of Shenandoah currently, and will in the future accommodate Generational Shenandoah. ------------- Changes: https://git.openjdk.org/jdk/pull/17815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=02 Stats: 120 lines in 9 files changed: 69 ins; 0 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From ysr at openjdk.org Wed Feb 14 18:16:16 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 14 Feb 2024 18:16:16 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v4] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge branch 'master' into generation_type - Merge branch 'master' into generation_type - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). - Merge branch 'master' into generation_type - Introduce ShenandoahGenerationType and templatize most closures with it. The template expands for only the NON_GEN type for the non-generational version of Shenandoah currently, and will in the future accomodate Generational Shenandoah. ------------- Changes: https://git.openjdk.org/jdk/pull/17815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=03 Stats: 120 lines in 9 files changed: 69 ins; 0 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From shade at openjdk.org Wed Feb 14 18:30:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Feb 2024 18:30:11 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 18:39:32 GMT, Robbin Ehn wrote: > Think about if you can add missing pieces to: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/spinYield.hpp And utilize that, as it is would be nice to have one? configurable back-off strategy in the future. Thought about it. The way current patch is done, there is little leeway to make the code common or reuse it, unfortunately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17813#issuecomment-1942299012 From shade at openjdk.org Wed Feb 14 18:30:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Feb 2024 18:30:11 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM Message-ID: `ShenandoahLock` is a spinlock that is supposed to guard heap state. 
That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. I also de-uglified bits of adjacent code. 
Additional testing: - [x] Original Extremem reproducer does not have outliers anymore - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) ------------- Commit messages: - Touchups - Fix Changes: https://git.openjdk.org/jdk/pull/17813/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17813&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325587 Stats: 81 lines in 3 files changed: 61 ins; 5 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/17813.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17813/head:pull/17813 PR: https://git.openjdk.org/jdk/pull/17813 From rehn at openjdk.org Wed Feb 14 18:30:11 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 14 Feb 2024 18:30:11 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 17:40:00 GMT, Aleksey Shipilev wrote: > `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. > > This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. > > The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. 
This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. > > I also de-uglified bits of adjacent code. > > Additional testing: > - [x] Original Extremem reproducer does not have outliers anymore > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) Think about whether you can add the missing pieces to: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/spinYield.hpp and utilize that, as it would be nice to have one configurable back-off strategy in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17813#issuecomment-1939319623 From rehn at openjdk.org Wed Feb 14 19:37:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 14 Feb 2024 19:37:07 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 17:40:00 GMT, Aleksey Shipilev wrote: > `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. > > This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. > > The solution is to notify VM that we are blocking before going for `sleep` or `yield`.
This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. > > I also de-uglified bits of adjacent code. > > Additional testing: > - [x] Original Extremem reproducer does not have outliers anymore > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) I know too little about Shenandoah to say if blocking is always okay, sorry. Other than that I don't spot any issues; I had some minor suggestions. src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 65: > 63: } > 64: > 65: void ShenandoahLock::contended_lock_or_block(JavaThread* java_thread) { As this is a copy, you could use ThreadBlockInVM or a NoOp as template parameter to get one version. src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 35: > 33: class ShenandoahLock { > 34: private: > 35: enum LockState { unlocked = 0, locked = 1 }; Keep? src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 48: > 46: > 47: void lock(bool allow_block_for_safepoint) { > 48: assert(_owner != Thread::current(), "reentrant locking attempt, would deadlock"); We are trying to use Atomic::load on all volatile reads (i.e. those that can be written concurrently), as volatile semantics are on the way out. src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 51: > 49: > 50: // Try to lock fast, or dive into contended lock handling. > 51: if (Atomic::cmpxchg(&_state, 0, 1) != 0) { If we keep the enum, this line reads: `if (Atomic::cmpxchg(&_state, unlocked, locked) != unlocked) {`, which I find more appealing.
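The enum-based fast path from the comment above can be tried outside HotSpot. The following is only a minimal sketch under stated assumptions: `std::atomic` stands in for HotSpot's `Atomic::cmpxchg`, and `TinyLock`, `try_lock`, `unlock` and `is_locked` are hypothetical names for illustration, not code from the PR.

```cpp
#include <atomic>
#include <cassert>

// Named states, as in the review suggestion, instead of bare 0 and 1.
enum LockState : int { unlocked = 0, locked = 1 };

// Hypothetical stand-in for the lock's state word. HotSpot would use its own
// Atomic::cmpxchg on a volatile field; std::atomic is an assumption made here
// so that the sketch compiles on its own.
class TinyLock {
  std::atomic<int> _state{unlocked};

public:
  // Fast path: succeeds only if the state was `unlocked` and is now `locked`.
  bool try_lock() {
    int expected = unlocked;
    return _state.compare_exchange_strong(expected, locked,
                                          std::memory_order_acquire);
  }

  void unlock() {
    assert(_state.load(std::memory_order_relaxed) == locked && "must be locked");
    _state.store(unlocked, std::memory_order_release);
  }

  bool is_locked() const {
    return _state.load(std::memory_order_relaxed) == locked;
  }
};
```

With named states, a second `try_lock()` on a held lock reads naturally as "expected `unlocked`, found `locked`", which is the readability gain the suggestion is after.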
src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 57: > 55: assert(_state == 1, "must be locked"); > 56: assert(_owner == nullptr, "must not be owned"); > 57: DEBUG_ONLY(_owner = Thread::current();) We are trying to use Atomic::store on all volatile writes (i.e. those that can be read concurrently). As volatile semantic is on the way out. ------------- PR Review: https://git.openjdk.org/jdk/pull/17813#pullrequestreview-1881035131 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1489939902 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1489940587 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1489944834 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1489939951 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1489940074 From ysr at openjdk.org Wed Feb 14 22:04:13 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 14 Feb 2024 22:04:13 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v5] In-Reply-To: References: Message-ID: On Tue, 13 Feb 2024 01:23:25 GMT, William Kemper wrote: >> Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. >> >> The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: >> * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. >> * The respective control threads no longer need to check what mode they are in. 
They also no longer need to select which global generation they need to use. The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build some more This looks very good; left a minor comment. Not necessarily actionable here or now, but just something to think about. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 53: > 51: > 52: void ShenandoahControlThread::run_service() { > 53: ShenandoahHeap* heap = ShenandoahHeap::heap(); I see that in getting back to being closer to upstream tip of Shenandoah, we have lost (amongst other things) a bunch of consts, here as an example. Also some fixes to comments as well. Was that the intention? The alternative is that these kinds of changes to upstream code could be upstreamed so that they aren't lost in trying to reconcile with legacy Shenandoah. Thoughts? src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 55: > 53: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 54: > 55: GCMode default_mode = concurrent_normal; Lost const. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 66: > 64: // shrinking with lag of less than 1/10-th of true delay. > 65: // ShenandoahUncommitDelay is in msecs, but shrink_period is in seconds. > 66: double shrink_period = (double)ShenandoahUncommitDelay / 1000 / 10; Lost const. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 76: > 74: GCCause::Cause requested_gc_cause = _requested_gc_cause; > 75: bool explicit_gc_requested = is_gc_requested && is_explicit_gc(requested_gc_cause); > 76: bool implicit_gc_requested = is_gc_requested && !is_explicit_gc(requested_gc_cause); Lost consts. I'll stop adding these "Lost const" comments further below, but you get the idea.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 260: > 258: } > 259: } else { > 260: // Allow allocators to know we have seen this much regions "words allocated" was more accurate than "regions" in comment above. ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/391#pullrequestreview-1881341745 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490102749 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490103541 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490104148 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490104574 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490108285 From ysr at openjdk.org Wed Feb 14 22:04:14 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 14 Feb 2024 22:04:14 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v2] In-Reply-To: References: Message-ID: On Sat, 10 Feb 2024 00:11:44 GMT, William Kemper wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'shenandoah-master' into isolate-regulator-thread >> - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread >> - Clean up includes >> - Remove unnecessary mode checks >> - Remove non-generational functionality from regulator thread >> - Move heap changed tracking from control thread to heap >> - Clean up includes and headers >> - Factor allocation failure handling methods into common base class >> - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread >> - Merge remote-tracking branch 'shenandoah/master' into isolate-control-thread >> - ... 
and 12 more: https://git.openjdk.org/shenandoah/compare/8f4e6e22...6915cd07 > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 1: > >> 1: /* > > This file is largely reverted to the upstream's version. Very nice! However, see a few comments below on some good stuff that we may not want to lose in the modifications to the original code that you are now reverting to its original form. Worth a quick check to see if it's worth fixing up separately in upstream perhaps. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1490089711 From duke at openjdk.org Thu Feb 15 08:04:17 2024 From: duke at openjdk.org (duke) Date: Thu, 15 Feb 2024 08:04:17 GMT Subject: Withdrawn: 8312116 GenShen: make instantaneous allocation rate triggers more timely In-Reply-To: <3kCLM3psFvTGFmMUovwVQoxC_3ByduhCfwccYyd-SYg=.78d2f69d-e06c-46db-9053-d17076bffc33@github.com> References: <3kCLM3psFvTGFmMUovwVQoxC_3ByduhCfwccYyd-SYg=.78d2f69d-e06c-46db-9053-d17076bffc33@github.com> Message-ID: On Mon, 18 Sep 2023 17:07:41 GMT, Kelvin Nilsen wrote: > When heuristics fail to trigger because an instantaneous "allocation spike" is not so large as to consume all available memory before GC completes, this assessment is based on an assumption that the allocation rate remains constant, and it ignores the time that will be lost due to the _interval between consecutive allocation spike measurements. > > This PR watches for "acceleration" of allocation rates. When acceleration of allocation is detected in 3 consecutive allocation spike measurements, it calculates a best-fit curve (assuming constant acceleration) and predicts the memory to be consumed during the time that spans both the next sample interval and the GC effort that follows it. If the memory to be allocated according to anticipated acceleration of allocations during this time span exceeds what is available, we trigger immediately. This pull request has been closed without being integrated. 
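The acceleration trigger described in the withdrawn PR can be illustrated with a small calculation. This is a sketch under assumptions, not the PR's code: it takes three equally spaced allocation-rate samples, fits a least-squares line through them (one way to realize "constant acceleration"), and integrates the fitted rate over the next sample interval plus the expected GC time. All names (`AccelerationEstimate`, `fit_constant_acceleration`, `predicted_bytes`, `should_trigger`) are hypothetical.

```cpp
#include <cassert>

// Result of fitting rate(t) = rate_now + acceleration * t to three samples.
struct AccelerationEstimate {
  bool   accelerating;  // true if the three samples are strictly increasing
  double rate_now;      // fitted rate at the most recent sample (bytes/s)
  double acceleration;  // fitted slope of the rate (bytes/s^2)
};

// r0, r1, r2 are consecutive rate samples taken dt seconds apart (r2 is newest).
// For three equally spaced points the least-squares slope is (r2 - r0) / (2 dt)
// and the fitted line passes through the mean at the middle sample.
AccelerationEstimate fit_constant_acceleration(double r0, double r1, double r2,
                                               double dt) {
  assert(dt > 0.0);
  const bool   accelerating = (r0 < r1) && (r1 < r2);
  const double slope        = (r2 - r0) / (2.0 * dt);
  const double mean         = (r0 + r1 + r2) / 3.0;
  return { accelerating, mean + slope * dt, slope };
}

// Bytes expected to be allocated over `horizon` seconds from now:
// the integral of (rate_now + acceleration * t) from 0 to horizon.
double predicted_bytes(const AccelerationEstimate& e, double horizon) {
  return e.rate_now * horizon + 0.5 * e.acceleration * horizon * horizon;
}

// Trigger if accelerating allocation over (next sample interval + GC time)
// would exceed the memory currently available.
bool should_trigger(double r0, double r1, double r2, double dt,
                    double gc_time, double available_bytes) {
  const AccelerationEstimate e = fit_constant_acceleration(r0, r1, r2, dt);
  if (!e.accelerating) {
    return false;
  }
  return predicted_bytes(e, dt + gc_time) > available_bytes;
}
```

For example, with samples of 100, 200 and 300 bytes/s taken one second apart and a one-second GC, the fitted rate is 300 bytes/s with slope 100 bytes/s^2, so the prediction over the two-second horizon is 300 * 2 + 0.5 * 100 * 4 = 800 bytes. A constant-rate estimate would have predicted only 600 bytes for the same horizon.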
------------- PR: https://git.openjdk.org/shenandoah/pull/327 From shade at openjdk.org Thu Feb 15 11:24:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 11:24:54 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM In-Reply-To: References: Message-ID: On Wed, 14 Feb 2024 19:07:27 GMT, Robbin Ehn wrote: >> `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. >> >> This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. >> >> The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. >> >> I also de-uglified bits of adjacent code. 
>> >> Additional testing: >> - [x] Original Extremem reproducer does not have outliers anymore >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) > > src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 65: > >> 63: } >> 64: >> 65: void ShenandoahLock::contended_lock_or_block(JavaThread* java_thread) { > > As this is a copy you could use ThreadBlockInVM or a NoOp as template parameter to get one version. Right, that's a good idea. Let me see... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1490856738 From shade at openjdk.org Thu Feb 15 12:02:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 12:02:29 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: References: Message-ID: > `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. > > This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. > > The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. 
Protecting the whole lock with VM transition would be very slow. > > I also de-uglified bits of adjacent code. > > Additional testing: > - [x] Original Extremem reproducer does not have outliers anymore > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17813/files - new: https://git.openjdk.org/jdk/pull/17813/files/3711cf13..81af0e16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17813&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17813&range=00-01 Stats: 45 lines in 2 files changed: 8 ins; 17 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/17813.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17813/head:pull/17813 PR: https://git.openjdk.org/jdk/pull/17813 From shade at openjdk.org Thu Feb 15 12:17:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 12:17:05 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 11:21:49 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 65: >> >>> 63: } >>> 64: >>> 65: void ShenandoahLock::contended_lock_or_block(JavaThread* java_thread) { >> >> As this is a copy you could use ThreadBlockInVM or a NoOp as template parameter to get one version. > > Right, that's a good idea. Let me see... Done in new commit. Retesting... 
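The idea of getting one version of the contended path by passing either a blocking guard or a no-op as a template parameter can be sketched in isolation. This is a hedged illustration, not the code in the new commit: `MarkBlocked` is a hypothetical stand-in for what `ThreadBlockInVM` does in HotSpot, `t_marked_blocked` is an invented per-thread flag, `std::atomic` replaces HotSpot's `Atomic`, and the spin count of 64 is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical per-thread flag standing in for the VM's thread state.
thread_local bool t_marked_blocked = false;

// Guard that announces "this thread is blocked" for its lifetime.
struct MarkBlocked {
  MarkBlocked()  { t_marked_blocked = true; }
  ~MarkBlocked() { t_marked_blocked = false; }
};

// Guard that does nothing, for callers that must not change thread state.
struct NoBlock {};

class SketchLock {
  std::atomic<int> _state{0};

  bool try_lock() {
    int expected = 0;
    return _state.compare_exchange_strong(expected, 1,
                                          std::memory_order_acquire);
  }

public:
  // One contended path for both cases; the guard type decides whether the
  // yield is wrapped in a state transition.
  template <typename BlockGuard>
  void lock() {
    int spins = 0;
    while (!try_lock()) {
      if (++spins < 64) {
        continue;                  // spin briefly before backing off
      }
      BlockGuard guard;            // covers only the yield, not the whole lock
      (void)guard;
      std::this_thread::yield();
      spins = 0;
    }
  }

  void unlock() { _state.store(0, std::memory_order_release); }

  bool is_locked() const {
    return _state.load(std::memory_order_relaxed) == 1;
  }
};
```

A Java thread would call `lock<MarkBlocked>()` and any other thread `lock<NoBlock>()`. Keeping the guard scoped to the yield mirrors the point made in the PR description that the transition is only cheap near the actual blocking.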
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1490920363 From rehn at openjdk.org Thu Feb 15 12:51:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 15 Feb 2024 12:51:05 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: References: Message-ID: <9HCPwzdDCUXEoyI8tNt8LKNC7tSuRQcicTik3n_UEnM=.bfb6afe0-5900-4eaa-9319-854241ad9faa@github.com> On Thu, 15 Feb 2024 12:02:29 GMT, Aleksey Shipilev wrote: >> `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. >> >> This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. >> >> The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. >> >> I also de-uglified bits of adjacent code. 
>> >> Additional testing: >> - [x] Original Extremem reproducer does not have outliers anymore >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Nice work! I'll review it, but as stated, don't trust me that it works :) Looks good, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17813#pullrequestreview-1882684317 From shade at openjdk.org Thu Feb 15 13:40:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 13:40:04 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v4] In-Reply-To: References: Message-ID: On Wed, 14 Feb 2024 18:16:16 GMT, Y. Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into generation_type > - Merge branch 'master' into generation_type > - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). > - Merge branch 'master' into generation_type > - Introduce ShenandoahGenerationType and templatize most closures with it. > The template expands for only the NON_GEN type for the non-generational > version of Shenandoah currently, and will in the future accomodate > Generational Shenandoah. The change looks okay, but I do not see a good reason to introduce this into Shenandoah before Generational mode arrives.
It is just extra (mostly dead) code to maintain, and it effectively hides what changes Generational needs to do in shared Shenandoah code. Maybe we should consider this as part of Generational integration PR? src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 92: > 90: > 91: template > 92: void ShenandoahMark::mark_loop(ShenandoahGenerationType generation /* ignored */, uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, StringDedup::Requests* const req) { I think `/* ignored */` is not needed here. Also, the argument list line is probably too long. AFAICS, we put the arguments we curry from method arguments to template arguments at very end. See how the method arguments are entering the template arguments in the same order: template void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, ShenandoahGenerationType generation, StringDedup::Requests* const req) ... void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, bool cancellable, StringDedupMode dedup_mode, ShenandoahGenerationType generation, StringDedup::Requests* const req ... 
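The convention of currying trailing method arguments into template arguments, one at a time and in the same order, can be shown with a toy dispatcher. This is a sketch under assumptions and not the Shenandoah code: only `NON_GEN` comes from the PR, while `SKETCH_GEN`, `mark_loop_work`, and the integer return value (used only to make the chosen instantiation observable) are invented for illustration.

```cpp
#include <cassert>

// NON_GEN is named in the PR; SKETCH_GEN is a hypothetical second value.
enum class GenerationType { NON_GEN, SKETCH_GEN };

// Innermost worker: everything that was a runtime argument is now a
// compile-time constant, so branches on it can be folded away.
template <GenerationType GENERATION, bool CANCELLABLE>
int mark_loop_work(int worker_id) {
  return worker_id * 100 + static_cast<int>(GENERATION) * 10 +
         (CANCELLABLE ? 1 : 0);
}

// Middle layer: `generation` is already a template argument; the remaining
// trailing runtime argument `cancellable` is curried next.
template <GenerationType GENERATION>
int mark_loop(int worker_id, bool cancellable) {
  return cancellable ? mark_loop_work<GENERATION, true>(worker_id)
                     : mark_loop_work<GENERATION, false>(worker_id);
}

// Outer layer: plain runtime arguments, with the ones to be curried placed
// at the end of the list.
int mark_loop(int worker_id, bool cancellable, GenerationType generation) {
  switch (generation) {
    case GenerationType::NON_GEN:
      return mark_loop<GenerationType::NON_GEN>(worker_id, cancellable);
    case GenerationType::SKETCH_GEN:
      return mark_loop<GenerationType::SKETCH_GEN>(worker_id, cancellable);
  }
  assert(false && "unknown generation type");
  return -1;
}
```

Each layer peels exactly one argument off the end of the runtime list and appends it to the template list, which is what keeps the two lists visually aligned from one overload to the next.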
------------- PR Review: https://git.openjdk.org/jdk/pull/17815#pullrequestreview-1882776311 PR Review Comment: https://git.openjdk.org/jdk/pull/17815#discussion_r1491016785 From wkemper at openjdk.org Thu Feb 15 14:20:30 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Feb 2024 14:20:30 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master Message-ID: Merges tag jdk-21.0.3+3 ------------- Commit messages: - 8325194: GHA: Add macOS M1 testing - 8324753: [AIX] adjust os_posix after JDK-8318696 - 8323671: DevKit build gcc libraries contain full paths to source location - 8323667: Library debug files contain non-reproducible full gcc include paths - 8318039: GHA: Bump macOS and Xcode versions - 8325150: (tz) Update Timezone Data to 2024a - 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 - 8324637: [aix] Implement support for reporting swap space in jdk.management - 8324598: use mem_unit when working with sysinfo memory and swap related information - 8323964: runtime/Thread/ThreadCountLimit.java fails intermittently on AIX - ... 
and 14 more: https://git.openjdk.org/shenandoah-jdk21u/compare/2518d203...b9cf41da The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=21&range=00.conflicts Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/21/files Stats: 1787 lines in 95 files changed: 1192 ins; 374 del; 221 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/21.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/21/head:pull/21 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/21 From shade at openjdk.org Thu Feb 15 16:00:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 16:00:59 GMT Subject: RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v5] In-Reply-To: References: <6ciSyKdz9hA6RBOZeDicFetK_G4AUBpx40YX7yT1O1M=.870e1ba1-6f4b-48e9-8360-dab141a3041d@github.com> Message-ID: On Tue, 6 Feb 2024 00:58:47 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix spelling error and mismatched parentheses. > > src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 172: > >> 170: * make the protocol more efficient. >> 171: * >> 172: * TODO: make refinements to the OOM-during-evac protocol so that it is less disruptive and more efficient. > > May be all of this and the remainder of this comment in terms of improvements from line 162 above up to line 203 below should instead go in a JBS ticket, include here only a terse TODO with a pointer to the ticket for details: > > // TODO: JDK-XXXX will investigate potential performance/efficiency improvements to this protocol. 
+1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1491241892 From shade at openjdk.org Thu Feb 15 16:40:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Feb 2024 16:40:55 GMT Subject: RFR: 8323634: Shenandoah: Document behavior of EvacOOM protocol [v5] In-Reply-To: References: <6ciSyKdz9hA6RBOZeDicFetK_G4AUBpx40YX7yT1O1M=.870e1ba1-6f4b-48e9-8360-dab141a3041d@github.com> Message-ID: On Wed, 24 Jan 2024 17:53:39 GMT, Kelvin Nilsen wrote: >> The protocol for handling OOM during evacuation is subtle and critical for correct operation. This PR does NOT change behavior. It provides improved documentation of existing behavior. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix spelling error and mismatched parentheses. Sorry, but I think it needs significantly more work. 1. Either split out the actual code changes from this PR, or rename it to avoid the impression that it is a docs-only change. 2. Describe the high-level protocol in one global comment, and the idea for implementation there. 3. Do _not_ repeat any of these near the code. Code is the source of truth here, not the comments. Consider this: hardly anyone would read more than 1 page of comments. We have to be terse and to the point. I tried to rewrite the top-level comment more tersely here: [evac-protocol.txt](https://github.com/openjdk/jdk/files/14299573/evac-protocol.txt) src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.hpp line 232: > 230: /** > 231: * Enter a protected evacuation path. > 232: * I see no point in repeating _both_ the high-level description of protocol, _and_ the implementation of the method itself here. ------------- Changes requested by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17385#pullrequestreview-1883312877 PR Review Comment: https://git.openjdk.org/jdk/pull/17385#discussion_r1491300081 From ysr at openjdk.org Thu Feb 15 16:52:00 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Feb 2024 16:52:00 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v4] In-Reply-To: References: Message-ID: <_g4FpQL2EYt18dNBgznZzDKAm8RN9NMnpl3QAPgyKmY=.a13c278f-e6a3-49ef-861a-ef5b52be8af4@github.com> On Thu, 15 Feb 2024 13:36:51 GMT, Aleksey Shipilev wrote: > The change looks okay, but I do not see a good reason to introduce this into Shenandoah before Generational mode arrives. It is just extra (mostly dead) code to maintain, and it effectively hides what changes Generational needs to do in shared Shenandoah code. Maybe we should consider this as part of Generational integration PR? Indeed, the idea was to introduce this earlier than any Generational-specific code, as a decomposition and refactoring of the work needed for generational mode, as I describe in the summary above, so that its effect could be shown independently to be minimal and harmless, making the subsequent integration of Generational mode much smaller, simpler, and easier to review on its own. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1946567186 From ysr at openjdk.org Thu Feb 15 17:19:02 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Feb 2024 17:19:02 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v4] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 13:33:42 GMT, Aleksey Shipilev wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: >> >> - Merge branch 'master' into generation_type >> - Merge branch 'master' into generation_type >> - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). >> - Merge branch 'master' into generation_type >> - Introduce ShenandoahGenerationType and templatize most closures with it. >> The template expands for only the NON_GEN type for the non-generational >> version of Shenandoah currently, and will in the future accommodate >> Generational Shenandoah. > > src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 92: > >> 90: >> 91: template >> 92: void ShenandoahMark::mark_loop(ShenandoahGenerationType generation /* ignored */, uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, StringDedup::Requests* const req) { > > I think `/* ignored */` is not needed here. > > Also, the argument list line is probably too long. AFAICS, we put the arguments we curry from method arguments to template arguments at the very end. See how the method arguments are entering the template arguments in the same order: > > > template > void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, > ShenandoahGenerationType generation, StringDedup::Requests* const req) ... > > void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, > bool cancellable, StringDedupMode dedup_mode, ShenandoahGenerationType generation, StringDedup::Requests* const req ... I'll fix the order so it conforms to that convention; thanks for pointing it out!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17815#discussion_r1491360556 From wkemper at openjdk.org Thu Feb 15 18:38:16 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Feb 2024 18:38:16 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v5] In-Reply-To: References: Message-ID: On Wed, 14 Feb 2024 21:42:58 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix zero build some more > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 53: > >> 51: >> 52: void ShenandoahControlThread::run_service() { >> 53: ShenandoahHeap* heap = ShenandoahHeap::heap(); > > I see that in getting back to being closer to upstream tip of Shenandoah, we have lost (amongst other things) a bunch of const's; this line is one example. Some fixes to comments were lost as well. Was that the intention? > > The alternative is that these kinds of changes to upstream code could be upstreamed so that they aren't lost in trying to reconcile with legacy Shenandoah. > > Thoughts? Yes, I made most of these declarations `const` here: https://github.com/openjdk/jdk/pull/17795/files. Though for some reason, I missed the `heap` variable so I'll fix that before integrating. > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 260: > >> 258: } >> 259: } else { >> 260: // Allow allocators to know we have seen this much regions > > "these many words allocated" was more accurate than "this much regions" in comment above. Yes, I'll restore that change.
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1491447570 PR Review Comment: https://git.openjdk.org/shenandoah/pull/391#discussion_r1491449158 From wkemper at openjdk.org Thu Feb 15 21:37:20 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Feb 2024 21:37:20 GMT Subject: Integrated: Merge openjdk/jdk21u-dev:master In-Reply-To: References: Message-ID: On Thu, 8 Feb 2024 14:14:33 GMT, William Kemper wrote: > Merges tag jdk-21.0.3+2 This pull request has now been integrated. Changeset: 4e9b613e Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/4e9b613ee7a2a268f3e7b63860a2a49def24d557 Stats: 1353 lines in 68 files changed: 895 ins; 357 del; 101 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/20 From ysr at openjdk.org Fri Feb 16 07:48:09 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Feb 2024 07:48:09 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v5] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Changes from review: Adjust order of parms in functions so they are consistent with their template parameter order and contiguity, per convention. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17815/files - new: https://git.openjdk.org/jdk/pull/17815/files/bce39e40..69070cfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=03-04 Stats: 21 lines in 4 files changed: 2 ins; 3 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From ysr at openjdk.org Fri Feb 16 07:52:55 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Feb 2024 07:52:55 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v4] In-Reply-To: References: Message-ID: <8pEltg5UJ953ozNP9ob-oNAOXQ-mZKpsKSbYicmseXg=.fb5fd2f8-c5d0-472a-8181-36f969d68451@github.com> On Thu, 15 Feb 2024 17:15:54 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 92: >> >>> 90: >>> 91: template >>> 92: void ShenandoahMark::mark_loop(ShenandoahGenerationType generation /* ignored */, uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, StringDedup::Requests* const req) { >> >> I think `/* ignored */` is not needed here. >> >> Also, the argument list line is probably too long. AFAICS, we put the arguments we curry from method arguments to template arguments at very end. See how the method arguments are entering the template arguments in the same order: >> >> >> template >> void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, >> ShenandoahGenerationType generation, StringDedup::Requests* const req) ... >> >> void ShenandoahMark::mark_loop(uint worker_id, TaskTerminator* terminator, ShenandoahReferenceProcessor *rp, >> bool cancellable, StringDedupMode dedup_mode, ShenandoahGenerationType generation, StringDedup::Requests* const req ... 
> > I'll fix the order so it conforms to that convention; thanks for pointing it out! Adjusted order of parms to be consistent with order in templated functions, and fixed long lines. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17815#discussion_r1492072147 From wkemper at openjdk.org Fri Feb 16 14:15:25 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 14:15:25 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-23+10 ------------- Commit messages: - 8321282: RISC-V: SpinPause() not implemented - 8323994: gtest runner repeats test name for every single gtest assertion - 8325910: Rename jnihelper.h - 8325682: Rename nsk_strace.h - 8325574: Shenandoah: Simplify and enhance reporting of requested GCs - 8252136: Several methods in hotspot are missing "static" - 8316340: (bf) Missing {@inheritDoc} for exception in MappedByteBuffer::compact - 8325643: G1: Refactor G1FlushHumongousCandidateRemSets - 8325403: Add SystemGC JMH benchmarks - 8318966: Some methods make promises about Java array element alignment that are too strong - ... and 50 more: https://git.openjdk.org/shenandoah/compare/3ebe6c19...8cb9b479 The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah&pr=396&range=00.conflicts Changes: https://git.openjdk.org/shenandoah/pull/396/files Stats: 16763 lines in 519 files changed: 5321 ins; 7870 del; 3572 mod Patch: https://git.openjdk.org/shenandoah/pull/396.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/396/head:pull/396 PR: https://git.openjdk.org/shenandoah/pull/396 From wkemper at openjdk.org Fri Feb 16 17:25:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 17:25:15 GMT Subject: Withdrawn: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 14:10:26 GMT, William Kemper wrote: > Merges tag jdk-23+10 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/shenandoah/pull/396 From wkemper at openjdk.org Fri Feb 16 17:25:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 17:25:15 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 14:10:26 GMT, William Kemper wrote: > Merges tag jdk-23+10 Closing this automated PR for the one I created manually. ------------- PR Comment: https://git.openjdk.org/shenandoah/pull/396#issuecomment-1948941260 From wkemper at openjdk.org Fri Feb 16 17:32:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 17:32:15 GMT Subject: Integrated: Merge openjdk/jdk:master Message-ID: Merges tag jdk-23+10 ------------- Commit messages: - Merge tag 'jdk-23+10' into merge-jdk-23+10 - Merge - Merge - Merge - Merge - 8324325: [Genshen] Normalize wrt AgeTable changes from JDK-8314329 - Merge - 8324173: GenShen: Fix error that could cause young gcs to fail when old marking is running - Merge - 8323630: GenShen: Control thread may (still) ignore requests to start concurrent GC - ... and 374 more: https://git.openjdk.org/shenandoah/compare/8cb9b479...aaa7b1c9 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah/pull/397/files Stats: 21727 lines in 218 files changed: 19849 ins; 896 del; 982 mod Patch: https://git.openjdk.org/shenandoah/pull/397.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/397/head:pull/397 PR: https://git.openjdk.org/shenandoah/pull/397 From wkemper at openjdk.org Fri Feb 16 17:32:16 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 17:32:16 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 17:22:50 GMT, William Kemper wrote: > Merges tag jdk-23+10 This pull request has now been integrated. 
Changeset: c7bcd74c Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/c7bcd74c4a9c1e5d04fc4601a9df2f2a522d8992 Stats: 16766 lines in 519 files changed: 5328 ins; 7876 del; 3562 mod Merge ------------- PR: https://git.openjdk.org/shenandoah/pull/397 From wkemper at openjdk.org Fri Feb 16 18:39:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 18:39:34 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v2] In-Reply-To: References: Message-ID: <6SWEGqDCusPelZEKAhKJryDM5WGplxYy-pRlv7MPlwA=.3e6cce85-7abf-4422-a709-d6351920ea84@github.com> > Merges tag jdk-21.0.3+3 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'shenandoah-21u-master' into merge-jdk-21.0.3+3 - 8325194: GHA: Add macOS M1 testing 8325444: GHA: JDK-8325194 causes a regression Reviewed-by: shade Backport-of: d1c82156ba6ede4b798ac15f935289cfcc99d1a0 - 8324753: [AIX] adjust os_posix after JDK-8318696 Backport-of: 8950d68ddb36d35831fbb4b98969cd0537527070 - 8323671: DevKit build gcc libraries contain full paths to source location Backport-of: dd0694b9cbbfa2defdc3b09f86f20f686688cf7b - 8323667: Library debug files contain non-reproducible full gcc include paths Backport-of: 57fad677819ae3142782f811a8fba94b38f5a74c - 8318039: GHA: Bump macOS and Xcode versions Backport-of: 605c9767291ddf1c409c3e805ffb3182899d06c2 - 8325150: (tz) Update Timezone Data to 2024a Backport-of: 917838e0a564b1f2cbfb6cc214ccbfd1a237019f - 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 Reviewed-by: aph Backport-of: afdaa2a3305461538f3a36de2b0b540fe2da9b37 - 8324637: [aix] Implement support for reporting swap space in jdk.management Backport-of: 33324a59ccdb220250cb74e15ce13af0e99dcb07 - 
8324598: use mem_unit when working with sysinfo memory and swap related information Backport-of: 7a798d3cebea0915f8a73af57333b3488c2091af - ... and 3 more: https://git.openjdk.org/shenandoah-jdk21u/compare/9eb7842d...3e6fb7a6 ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/21/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/21/files/b9cf41da..3e6fb7a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=21&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=21&range=00-01 Stats: 21984 lines in 221 files changed: 19954 ins; 952 del; 1078 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/21.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/21/head:pull/21 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/21 From wkemper at openjdk.org Fri Feb 16 19:24:33 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 19:24:33 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v6] In-Reply-To: References: Message-ID: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: > * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. > * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. 
The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Fix zero build some more - Fix Mac and Zero builds - Remove unused code (design changed in upstream patch) - Merge branch 'shenandoah-master' into isolate-regulator-thread - Merge remote-tracking branch 'shenandoah/master' into isolate-regulator-thread - Clean up includes - Remove unnecessary mode checks - Remove non-generational functionality from regulator thread - Move heap changed tracking from control thread to heap - ... and 16 more: https://git.openjdk.org/shenandoah/compare/c7bcd74c...358d48d1 ------------- Changes: https://git.openjdk.org/shenandoah/pull/391/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=05 Stats: 2127 lines in 15 files changed: 1282 ins; 728 del; 117 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From wkemper at openjdk.org Fri Feb 16 19:47:21 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 19:47:21 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp Message-ID: This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. 
------------- Commit messages: - Fix zero build - Initialize member - Fix typo, remove TODOs - Small cleanup - Move some smaller chunks of code out shFullGC - Fix warnings - Move some big chunks of code out of shFullGC Changes: https://git.openjdk.org/shenandoah/pull/398/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325808 Stats: 1169 lines in 12 files changed: 634 ins; 495 del; 40 mod Patch: https://git.openjdk.org/shenandoah/pull/398.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/398/head:pull/398 PR: https://git.openjdk.org/shenandoah/pull/398 From wkemper at openjdk.org Fri Feb 16 19:53:14 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Feb 2024 19:53:14 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes Message-ID: Changes to prepare for generational mode: * A `phase5_epilogue` method is added to run the final steps of the gc * Prepare for mark operation is run by multiple worker threads * `finish_region` method of compacting preparation closure is renamed to `finish` * The prepare for compaction loop is extracted to a template method, parameterized on closure type ------------- Commit messages: - Factor out epilogue and template function to prepare for compaction Changes: https://git.openjdk.org/jdk/pull/17894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325807 Stats: 76 lines in 2 files changed: 33 ins; 14 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/17894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17894/head:pull/17894 PR: https://git.openjdk.org/jdk/pull/17894 From wkemper at openjdk.org Sat Feb 17 00:59:28 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 17 Feb 2024 00:59:28 GMT Subject: RFR: 8324067: GenShen: Isolate regulator thread to generational mode [v7] In-Reply-To: References: Message-ID: 
> Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: > * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. > * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Fix merge error - Update comment ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/391/files - new: https://git.openjdk.org/shenandoah/pull/391/files/358d48d1..8680cfe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=06 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=391&range=05-06 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/391.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/391/head:pull/391 PR: https://git.openjdk.org/shenandoah/pull/391 From ysr at openjdk.org Sat Feb 17 01:47:09 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Sat, 17 Feb 2024 01:47:09 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v6] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into generation_type - Changes from review: Adjust order of parms in functions so they are consistent with their template parameter order and contiguity, per convention. - Merge branch 'master' into generation_type - Merge branch 'master' into generation_type - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). - Merge branch 'master' into generation_type - Introduce ShenandoahGenerationType and templatize most closures with it. The template expands for only the NON_GEN type for the non-generational version of Shenandoah currently, and will in the future accommodate Generational Shenandoah. ------------- Changes: https://git.openjdk.org/jdk/pull/17815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=05 Stats: 124 lines in 9 files changed: 71 ins; 3 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From ddong at openjdk.org Sun Feb 18 08:01:00 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sun, 18 Feb 2024 08:01:00 GMT Subject: RFR: 8326111: JFR: Cleanup for JFR_ONLY Message-ID: Greetings, Could I have a review of this trivial change that cleans up code where JFR_ONLY is used incorrectly?
testing: build with --enable-jvm-feature-jfr and --disable-jvm-feature-jfr Denghui ------------- Commit messages: - 8326111: JFR: Cleanup for JFR_ONLY Changes: https://git.openjdk.org/jdk/pull/17903/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17903&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326111 Stats: 9 lines in 5 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17903/head:pull/17903 PR: https://git.openjdk.org/jdk/pull/17903 From ddong at openjdk.org Sun Feb 18 08:10:16 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sun, 18 Feb 2024 08:10:16 GMT Subject: RFR: 8326111: JFR: Cleanup for JFR_ONLY [v2] In-Reply-To: References: Message-ID: > Greetings, > > Could I have a review of this trivial change that cleans up code where JFR_ONLY is used incorrectly? > > testing: build with --enable-jvm-feature-jfr and --disable-jvm-feature-jfr > > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17903/files - new: https://git.openjdk.org/jdk/pull/17903/files/fb3fcab2..75710690 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17903&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17903&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17903/head:pull/17903 PR: https://git.openjdk.org/jdk/pull/17903 From shade at openjdk.org Mon Feb 19 10:40:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Feb 2024 10:40:54 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes In-Reply-To: References: Message-ID: <1plQFbrg7DyZxwHfUDcyqKa7c-I1v3HgvnSgTd_W4eY=.6ffa82a9-310c-4ee7-859d-0660f8c80fb7@github.com> On Fri, 16 Feb 2024 19:49:07 GMT,
William Kemper wrote: > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation closure is renamed to `finish` > * The prepare for compaction loop is extracted to a template method, parameterized on closure type Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17894#pullrequestreview-1888100788 From egahlin at openjdk.org Mon Feb 19 12:27:53 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Mon, 19 Feb 2024 12:27:53 GMT Subject: RFR: 8326111: JFR: Cleanup for JFR_ONLY [v2] In-Reply-To: References: Message-ID: On Sun, 18 Feb 2024 08:10:16 GMT, Denghui Dong wrote: >> Greetings, >> >> Could I have a review of this trivial change that cleans up code where JFR_ONLY is used incorrectly? >> >> testing: build with --enable-jvm-feature-jfr and --disable-jvm-feature-jfr >> >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17903#pullrequestreview-1888292545 From ysr at openjdk.org Mon Feb 19 15:31:54 2024 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 15:31:54 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes In-Reply-To: References: Message-ID: <7rbIIywsr1nm718fx2BX_n4VhsnGvvgMrzXBADnNJHE=.80299ee9-e455-4a4f-b00d-fc654433d0cf@github.com> On Fri, 16 Feb 2024 19:49:07 GMT, William Kemper wrote: > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation closure is renamed to `finish` > * The prepare for compaction loop is extracted to a template method, parameterized on closure type Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17894#pullrequestreview-1888700081 From kdnilsen at openjdk.org Mon Feb 19 16:02:22 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 19 Feb 2024 16:02:22 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v12] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. 
An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%). This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Use bitmap to represent each freeset for improved performance - Revert "Experiment with proposed ShenandoahPackEvacTightly option" This reverts commit 655e30f65dca5d60b3d2e2a882d3ef3a86952e7e. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/655e30f6..daacbaab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=10-11 Stats: 955 lines in 3 files changed: 710 ins; 52 del; 193 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From kdnilsen at openjdk.org Mon Feb 19 17:25:11 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 19 Feb 2024 17:25:11 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> Message-ID: On Mon, 19 Feb 2024 17:19:35 GMT, Kelvin Nilsen wrote: >>> We special case ShenandoahOldEvacRatioPercent==100 because the "other case" has divide by (100 - ShenandoahOldEvacRatioPercent), which becomes divide by zero. >> >> Yes, that I realize. I was asking about the addition of xfer_limit in just this case and not otherwise. >> >>> >>> To generalize the form of the other expression, if ShenandoahOldEvacRatioPercent is 100, then there is no bound on maximum_old_evacuation_reserve. Or in other words, the bound is infinity times maximum_young_evacuation_reserve. >> >> Correct. So I bounded it by max available. You corrected it to max_available + xfer_limit. It seems as if you want to bound everything by (max_available + xfer_limit). >> >>> >>> In the original code, before the referenced change, if we can get past the divide-by-zero issue, we would find expansion of old to be limited by the xfer_limit at line 1265: if (old_region_deficit > max_old_region_xfer) { old_region_deficit = max_old_region_xfer; } >>> >> >> That's still the case with old_region_deficit without your current change. 
>> >>> We still ultimately limit expansion by xfer_limit. >> >> I think that happened before as well, except when now because of your change we treat SOERP=100 specially (but nothing else). >>> >>> I may have misunderstood your questions. Please let me know if I missed the mark. >> >> What I am suggesting is that where we used to do: >> >> const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? >> old_available : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), >> old_available); >> >> >> instead of doing what you suggest above, viz.: >> >> >> // In the case that ShenandoahOldEvacRatioPercent equals 100, max_old_reserve is limited only by xfer_limit. >> const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? >> old_available + xfer_limit: (young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent); >> >> >> that we do: >> >> >> const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? >> (old_available + xfer_limit) : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), >> old_available + xfer_limit); >> >> >> Effectively, you are using `old_available + xfer_limit` for what we can ever have for the maximum size of old_reserve. Otherwise, for suitably large values of ShenandoahOldEvacRatioPercent, you'll us... 
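For reference, the third variant quoted above (bounding both cases by `old_available + xfer_limit`) can be sketched as a stand-alone function. This is only an illustration under stated assumptions: `max_old_reserve_capped` and its parameter names are hypothetical, chosen to mirror the identifiers quoted in the thread (`old_evac_ratio_percent` stands in for ShenandoahOldEvacRatioPercent, `std::min` for MIN2), and it is not the code from either PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: computes an upper bound on the old
// evacuation reserve, capped by old_available + xfer_limit in every case.
std::size_t max_old_reserve_capped(std::size_t young_reserve,
                                   std::size_t old_available,
                                   std::size_t xfer_limit,
                                   std::size_t old_evac_ratio_percent) {
  assert(old_evac_ratio_percent <= 100);
  const std::size_t ceiling = old_available + xfer_limit;
  if (old_evac_ratio_percent == 100) {
    // The ratio formula would divide by zero here, so the otherwise
    // unbounded value is replaced by the ceiling.
    return ceiling;
  }
  const std::size_t by_ratio =
      (young_reserve * old_evac_ratio_percent) / (100 - old_evac_ratio_percent);
  // For ratios close to 100 the quotient grows very large, so the same
  // ceiling is applied to the normal case as well.
  return std::min(by_ratio, ceiling);
}
```

With young_reserve = 100, old_available = 50 and xfer_limit = 25, a ratio of 75 yields 300 from the formula alone, which the cap reduces to 75, the same value returned for a ratio of 100.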
> > With this PR, I was attempting to restore the "normal-case behavior" (when ShenandoahOldEvacRatioPercent != 100) to how it behaved before https://github.com/openjdk/shenandoah/pull/369 > > Before that change, this line of code did not impose any restriction on the size of old_evacuation_reserve based on old_available: > > size_t maximum_old_evacuation_reserve = maximum_young_evacuation_reserve * ShenandoahOldEvacRatioPercent / (100 - ShenandoahOldEvacRatioPercent); > > For this new code, I invented an "artificial limit" to replace "infinity" in the case that ShenandoahOldEvacRatioPercent equals 100. > > Having studied this issue in the current implementation, I am inclined to pursue an even more aggressive change in a distinct [PR](https://github.com/openjdk/shenandoah/pull/395), which allows OLD to grow not only by stealing memory from the Mutator's excesses, but also by borrowing from the Young Collector's reserves. So I'd prefer not to place more restrictions on the allowed growth of old at this line of code. If you feel more comfortable, I can put the MIN2 expression into the normal case handling. But I will want to take it out in the upcoming complementary PR. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1494864998 From kdnilsen at openjdk.org Mon Feb 19 17:25:11 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 19 Feb 2024 17:25:11 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> Message-ID: On Tue, 13 Feb 2024 02:41:18 GMT, Y. Srinivas Ramakrishna wrote: >> We special-case ShenandoahOldEvacRatioPercent==100 because the "other case" has divide by (100 - ShenandoahOldEvacRatioPercent), which becomes divide by zero.
>> >> To generalize the form of the other expression, if ShenandoahOldEvacRatioPercent is 100, then there is no bound on maximum_old_evacuation_reserve. Or in other words, the bound is infinity times maximum_young_evacuation_reserve. >> >> In the original code, before the referenced change, if we can get past the divide-by-zero issue, we would find expansion of old to be limited by the xfer_limit at line 1265: >> if (old_region_deficit > max_old_region_xfer) { >> old_region_deficit = max_old_region_xfer; >> } >> >> We still ultimately limit expansion by xfer_limit. >> >> I may have misunderstood your questions. Please let me know if I missed the mark. > >> We special case ShenandoahOldEvacRatioPercent==100 because the "other case" has divide by (100 - ShenandoahOldEvacRatioPercent), which becomes divide by zero. > > Yes, that I realize. I was asking about the addition of xfer_limit in just this case and not otherwise. > >> >> To generalize the form of the other expression, if ShenandoahOldEvacRatioPercent is 100, then there is no bound on maximum_old_evacuation_reserve. Or in other words, the bound is infinity times maximum_young_evacuation_reserve. > > Correct. So I bounded it by max available. You corrected it to max_available + xfer_limit. It seems as if you want to bound everything by (max_available + xfer_limit). > >> >> In the original code, before the referenced change, if we can get past the divide-by-zero issue, we would find expansion of old to be limited by the xfer_limit at line 1265: if (old_region_deficit > max_old_region_xfer) { old_region_deficit = max_old_region_xfer; } >> > > That's still the case with old_region_deficit without your current change. > >> We still ultimately limit expansion by xfer_limit. > > I think that happened before as well, except when now because of your change we treat SOERP=100 specially (but nothing else). >> >> I may have misunderstood your questions. Please let me know if I missed the mark. 
> > What I am suggesting is that where we used to do: > > const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? > old_available : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), > old_available); > > > instead of doing what you suggest above, viz.: > > > // In the case that ShenandoahOldEvacRatioPercent equals 100, max_old_reserve is limited only by xfer_limit. > const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? > old_available + xfer_limit : (young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent); > > > that we do: > > > const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? > (old_available + xfer_limit) : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), > old_available + xfer_limit); > > > Effectively, you are using `old_available + xfer_limit` for what we can ever have for the maximum size of old_reserve. Otherwise, for suitably large values of ShenandoahOldEvacRatioPercent, you'll use a larger value of max_old_reserve than you have available even after using the ... With this PR, I was attempting to restore the "normal-case behavior" (when ShenandoahOldEvacRatioPercent != 100) to how it behaved before https://github.com/openjdk/shenandoah/pull/369 Before that change, this line of code did not impose any restriction on the size of old_evacuation_reserve based on old_available: size_t maximum_old_evacuation_reserve = maximum_young_evacuation_reserve * ShenandoahOldEvacRatioPercent / (100 - ShenandoahOldEvacRatioPercent); For this new code, I invented an "artificial limit" to replace "infinity" in the case that ShenandoahOldEvacRatioPercent equals 100.
Having studied this issue in the current implementation, I am inclined to pursue an even more aggressive change in a distinct [PR](https://github.com/openjdk/shenandoah/pull/395), which allows OLD to grow not only by stealing memory from the Mutator's excesses, but also by borrowing from the Young Collector's reserves. So I'd prefer not to place more restrictions on the allowed growth of old at this line of code. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1494862313 From kdnilsen at openjdk.org Mon Feb 19 17:43:15 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 19 Feb 2024 17:43:15 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> Message-ID: On Mon, 19 Feb 2024 17:22:02 GMT, Kelvin Nilsen wrote: >> With this PR, I was attempting to restore the "normal-case behavior" (when ShenandoahOldEvacRatioPercent != 100) to how it behaved before https://github.com/openjdk/shenandoah/pull/369 >> >> Before that change, this line of code did not impose any restriction on the size of old_evacuation_reserve based on old_available: >> >> size_t maximum_old_evacuation_reserve = maximum_young_evacuation_reserve * ShenandoahOldEvacRatioPercent / (100 - ShenandoahOldEvacRatioPercent); >> >> For this new code, I invented an "artificial limit" to replace "infinity" in the case that ShenandoahOldEvacRatioPercent equals 100. >> >> Having studied this issue in the current implementation, I am inclined to pursue an even more aggressive change in a distinct [PR](https://github.com/openjdk/shenandoah/pull/395), which allows OLD to grow not only by stealing memory from the Mutator's excesses, but also by borrowing from the Young Collector's reserves. So I'd prefer not to place more restrictions on the allowed growth of old at this line of code.
> > If you feel more comfortable, I can put the MIN2 expression into the normal case handling. But I will want to take it out in the upcoming complementary PR. Context: xfer_limit will be zero if there is no "planned" GC idle time. The allocation runway goes to zero if we have experienced recent degenerated and/or full GCs, because the accumulated penalties cause us to trigger young GCs immediately. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1494882108 From ysr at openjdk.org Mon Feb 19 18:33:10 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 18:33:10 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp In-Reply-To: References: Message-ID: On Mon, 19 Feb 2024 15:38:39 GMT, Y. Srinivas Ramakrishna wrote: >> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 35: > >> 33: >> 34: >> 35: class ShenandoahSetRememberedCardsToDirtyClosure : public BasicOopIterateClosure { > > Nit: I realize this is the existing name of the closure, and you just moved it here, but consider the following renaming suggestion: `ShenandoahDirtyRememberedSetClosure` Nit: maybe place a block comment above the closure: // A closure that takes an oop in the old generation and, if it's // pointing into the young generation, dirties the corresponding // remembered set entry. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494755034 From ysr at openjdk.org Mon Feb 19 18:33:10 2024 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 18:33:10 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 19:42:54 GMT, William Kemper wrote: > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. Looks good, but I left a few suggestions, one of which (if the suggested slight simplification were to work out) would affect the sister PR as well. Otherwise looks great; the refactoring and pulling the work out of line into its separate class like you've done here should make upstreaming easier. Thanks! src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 183: > 181: // b. Cancel all concurrent marks, if in progress > 182: if (heap->is_concurrent_mark_in_progress()) { > 183: // TODO: Send cancel_concurrent_mark upstream? Does it really not have it already? We have: void ShenandoahHeap::cancel_concurrent_mark() { _young_generation->cancel_marking(); _old_generation->cancel_marking(); _global_generation->cancel_marking(); ShenandoahBarrierSet::satb_mark_queue_set().abandon_partial_marking(); } whereas upstream has: // b. Cancel concurrent mark, if in progress if (heap->is_concurrent_mark_in_progress()) { ShenandoahConcurrentGC::cancel(); heap->set_concurrent_mark_in_progress(false); } These should probably be reconciled and upstreamed, if not here then in a separate but linked CR. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 460: > 458: > 459: template<typename ClosureType> > 460: void ShenandoahPrepareForCompactionTask::prepare_for_compaction(ClosureType& cl, Looking at this method, it looks like it actually belongs as a public method in the closure, with that closure's methods invoked here becoming private to the closure. That makes for a narrower public API all around and keeps the loop in the right place.
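The suggestion above can be illustrated with a small standalone sketch. It is only a toy model under stated assumptions: `Region`, `PrepareForCompactionClosure`, `set_from_region`, `do_region`, `planned_bytes` and `finished` are hypothetical names chosen for illustration and are not the HotSpot types or the code in the PR. The point is just that the loop becomes the closure's single public entry point while the per-region steps become private:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; not a HotSpot type.
struct Region {
  std::size_t live_bytes;
  bool is_humongous;
};

class PrepareForCompactionClosure {
public:
  // The only public entry point: the region loop lives inside the closure.
  void prepare_for_compaction(const std::vector<Region>& regions) {
    for (const Region& r : regions) {
      if (r.is_humongous) {
        continue;  // skipped in this toy model
      }
      set_from_region(r);
      do_region();
    }
    finish();
  }

  // Read-only accessors so a caller can inspect the result.
  std::size_t planned_bytes() const { return _planned_bytes; }
  bool finished() const { return _finished; }

private:
  // Per-region steps that the caller no longer needs to see.
  void set_from_region(const Region& r) { _current = &r; }
  void do_region() {
    assert(_current != nullptr);
    _planned_bytes += _current->live_bytes;
  }
  void finish() {
    _current = nullptr;
    _finished = true;
  }

  const Region* _current = nullptr;
  std::size_t _planned_bytes = 0;
  bool _finished = false;
};
```

A caller would then only construct the closure and call `cl.prepare_for_compaction(regions)`, which is what keeps the public API narrow.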
src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 773: > 771: distribute_slices(worker_slices); > 772: > 773: // TODO: This is ResourceMark is missing upstream. Maybe roll it into the refactoring PR you have open and under review at https://github.com/openjdk/jdk/pull/17894/files#diff-d58d0fe2415e3fa64f687435baf73227fb4e3e9eee8ba52ba1f3bb8301437106 ? src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 126: > 124: size_t available() const override; > 125: size_t available_with_reserve() const; > 126: size_t used_and_wasted() const { Nit: technically `used_or_wasted()`, since space that is used is not wasted. src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.hpp line 46: > 44: static void account_for_region(ShenandoahHeapRegion* r, size_t &region_count, size_t &region_usage, size_t &humongous_waste); > 45: static void restore_top_before_promote(ShenandoahHeap* heap); > 46: static void maybe_coalesce_and_fill_region(ShenandoahHeapRegion* r); Please provide a one-line documentation comment for each of these public API methods. src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.hpp line 49: > 47: }; > 48: > 49: class ShenandoahPrepareForGenerationalCompactionObjectClosure : public ObjectClosure { See comment in `ShenandoahPrepareForCompactionTask::prepare_for_compaction()`. I think this closure should include `prepare_for_compaction` in its public API, and full GC should just invoke that method of the closure directly. If this simplification works out, then it would also affect the sister upstream PR https://github.com/openjdk/jdk/pull/17894 similarly.
src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 35: > 33: > 34: > 35: class ShenandoahSetRememberedCardsToDirtyClosure : public BasicOopIterateClosure { Nit: I realize this is the existing name of the closure, and you just moved it here, but consider the following renaming suggestion: `ShenandoahDirtyRememberedSetClosure` src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 47: > 45: template<class T> > 46: inline void work(T* p) { > 47: T o = RawAccess<>::oop_load(p); Maybe add a paranoid assert (unless it slows things down too much): assert(_heap->is_in_old(p), "Expecting to get an old gen address"); src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 51: > 49: oop obj = CompressedOops::decode_not_null(o); > 50: if (_heap->is_in_young(obj)) { > 51: // Found interesting pointer. Mark the containing card as dirty. Nit: maybe: // Dirty the card containing the cross-generational pointer. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 411: > 409: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 410: RememberedScanner* scanner = heap->card_scan(); > 411: ShenandoahSetRememberedCardsToDirtyClosure dirty_cards_for_interesting_pointers; Nit: I might rename "interesting" to something more concrete and meaningful, like "cross_generational", so: ShenandoahDirtyRememberedSetClosure dirty_cards_for_cross_generational_pointers; src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 452: > 450: } > 451: // else, this region is FREE or YOUNG or inactive and we can ignore it. > 452: // TODO: Assert this.
I'd delete the TODO, since it would be tautological and not provide any additional value if you were to say: assert(some disjunction of conditions || !r->is_active(), "Error"); because of the last disjunct which is always true because of the: if (r->is_old() && r->is_active()) { src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 1051: > 1049: // After Full GC is done, reconstruct the remembered set by iterating over OLD regions, > 1050: // registering all objects between bottom() and top(), and setting remembered set cards to > 1051: // DIRTY if they hold interesting pointers. Nit: instead of: // ... setting remembered set cards to DIRTY if they hold interesting pointers perhaps: // ... dirty the cards containing cross-generational pointers. ------------- PR Review: https://git.openjdk.org/shenandoah/pull/398#pullrequestreview-1888718888 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494881735 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494910380 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494902839 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494802378 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494905029 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494913729 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494740029 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494762829 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494765314 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494774978 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494800047 PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494782959 From ysr at openjdk.org Mon Feb 19 18:48:54 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 18:48:54 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 19:49:07 GMT, William Kemper wrote: > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation closure is renamed to `finish` > * The prepare for compaction loop is extracted to a template method, parameterized on closure type src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 431: > 429: > 430: template > 431: void ShenandoahPrepareForCompactionTask::prepare_for_compaction(ClosureType& cl, See some related comments in sister PR: - https://github.com/openjdk/shenandoah/pull/398/files#r1494913729 - https://github.com/openjdk/shenandoah/pull/398/files#r1494910380 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17894#discussion_r1494924914 From ysr at openjdk.org Mon Feb 19 18:51:16 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 18:51:16 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp In-Reply-To: References: Message-ID: <7W7mqoGMJ1FYZAbrtZH7KDAbjckED6AIkhAMoQ8jMvQ=.99ade932-8b0b-4085-b33f-340ba75a3ab5@github.com> On Mon, 19 Feb 2024 18:26:45 GMT, Y. Srinivas Ramakrishna wrote: >> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.hpp line 49: > >> 47: }; >> 48: >> 49: class ShenandoahPrepareForGenerationalCompactionObjectClosure : public ObjectClosure { > > See comment in `ShenandoahPrepareForCompactionTask::prepare_for_compaction()`. 
I think this closure should include `prepare_for_compaction` in its public API, and full gc should just invoke that method of the closure directly. > > If this simplification works out, then it would also affect the sister upstream PR https://github.com/openjdk/jdk/pull/17894 similarly. Left a link there as well: https://github.com/openjdk/jdk/pull/17894#discussion_r1494924914 ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1494929290 From ysr at openjdk.org Mon Feb 19 20:54:10 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Feb 2024 20:54:10 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> Message-ID: On Mon, 19 Feb 2024 17:41:00 GMT, Kelvin Nilsen wrote: >> If you feel more comfortable, I can put the MIN2 expression into the normal case handling. But I'll be wanting to take it out in the upcoming complementary PR. > > Context: xfer_limit will be zero if there is no "planned" GC idle time. The allocation runway goes to zero if we have experienced recent degens and/or full GCs, because penalties accumulate which cause us to immediately trigger young GCs. I am beginning to better understand what you were trying to achieve, but I am still not quite there. Is there a natural sensible limit at which `max_old_reserve` can be bounded? It would seem then that, since you were not previously bounding the computation of `max_old_reserve` in any manner and you don't want to bound it to `old_available + xfer_limit`, that a more natural and essentially largest possible value would be the sum of what young can promote and what old can evacuate, which would look something like `heap->max_capacity()`, since it would effectively be morally equivalent to imposing no limits on `max_old_reserve`. 
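To make the arithmetic under discussion concrete, here is a minimal standalone sketch, assuming a caller-supplied cap (for example the value of `heap->max_capacity()`). The function and parameter names are hypothetical, `std::min` stands in for HotSpot's `MIN2`, and this is not the code in the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical standalone model of the max_old_reserve computation.
// ratio_percent models ShenandoahOldEvacRatioPercent and must be in [1, 100].
// cap models whatever upper bound is chosen, e.g. the heap's maximum capacity.
// Assumes young_reserve * ratio_percent does not overflow std::size_t.
inline std::size_t max_old_reserve(std::size_t young_reserve,
                                   std::size_t ratio_percent,
                                   std::size_t cap) {
  assert(ratio_percent >= 1 && ratio_percent <= 100);
  if (ratio_percent == 100) {
    // The ratio formula would divide by zero, so only the cap applies.
    return cap;
  }
  // Integer arithmetic: the multiplier is at most 99 / (100 - 99) = 99.
  const std::size_t by_ratio =
      (young_reserve * ratio_percent) / (100 - ratio_percent);
  return std::min(by_ratio, cap);
}
```

With `young_reserve = 100` and a ratio of 99 the ratio term is 9900, so a cap below that value is what actually limits the result; with a ratio of 100 only the cap applies.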
Alternatively, if you are considering changing this whole thing anyway, perhaps we just do that directly. If you expect that PR to take a while and you just want to restore old behaviour, I'd suggest bounding the calculation of `max_old_reserve` to `heap->max_capacity()`, since that is a natural limit irrespective of what SOERP happens to be (and not artificial and confusing like the one that you suggested covering that one case of SOERP sending the value to NaN but not otherwise bounding it and allowing it to grow arbitrarily large and wrapping around). Let me know if that makes sense. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1495004950 From ddong at openjdk.org Tue Feb 20 00:41:57 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 20 Feb 2024 00:41:57 GMT Subject: RFR: 8326111: JFR: Cleanup for JFR_ONLY [v2] In-Reply-To: References: Message-ID: On Sun, 18 Feb 2024 08:10:16 GMT, Denghui Dong wrote: >> Greetings, >> >> Could I have a review of this trivial change that cleans up code where used JFR_ONLY incorrectly? >> >> testing: build with --enable-jvm-feature-jfr and --disable-jvm-feature-jfr >> >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17903#issuecomment-1953317021 From ddong at openjdk.org Tue Feb 20 00:41:57 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 20 Feb 2024 00:41:57 GMT Subject: Integrated: 8326111: JFR: Cleanup for JFR_ONLY In-Reply-To: References: Message-ID: <62Lb_E6CpMkzkn_YBCdTteY2G0rKazp6lVoWeq2-9c0=.dfedb812-e3de-4632-8687-9d240dfebcd2@github.com> On Sun, 18 Feb 2024 07:54:57 GMT, Denghui Dong wrote: > Greetings, > > Could I have a review of this trivial change that cleans up code where used JFR_ONLY incorrectly? 
> > testing: build with --enable-jvm-feature-jfr and --disable-jvm-feature-jfr > > Denghui This pull request has now been integrated. Changeset: 7d32a1a8 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/7d32a1a8293f6d82f4d5959a4c929f96244cc057 Stats: 13 lines in 7 files changed: 0 ins; 0 del; 13 mod 8326111: JFR: Cleanup for JFR_ONLY Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/17903 From ysr at openjdk.org Tue Feb 20 00:49:05 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Feb 2024 00:49:05 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> Message-ID: <142TqrkwJTq2_m311li4eBfR-ORcE3FWOmoEQAUBM1U=.74290f2b-17f7-41c7-b066-1e71bfbb6fab@github.com> On Mon, 19 Feb 2024 20:50:46 GMT, Y. Srinivas Ramakrishna wrote: > ... covering that one case of SOERP sending the value to NaN but not otherwise bounding it and allowing it to grow arbitrarily large and wrapping around). I realize that there is in fact a natural bound to that value when SOERP < 100, viz. when it's 99 (since it's not a float): `young_reserve * 99/(100-99)`, i.e. `99 * young_reserve`. I guess the simpler thing to do then is to just avoid this completely and declare `ShenandoahOldEvacRatioPercent` with `range(1,99)` and be done, throwing away the protection for the lone case of `SOERP=100` -- after all we don't allow `SOERP=0`, so by symmetry it looks like we shouldn't allow 100 either, just `range(1,99)`. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1495116929 From dholmes at openjdk.org Tue Feb 20 01:18:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Feb 2024 01:18:07 GMT Subject: RFR: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Message-ID: Trivial fix to add back missing comma.
Thanks ------------- Commit messages: - 8326222: Fix copyright here in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Changes: https://git.openjdk.org/jdk/pull/17920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326222 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17920/head:pull/17920 PR: https://git.openjdk.org/jdk/pull/17920 From jiefu at openjdk.org Tue Feb 20 01:23:56 2024 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 20 Feb 2024 01:23:56 GMT Subject: RFR: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 01:13:19 GMT, David Holmes wrote: > Trivial fix to add back missing comma. > > Thanks Looks good and trivial. Thanks. ------------- Marked as reviewed by jiefu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17920#pullrequestreview-1889345339 From mikael at openjdk.org Tue Feb 20 01:28:57 2024 From: mikael at openjdk.org (Mikael Vidstedt) Date: Tue, 20 Feb 2024 01:28:57 GMT Subject: RFR: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 01:13:19 GMT, David Holmes wrote: > Trivial fix to add back missing comma. > > Thanks Marked as reviewed by mikael (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17920#pullrequestreview-1889347272 From dholmes at openjdk.org Tue Feb 20 01:28:58 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Feb 2024 01:28:58 GMT Subject: RFR: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 01:21:38 GMT, Jie Fu wrote: >> Trivial fix to add back missing comma. >> >> Thanks > > Looks good and trivial. > Thanks. 
Thanks @DamonFool ------------- PR Comment: https://git.openjdk.org/jdk/pull/17920#issuecomment-1953344865 From dholmes at openjdk.org Tue Feb 20 01:28:58 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Feb 2024 01:28:58 GMT Subject: RFR: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 01:24:47 GMT, Mikael Vidstedt wrote: >> Trivial fix to add back missing comma. >> >> Thanks > > Marked as reviewed by mikael (Reviewer). Thanks @vidmik ------------- PR Comment: https://git.openjdk.org/jdk/pull/17920#issuecomment-1953345335 From dholmes at openjdk.org Tue Feb 20 01:28:59 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Feb 2024 01:28:59 GMT Subject: Integrated: 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 01:13:19 GMT, David Holmes wrote: > Trivial fix to add back missing comma. > > Thanks This pull request has now been integrated. 
Changeset: 69a11c7f Author: David Holmes URL: https://git.openjdk.org/jdk/commit/69a11c7f7ea7c4195a8ee56391bdf04c75bd8156 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8326222: Fix copyright year in src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Reviewed-by: jiefu, mikael ------------- PR: https://git.openjdk.org/jdk/pull/17920 From roland at openjdk.org Tue Feb 20 09:58:03 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 Feb 2024 09:58:03 GMT Subject: RFR: 8325372: C2 compilation event causes SIGSEV crash (unnecessary_acquire(Node const*)) in JDK 17.0.x Message-ID: After shenandoah barrier expansion, a shenandoah specific pass looks for heap stable checks that are back to back: if (heap_stable) { // fast path 1 } else { // slow path 1 } if (heap_stable) { // fast path 2 } else { // slow path 2 } and fuse them: if (heap_stable) { // fast path 1 // fast path 2 } else { // slow path 1 // slow path 2 } In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is between the 2 heap_stable checks. The fusion of the 2 tests is implemented by taking advantage of the split if c2 optimization. But split if doesn't support having a `GetAndSet` node at the region where split if happens (that can only happen with shenandoah late barrier expansion). That causes the `GetAndSet` node to lose its `SCMemProj` which can then result in the `GetAndSet` being entirely removed. The fix I propose is to not perform the heap_stable fusion in this particular case. 
------------- Commit messages: - test clean up - fix - test Changes: https://git.openjdk.org/jdk/pull/17926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17926&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325372 Stats: 102 lines in 4 files changed: 101 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17926/head:pull/17926 PR: https://git.openjdk.org/jdk/pull/17926 From shade at openjdk.org Tue Feb 20 11:58:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Feb 2024 11:58:54 GMT Subject: RFR: 8325372: C2 compilation event causes SIGSEV crash (unnecessary_acquire(Node const*)) in JDK 17.0.x In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 09:46:31 GMT, Roland Westrelin wrote: > After shenandoah barrier expansion, a shenandoah specific pass looks > for heap stable checks that are back to back: > > > if (heap_stable) { > // fast path 1 > } else { > // slow path 1 > } > if (heap_stable) { > // fast path 2 > } else { > // slow path 2 > } > > > and fuse them: > > > if (heap_stable) { > // fast path 1 > // fast path 2 > } else { > // slow path 1 > // slow path 2 > } > > > In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is > between the 2 heap_stable checks. The fusion of the 2 tests is > implemented by taking advantage of the split if c2 optimization. But > split if doesn't support having a `GetAndSet` node at the region where > split if happens (that can only happen with shenandoah late barrier > expansion). That causes the `GetAndSet` node to lose its `SCMemProj` > which can then result in the `GetAndSet` being entirely removed. > > The fix I propose is to not perform the heap_stable fusion in this > particular case. 
I suggest we rename the bug to: `8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi` src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1737: > 1735: } > 1736: > 1737: bool ShenandoahBarrierC2Support::merge_point_safe(Node* region) { This should probably be `is_merge_point_safe`, since it answers `bool`? test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java line 33: > 31: * @run main/othervm -XX:+UseShenandoahGC -XX:-BackgroundCompilation TestUnsafeLoadStoreMergedHeapStableTests > 32: * > 33: * Superfluous: Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/17926#pullrequestreview-1890229258 PR Review Comment: https://git.openjdk.org/jdk/pull/17926#discussion_r1495688710 PR Review Comment: https://git.openjdk.org/jdk/pull/17926#discussion_r1495690169 From kdnilsen at openjdk.org Tue Feb 20 13:49:06 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 20 Feb 2024 13:49:06 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC In-Reply-To: <142TqrkwJTq2_m311li4eBfR-ORcE3FWOmoEQAUBM1U=.74290f2b-17f7-41c7-b066-1e71bfbb6fab@github.com> References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> <142TqrkwJTq2_m311li4eBfR-ORcE3FWOmoEQAUBM1U=.74290f2b-17f7-41c7-b066-1e71bfbb6fab@github.com> Message-ID: On Tue, 20 Feb 2024 00:46:04 GMT, Y. Srinivas Ramakrishna wrote: >> I am beginning to better understand what you were trying to achieve, but I am still not quite there. >> >> Is there a natural sensible limit at which `max_old_reserve` can be bounded? 
It would seem then that, since you were not previously bounding the computation of `max_old_reserve` in any manner and you don't want to bound it to `old_available + xfer_limit`, that a more natural and essentially largest possible value would be the sum of what young can promote and what old can evacuate, which would look something like `heap->max_capacity()`, since it would effectively be morally equivalent to imposing no limits on `max_old_reserve`. >> >> Alternatively, if you are considering changing this whole thing anyway, perhaps we just do that directly. If you expect that PR to take a while and you just want to restore old behaviour, I'd suggest bounding the calculation of `max_old_reserve` to `heap->max_capacity()`, since that is a natural limit irrespective of what SOERP happens to be (and not artificial and confusing like the one that you suggested covering that one case of SOERP sending the value to NaN but not otherwise bounding it and allowing it to grow arbitrarily large and wrapping around). >> >> In other words, I suggest using: >> >> const size_t max_old_reserve = (ShenandoahOldEvacRatioPercent == 100) ? >> heap->max_capacity() : MIN2((young_reserve * ShenandoahOldEvacRatioPercent) / (100 - ShenandoahOldEvacRatioPercent), >> heap->max_capacity()); >> >> >> Let me know if that makes sense. > >> ... covering that one case of SOERP sending the value to NaN but not otherwise bounding it and allowing it to grow arbitrarily large and wrapping around). > > I realize that there is in fact a natural bound to that value when SOERP < 100, viz. when it's 99 (since it's not a float): `young_reserve * 99/(100-99)`, i.e. `99 * young_reserve`. I guess the simpler thing to do then is to just avoid this completely and declare `ShenandoahOldEvacRatioPercent` with `range(1,99)` and be done, throwing away the protection for the lone case of `SOERP=100` -- after all we don't allow `SOERP=0`, so by symmetry it looks like we shouldn't allow 100 either, just `range(1,99)`.
Thanks for these suggestions. I'll see if I can stabilize a solution that works for all cases. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1495851540 From roland at openjdk.org Tue Feb 20 15:17:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 Feb 2024 15:17:10 GMT Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2] In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 11:49:27 GMT, Aleksey Shipilev wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java >> >> Co-authored-by: Aleksey Shipilëv > > src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1737: > >> 1735: } >> 1736: >> 1737: bool ShenandoahBarrierC2Support::merge_point_safe(Node* region) { > > This should probably be `is_merge_point_safe`, since it answers `bool`? There's an existing `merge_point_safe` method that validates merge points for split if (loopopts.cpp). That's why I named it like that.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17926#discussion_r1495996962

From roland at openjdk.org Tue Feb 20 15:17:10 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 20 Feb 2024 15:17:10 GMT
Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2]
In-Reply-To:
References:
Message-ID:

> After shenandoah barrier expansion, a shenandoah specific pass looks
> for heap stable checks that are back to back:
>
>
> if (heap_stable) {
>   // fast path 1
> } else {
>   // slow path 1
> }
> if (heap_stable) {
>   // fast path 2
> } else {
>   // slow path 2
> }
>
>
> and fuse them:
>
>
> if (heap_stable) {
>   // fast path 1
>   // fast path 2
> } else {
>   // slow path 1
>   // slow path 2
> }
>
>
> In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is
> between the 2 heap_stable checks. The fusion of the 2 tests is
> implemented by taking advantage of the split if c2 optimization. But
> split if doesn't support having a `GetAndSet` node at the region where
> split if happens (that can only happen with shenandoah late barrier
> expansion). That causes the `GetAndSet` node to lose its `SCMemProj`
> which can then result in the `GetAndSet` being entirely removed.
>
> The fix I propose is to not perform the heap_stable fusion in this
> particular case.
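
As a rough illustration of the transformation described above, the sketch below models the two shapes as plain functions and adds a guard in the spirit of the proposed fix. It is a toy model, not C2 code: `ToyMergePoint`, `toy_merge_point_safe`, `run_unfused` and `run_fused` are hypothetical names, and the real pass works on the ideal graph rather than on source-level `if` statements.

```cpp
#include <string>

// Toy model only: these names are illustrative and are not HotSpot APIs.
struct ToyMergePoint {
  bool has_get_and_set;  // true if a GetAndSet-like node sits at the merge point
};

// Mirrors the idea of the fix: refuse the fusion when the merge point
// between the two tests holds a node that split if cannot handle.
inline bool toy_merge_point_safe(const ToyMergePoint& region) {
  return !region.has_get_and_set;
}

// Two back-to-back tests of the same condition.
inline std::string run_unfused(bool heap_stable) {
  std::string trace;
  if (heap_stable) { trace += "fast1 "; } else { trace += "slow1 "; }
  if (heap_stable) { trace += "fast2"; } else { trace += "slow2"; }
  return trace;
}

// The fused form: a single test with both bodies on each path.
inline std::string run_fused(bool heap_stable) {
  std::string trace;
  if (heap_stable) {
    trace += "fast1 ";
    trace += "fast2";
  } else {
    trace += "slow1 ";
    trace += "slow2";
  }
  return trace;
}
```

Both forms produce the same trace for either value of `heap_stable`, which is why the fusion is normally safe. The guard simply skips it when the merge point is not.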
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17926/files - new: https://git.openjdk.org/jdk/pull/17926/files/6866a5ee..cb1ace1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17926&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17926&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17926/head:pull/17926 PR: https://git.openjdk.org/jdk/pull/17926 From shade at openjdk.org Tue Feb 20 15:32:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Feb 2024 15:32:54 GMT Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2] In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 15:17:10 GMT, Roland Westrelin wrote: >> After shenandoah barrier expansion, a shenandoah specific pass looks >> for heap stable checks that are back to back: >> >> >> if (heap_stable) { >> // fast path 1 >> } else { >> // slow path 1 >> } >> if (heap_stable) { >> // fast path 2 >> } else { >> // slow path 2 >> } >> >> >> and fuse them: >> >> >> if (heap_stable) { >> // fast path 1 >> // fast path 2 >> } else { >> // slow path 1 >> // slow path 2 >> } >> >> >> In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is >> between the 2 heap_stable checks. The fusion of the 2 tests is >> implemented by taking advantage of the split if c2 optimization. But >> split if doesn't support having a `GetAndSet` node at the region where >> split if happens (that can only happen with shenandoah late barrier >> expansion).
That causes the `GetAndSet` node to lose its `SCMemProj` >> which can then result in the `GetAndSet` being entirely removed. >> >> The fix I propose is to not perform the heap_stable fusion in this >> particular case. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java > > Co-authored-by: Aleksey Shipilëv Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17926#pullrequestreview-1890751273 From shade at openjdk.org Tue Feb 20 15:32:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Feb 2024 15:32:55 GMT Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2] In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 15:14:39 GMT, Roland Westrelin wrote: >> src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 1737: >> >>> 1735: } >>> 1736: >>> 1737: bool ShenandoahBarrierC2Support::merge_point_safe(Node* region) { >> >> This should probably be `is_merge_point_safe`, since it answers `bool`? > > There's an existing `merge_point_safe` method that validates merge points for split if (loopopts.cpp). That's why I named it like that. Ah, okay then! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17926#discussion_r1496026273 From kdnilsen at openjdk.org Tue Feb 20 16:50:24 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 20 Feb 2024 16:50:24 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC [v2] In-Reply-To: References: Message-ID: > At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC. > > This PR reverts a change that accidentally prevents old-gen from being enlarged by this action.
The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Refine calculation of max_old_reserve ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/394/files - new: https://git.openjdk.org/shenandoah/pull/394/files/f583f713..e0761e82 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=394&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=394&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/394.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/394/head:pull/394 PR: https://git.openjdk.org/shenandoah/pull/394 From kdnilsen at openjdk.org Tue Feb 20 16:50:24 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 20 Feb 2024 16:50:24 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC [v2] In-Reply-To: References: <61XuQE3dTAW_6nPgdLbSY2VkTUZSpTExRA3CdgiSzdQ=.ea68357f-37f0-4d1b-a9d0-236266c8b46a@github.com> <142TqrkwJTq2_m311li4eBfR-ORcE3FWOmoEQAUBM1U=.74290f2b-17f7-41c7-b066-1e71bfbb6fab@github.com> Message-ID: On Tue, 20 Feb 2024 13:46:24 GMT, Kelvin Nilsen wrote: >>> ... covering that one case of SOERP sending the value to NaN but not otherwise bounding it and allowing it to grow arbitrarily large and wrapping around). >> >> I realize that there is in fact a natural bound to that value when SOERP < 100, viz. when it's 99 (since it's not a float): `young_reserve * 99/(100-99)`, i.e. `99 * young_reserve`. 
I guess the simpler thing to do then is to just avoid this completely and declare `ShenandoahEvactReserve` to `range(1,99)` and be done, throwing away the protection for the lone case of `SOERP=100` -- after all we don't allow `SOERP=0`, so by symmetry it looks like we shouldn't allow 100 either, just `range(1,99)`. > > Thanks for these suggestions. I'll see if I can stabilize a solution that works for all cases. I've committed a new revision of this code. Does this make it more clear? (I'll still keep the sharing of Collector reserve code in a different PR. That's a bit more subtle, and my first attempt at that code has introduced some regressions, which I'm debugging.) ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/394#discussion_r1496158166 From kdnilsen at openjdk.org Tue Feb 20 23:36:10 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 20 Feb 2024 23:36:10 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v13] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > On internal performance pipelines, this change shows: > > 1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions. > 2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%).
This system CPU time increase appears to correlate with increased page faults and/or rss. > 3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood) > 4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs) > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000 > Control: 819938.875 (+/-5724.56 ) 40 > Test: 2185552.625 (+/-26378.64 ) 20 > > +166.16% scimark.sparse.large/rss_max p=0.00000 > Control: 3285226.375 (+/-22812.93 ) 40 > Test: 8743881.500 (+/-104906.69 ) 20 > > +164.78% sunflow/cpu_system p=0.00000 > Control: 1.280s (+/- 0.10s ) 40 > Test: 3.390s (+/- 0.13s ) 20 > > +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000 > Control: 3.259 (+/- 1.46 ) 33 > Test: 8.125 (+/- 2.05 ) 20 > > +143.75% pmd/major_page_fault_count p=0.03622 > Control: 1.000 (+/- 0.00 ) 40 > Test: 2.438 (+/- 2.59 ) 20 > > +80.22% lusearch/minor_page_fault_count p=0.00000 > Control: 2043930.938 (+/-4777.14 ) 40 > Test: 3683477.625 (+/-5650.29 ) 20 > > +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000 > Control: 697899.156 (+/-3457.82 ) 40 > Test: 1050363.812 (+/-175237.63 ) 20 > > +49.97% scimark.sparse.small/rss_max p=0.00000 > Control: 277075... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix an error in search for contiguous regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17561/files - new: https://git.openjdk.org/jdk/pull/17561/files/daacbaab..3ec905e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=11-12 Stats: 306 lines in 2 files changed: 135 ins; 135 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/17561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561 PR: https://git.openjdk.org/jdk/pull/17561 From rkennke at openjdk.org Wed Feb 21 10:37:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Feb 2024 10:37:55 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: References: Message-ID: <6aUvvUzNNQYrJK18_CzI75aoKXY8NiH8H2cEjaAYrEA=.c37f789a-42a9-45cd-b45d-ba42e3827296@github.com> On Thu, 15 Feb 2024 12:02:29 GMT, Aleksey Shipilev wrote: >> `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. >> >> This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. >> >> The solution is to notify VM that we are blocking before going for `sleep` or `yield`. 
This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. >> >> I also de-uglified bits of adjacent code. >> >> Additional testing: >> - [x] Original Extremem reproducer does not have outliers anymore >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good. I have one question (and leave it up to you to make the change). Thanks, Roman src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 39: > 37: class ShenandoahNoBlockOp : public StackObj { > 38: public: > 39: ShenandoahNoBlockOp(JavaThread* java_thread) { Should this be inline? Or would compiler be clever enough to not emit anything there? Edit: or doesn't it matter, because we are going to yield anyway? ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17813#pullrequestreview-1892683211 PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1497259792 From shade at openjdk.org Wed Feb 21 10:37:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Feb 2024 10:37:55 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: <6aUvvUzNNQYrJK18_CzI75aoKXY8NiH8H2cEjaAYrEA=.c37f789a-42a9-45cd-b45d-ba42e3827296@github.com> References: <6aUvvUzNNQYrJK18_CzI75aoKXY8NiH8H2cEjaAYrEA=.c37f789a-42a9-45cd-b45d-ba42e3827296@github.com> Message-ID: On Wed, 21 Feb 2024 10:23:38 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 39: > >> 37: class ShenandoahNoBlockOp : public StackObj { >> 38: public: >> 39: ShenandoahNoBlockOp(JavaThread* java_thread) { > > Should this be inline? Or would compiler be clever enough to not emit anything there? > Edit: or doesn't it matter, because we are going to yield anyway? I don't think it matters, because the actual yield would dominate. Also, I don't think we do `inline` for RAII constructors... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17813#discussion_r1497282609 From shade at openjdk.org Wed Feb 21 11:52:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Feb 2024 11:52:00 GMT Subject: RFR: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM [v2] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 12:02:29 GMT, Aleksey Shipilev wrote: >> `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. 
>> >> This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. >> >> The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. >> >> I also de-uglified bits of adjacent code. >> >> Additional testing: >> - [x] Original Extremem reproducer does not have outliers anymore >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17813#issuecomment-1956479910 From shade at openjdk.org Wed Feb 21 11:52:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Feb 2024 11:52:00 GMT Subject: Integrated: 8325587: Shenandoah: ShenandoahLock should allow blocking in VM In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 17:40:00 GMT, Aleksey Shipilev wrote: > `ShenandoahLock` is a spinlock that is supposed to guard heap state. That lock is normally only lightly contended, as threads normally only allocate large TLABs/GCLABs. So we just summarily delegated to `Thread::{SpinAcquire,SpinRelease}`, which spins a bit, and then starts going to sleep/yield dance on contention. 
> > This does not work well when there are lots of threads near the OOM conditions. Then, most of these threads would fail to allocate the TLAB, go for out-of-TLAB alloc, start lock acquisition, and spend _a lot_ of time trying to acquire the lock. The handshake/safepoint would think those threads are running, even when they are actually yielding or sleeping. As seen in bug report, a handshake operation over many such threads could then take hundreds of seconds. > > The solution is to notify VM that we are blocking before going for `sleep` or `yield`. This is similar to what other VM code does near such yields. This involves state transitions, so it is only cheap to do near the actual blocking. Protecting the whole lock with VM transition would be very slow. > > I also de-uglified bits of adjacent code. > > Additional testing: > - [x] Original Extremem reproducer does not have outliers anymore > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` passes with `-XX:+UseShenandoahGC` (some old failures, but quite a few "timeouts" also disappear) This pull request has now been integrated. 
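
The locking approach described in this thread can be illustrated with a small stand-alone sketch: spin cheaply for a while, and only mark the thread as blocked around the actual yield. Everything here is an assumption made for illustration: `ToyThread`, `ThreadState`, `BlockedScope` and `ToySpinLock` are hypothetical, the spin threshold is arbitrary, and HotSpot's real thread state transitions are considerably more involved.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a VM thread state; not the HotSpot definition.
enum class ThreadState { Running, Blocked };

struct ToyThread {
  std::atomic<ThreadState> state{ThreadState::Running};
};

// RAII helper: the thread is reported as blocked for the lifetime of the scope,
// so an observer (a handshake, say) would not wait on it as if it were running.
class BlockedScope {
  ToyThread& _thread;
public:
  explicit BlockedScope(ToyThread& thread) : _thread(thread) {
    _thread.state.store(ThreadState::Blocked);
  }
  ~BlockedScope() {
    _thread.state.store(ThreadState::Running);
  }
};

class ToySpinLock {
  std::atomic<bool> _locked{false};
public:
  bool try_lock() {
    return !_locked.exchange(true, std::memory_order_acquire);
  }

  void lock(ToyThread& self) {
    int spins = 0;
    while (!try_lock()) {
      if (++spins < 1000) {
        continue;  // cheap path: keep spinning, no state transition
      }
      // Expensive path: announce that we are blocked only around the yield.
      BlockedScope blocked(self);
      std::this_thread::yield();
      spins = 0;
    }
  }

  void unlock() {
    _locked.store(false, std::memory_order_release);
  }
};
```

The design point this is meant to show is the one made in the description: the state transition is paid only on the slow path, near the actual blocking, rather than around the whole lock.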
Changeset: 492e8bf5 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/492e8bf563135d27b46fde198880e62d5f1940e8 Stats: 67 lines in 3 files changed: 48 ins; 1 del; 18 mod 8325587: Shenandoah: ShenandoahLock should allow blocking in VM Reviewed-by: rehn, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/17813 From kdnilsen at openjdk.org Wed Feb 21 19:15:56 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 21 Feb 2024 19:15:56 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 19:49:07 GMT, William Kemper wrote: > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation closure is renamed to `finish` > * The prepare for compaction loop is extracted to a template method, parameterized on closure type Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/17894#pullrequestreview-1894217524 From shade at openjdk.org Wed Feb 21 19:58:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Feb 2024 19:58:54 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v6] In-Reply-To: References: Message-ID: On Sat, 17 Feb 2024 01:47:09 GMT, Y. Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into generation_type > - Changes from review: Adjust order of parms in functions so they are consistent with their > template parameter order and contiguity, per convention. 
> - Merge branch 'master' into generation_type > - Merge branch 'master' into generation_type > - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). > - Merge branch 'master' into generation_type > - Introduce ShenandoahGenerationType and templatize most closures with it. > The template expands for only the NON_GEN type for the non-generational > version of Shenandoah currently, and will in the future accommodate > Generational Shenandoah. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17815#pullrequestreview-1894287619 From rkennke at openjdk.org Wed Feb 21 20:27:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Feb 2024 20:27:55 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v6] In-Reply-To: References: Message-ID: On Sat, 17 Feb 2024 01:47:09 GMT, Y. Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into generation_type > - Changes from review: Adjust order of parms in functions so they are consistent with their > template parameter order and contiguity, per convention. > - Merge branch 'master' into generation_type > - Merge branch 'master' into generation_type > - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). > - Merge branch 'master' into generation_type > - Introduce ShenandoahGenerationType and templatize most closures with it. > The template expands for only the NON_GEN type for the non-generational > version of Shenandoah currently, and will in the future accommodate > Generational Shenandoah.
Maybe as a compromise we can review this PR now but only ship it with the rest of Generational Shenandoah, when it arrives? Shouldn't be hard to maintain the PR until then, we don't currently expect many changes in the same code. WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1957846624 From ysr at openjdk.org Wed Feb 21 20:40:06 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Feb 2024 20:40:06 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v7] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge branch 'master' into generation_type - Merge branch 'master' into generation_type - Changes from review: Adjust order of parms in functions so they are consistent with their template parameter order and contiguity, per convention. - Merge branch 'master' into generation_type - Merge branch 'master' into generation_type - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). - Merge branch 'master' into generation_type - Introduce ShenandoahGenerationType and templatize most closures with it. The template expands for only the NON_GEN type for the non-generational version of Shenandoah currently, and will in the future accommodate Generational Shenandoah.
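
To make the idea of templatizing closures on a generation type more concrete, here is a minimal sketch. It is not the code under review: `ToyGenerationType`, `ToyObject` and `ToyMarkClosure` are hypothetical, and only the `NON_GEN` enumerator is taken from the commit message; the `YOUNG` and `OLD` enumerators and the filtering rule are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical enum: only NON_GEN is named in the thread above.
enum class ToyGenerationType { NON_GEN, YOUNG, OLD };

struct ToyObject {
  bool in_old;  // which generation the object lives in (assumed model)
  bool marked;
};

// The generation is a compile-time parameter, so each instantiation can be
// specialized by the compiler and the NON_GEN version pays for no filtering.
template <ToyGenerationType GENERATION>
class ToyMarkClosure {
  size_t _marked;
public:
  ToyMarkClosure() : _marked(0) {}

  void do_object(ToyObject& obj) {
    // These conditions are compile-time constants per instantiation.
    if (GENERATION == ToyGenerationType::YOUNG && obj.in_old) {
      return;  // a young-only closure skips old objects
    }
    if (GENERATION == ToyGenerationType::OLD && !obj.in_old) {
      return;  // an old-only closure skips young objects
    }
    if (!obj.marked) {
      obj.marked = true;
      ++_marked;
    }
  }

  size_t marked() const { return _marked; }
};
```

A non-generational build would instantiate only `ToyMarkClosure<ToyGenerationType::NON_GEN>`, which is the sense in which the template "expands for only the NON_GEN type" today.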
------------- Changes: https://git.openjdk.org/jdk/pull/17815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=06 Stats: 124 lines in 9 files changed: 71 ins; 3 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/17815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815 PR: https://git.openjdk.org/jdk/pull/17815 From ysr at openjdk.org Wed Feb 21 22:32:54 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Feb 2024 22:32:54 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v6] In-Reply-To: References: Message-ID: On Wed, 21 Feb 2024 20:25:01 GMT, Roman Kennke wrote: > Maybe as a compromise we can review this PR now but only ship it with the rest of Generational Shenandoah, when it arrives? Shouldn't be hard to maintain the PR until then, we don't currently expect many changes in the same code. WDYT? It'd be nicer to be able to ship this so: 1. it gets regular testing on its own 2. it can be shared with subsequent changes in preparation of Generational, which you may have seen more of recently 3. it has the advantage of digestible small incremental chunks of change (pun not intended) that can be easily and independently reviewed on their own without having to produce a series of dependent PRs and maintaining them on a private branch as subsequent work proceeds Meanwhile, I'll discuss other potential alternatives with the team and think more about the approach I've taken in this regard. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1958153363 From mdoerr at openjdk.org Thu Feb 22 04:29:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 Feb 2024 04:29:57 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v8] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 17:46:46 GMT, Cesar Soares Lucas wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. >> >> # Tier-1 Testing status >> >> | | Win | Mac | Linux | >> |----------|---------|---------|---------| >> | ARM64 | ? | ? | | >> | ARM32 | n/a | n/a | | >> | x86 | | | ? | >> | x64 | ? | ? | ? | >> | PPC64 | n/a | n/a | | >> | S390x | n/a | n/a | | >> | RiscV | n/a | n/a | ? | > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Catching up with origin/master > - Catch up with origin/master > - Merge with origin/master > - Fix build, copyright dates, m4 files. > - Fix merge > - Catch up with master branch. > > Merge remote-tracking branch 'origin/master' into reuse-macroasm > - Some inst_mark fixes; Catch up with master. > - Catch up with changes on master > - Reuse same C2_MacroAssembler object to emit instructions. Seems like this PR got stuck because nobody has time to review such a large change. Maybe we should split the review work? I could review the PPC64 part if that helps. 
Unfortunately, there are whitespace errors and a merge-conflict which need to get resolved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1958677211 From kvn at openjdk.org Thu Feb 22 07:49:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 Feb 2024 07:49:57 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v8] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 17:46:46 GMT, Cesar Soares Lucas wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. >> >> # Tier-1 Testing status >> >> | | Win | Mac | Linux | >> |----------|---------|---------|---------| >> | ARM64 | ? | ? | | >> | ARM32 | n/a | n/a | | >> | x86 | | | ? | >> | x64 | ? | ? | ? | >> | PPC64 | n/a | n/a | | >> | S390x | n/a | n/a | | >> | RiscV | n/a | n/a | ? | > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Catching up with origin/master > - Catch up with origin/master > - Merge with origin/master > - Fix build, copyright dates, m4 files. > - Fix merge > - Catch up with master branch. > > Merge remote-tracking branch 'origin/master' into reuse-macroasm > - Some inst_mark fixes; Catch up with master. > - Catch up with changes on master > - Reuse same C2_MacroAssembler object to emit instructions. Yes, I have the same issue. When I have time to test it - it does not merge. I reviewed x86. 
I think `NULL` -> `nullptr` can be removed from these changes since they are already fixed in main sources. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1958878271 From rkennke at openjdk.org Thu Feb 22 11:29:56 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Feb 2024 11:29:56 GMT Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2] In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 15:17:10 GMT, Roland Westrelin wrote: >> After shenandoah barrier expansion, a shenandoah specific pass looks >> for heap stable checks that are back to back: >> >> >> if (heap_stable) { >> // fast path 1 >> } else { >> // slow path 1 >> } >> if (heap_stable) { >> // fast path 2 >> } else { >> // slow path 2 >> } >> >> >> and fuse them: >> >> >> if (heap_stable) { >> // fast path 1 >> // fast path 2 >> } else { >> // slow path 1 >> // slow path 2 >> } >> >> >> In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is >> between the 2 heap_stable checks. The fusion of the 2 tests is >> implemented by taking advantage of the split if c2 optimization. But >> split if doesn't support having a `GetAndSet` node at the region where >> split if happens (that can only happen with shenandoah late barrier >> expansion). That causes the `GetAndSet` node to lose its `SCMemProj` >> which can then result in the `GetAndSet` being entirely removed. >> >> The fix I propose is to not perform the heap_stable fusion in this >> particular case. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java > > Co-authored-by: Aleksey Shipilëv Looks good to me, thank you! Roman ------------- Marked as reviewed by rkennke (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17926#pullrequestreview-1895619215 From wkemper at openjdk.org Thu Feb 22 14:23:21 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 22 Feb 2024 14:23:21 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master Message-ID: Merges tag jdk-21.0.3+4 ------------- Commit messages: - 8322790: RISC-V: Tune costs for shuffles with no conversion - 8324874: AArch64: crypto pmull based CRC32/CRC32C intrinsics clobber V8-V15 registers - 8318737: Fallback linker passes bad JNI handle - 8320890: [AIX] Find a better way to mimic dl handle equality - 8325672: C2: allocate PhaseIdealLoop::_loop_or_ctrl from C->comp_arena() - 8315891: java/foreign/TestLinker.java failed with "error occurred while instantiating class TestLinker: null" - 8315602: Open source swing security manager test - 8324347: Enable "maybe-uninitialized" warning for FreeType 2.13.1 - 8318603: Parallelize sun/java2d/marlin/ClipShapeTest.java - 8009550: PlatformPCSC should load versioned so - ... and 18 more: https://git.openjdk.org/shenandoah-jdk21u/compare/57956950...4fcc5c74 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/22/files Stats: 1841 lines in 135 files changed: 1278 ins; 159 del; 404 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/22.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/22/head:pull/22 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/22 From ysr at openjdk.org Thu Feb 22 23:49:10 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Feb 2024 23:49:10 GMT Subject: RFR: 8325670: GenShen: Allow old to expand at end of each GC [v2] In-Reply-To: References: Message-ID: On Tue, 20 Feb 2024 16:50:24 GMT, Kelvin Nilsen wrote: >> At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC. 
>>
>> This PR reverts a change that accidentally prevents old-gen from being enlarged by this action. The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Refine calculation of max_old_reserve

LGTM!

-------------

Marked as reviewed by ysr (Committer).

PR Review: https://git.openjdk.org/shenandoah/pull/394#pullrequestreview-1897176282

From thartmann at openjdk.org Fri Feb 23 06:13:56 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 23 Feb 2024 06:13:56 GMT
Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2]
In-Reply-To:
References:
Message-ID: <0K9vRZdRl-RwkY3k1omPxgUGX7EK3ZXMFLwylJPN1Yo=.6053aa65-1f2d-4a57-80fc-3e7de7b9f6f6@github.com>

On Tue, 20 Feb 2024 15:17:10 GMT, Roland Westrelin wrote:

>> After shenandoah barrier expansion, a shenandoah specific pass looks
>> for heap stable checks that are back to back:
>>
>>
>> if (heap_stable) {
>>   // fast path 1
>> } else {
>>   // slow path 1
>> }
>> if (heap_stable) {
>>   // fast path 2
>> } else {
>>   // slow path 2
>> }
>>
>>
>> and fuse them:
>>
>>
>> if (heap_stable) {
>>   // fast path 1
>>   // fast path 2
>> } else {
>>   // slow path 1
>>   // slow path 2
>> }
>>
>>
>> In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is
>> between the 2 heap_stable checks. The fusion of the 2 tests is
>> implemented by taking advantage of the split if c2 optimization. But
>> split if doesn't support having a `GetAndSet` node at the region where
>> split if happens (that can only happen with shenandoah late barrier
>> expansion). That causes the `GetAndSet` node to lose its `SCMemProj`
>> which can then result in the `GetAndSet` being entirely removed.
>>
>> The fix I propose is to not perform the heap_stable fusion in this
>> particular case.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java
>
>   Co-authored-by: Aleksey Shipilëv

Shared code changes look good. Testing passed.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17926#pullrequestreview-1897425315

From roland at openjdk.org Fri Feb 23 10:12:03 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 23 Feb 2024 10:12:03 GMT
Subject: RFR: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 20 Feb 2024 15:30:03 GMT, Aleksey Shipilev wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Update test/hotspot/jtreg/gc/shenandoah/compiler/TestUnsafeLoadStoreMergedHeapStableTests.java
>>
>>   Co-authored-by: Aleksey Shipilëv
>
> Marked as reviewed by shade (Reviewer).
@shipilev @rkennke @TobiHartmann thanks for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17926#issuecomment-1961048423

From roland at openjdk.org Fri Feb 23 10:12:04 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 23 Feb 2024 10:12:04 GMT
Subject: Integrated: 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi
In-Reply-To:
References:
Message-ID:

On Tue, 20 Feb 2024 09:46:31 GMT, Roland Westrelin wrote:

> After shenandoah barrier expansion, a shenandoah specific pass looks
> for heap stable checks that are back to back:
>
>
> if (heap_stable) {
>   // fast path 1
> } else {
>   // slow path 1
> }
> if (heap_stable) {
>   // fast path 2
> } else {
>   // slow path 2
> }
>
>
> and fuse them:
>
>
> if (heap_stable) {
>   // fast path 1
>   // fast path 2
> } else {
>   // slow path 1
>   // slow path 2
> }
>
>
> In the case of the failure, a `GetAndSetP` (or `GetAndSetN`) node is
> between the 2 heap_stable checks. The fusion of the 2 tests is
> implemented by taking advantage of the split if c2 optimization. But
> split if doesn't support having a `GetAndSet` node at the region where
> split if happens (that can only happen with shenandoah late barrier
> expansion). That causes the `GetAndSet` node to lose its `SCMemProj`
> which can then result in the `GetAndSet` being entirely removed.
>
> The fix I propose is to not perform the heap_stable fusion in this
> particular case.

This pull request has now been integrated.
Changeset: 5d414da5 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/5d414da50459b7a1e6f0f537ff3b318854b2c427 Stats: 100 lines in 4 files changed: 99 ins; 0 del; 1 mod 8325372: Shenandoah: SIGSEGV crash in unnecessary_acquire due to LoadStore split through phi Reviewed-by: shade, rkennke, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17926 From wkemper at openjdk.org Fri Feb 23 14:15:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 23 Feb 2024 14:15:39 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-23+11 ------------- Commit messages: - 8326461: tools/jlink/CheckExecutable.java fails as .debuginfo files are not executable - 8325870: Zap end padding bits for ArrayOops in non-release builds - 8326414: Serial: Inline SerialHeap::create_rem_set - 8323695: RenderPerf (2D) enhancements (23.12) - 8324243: Compilation failures in java.desktop module with gcc 14 - 8325342: Remove unneeded exceptions in compare.sh - 8326158: Javadoc for java.time.DayOfWeek#minus(long) - 8326351: Update the Zlib version in open/src/java.base/share/legal/zlib.md to 1.3.1 - 8326235: RISC-V: Size CodeCache for short calls encoding - 8326412: debuginfo files should not have executable bit set - ... and 82 more: https://git.openjdk.org/shenandoah/compare/8cb9b479...cc1e216e The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah&pr=399&range=00.conflicts Changes: https://git.openjdk.org/shenandoah/pull/399/files Stats: 12283 lines in 327 files changed: 6367 ins; 3630 del; 2286 mod Patch: https://git.openjdk.org/shenandoah/pull/399.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/399/head:pull/399 PR: https://git.openjdk.org/shenandoah/pull/399 From ysr at openjdk.org Fri Feb 23 18:37:57 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Fri, 23 Feb 2024 18:37:57 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v7] In-Reply-To: References: Message-ID: On Wed, 21 Feb 2024 20:40:06 GMT, Y. Srinivas Ramakrishna wrote: >> 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into generation_type > - Merge branch 'master' into generation_type > - Changes from review: Adjust order of parms in functions so they are consistent with their > template parameter order and contiguity, per convention. > - Merge branch 'master' into generation_type > - Merge branch 'master' into generation_type > - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include). > - Merge branch 'master' into generation_type > - Introduce ShenandoahGenerationType and templatize most closures with it. > The template expands for only the NON_GEN type for the non-generational > version of Shenandoah currently, and will in the future accomodate > Generational Shenandoah. Followed up offline w/Roman, and will check these in later today after a final round of merge-&-test. Thanks everyone for your reviews and input! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17815#issuecomment-1961809175 From ysr at openjdk.org Fri Feb 23 18:45:19 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 23 Feb 2024 18:45:19 GMT Subject: RFR: 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it [v8] In-Reply-To: References: Message-ID: > 8325671: Shenandoah: Introduce a ShenandoahGenerationType and templatize certain marking closures with it Y. 
Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:

 - Merge branch 'master' into generation_type
 - Merge branch 'master' into generation_type
 - Merge branch 'master' into generation_type
 - Changes from review: Adjust order of parms in functions so they are consistent with their
   template parameter order and contiguity, per convention.
 - Merge branch 'master' into generation_type
 - Merge branch 'master' into generation_type
 - Missing #include of ShenandoahGenerationType (albeit satisfied transitively via other include).
 - Merge branch 'master' into generation_type
 - Introduce ShenandoahGenerationType and templatize most closures with it.
   The template expands for only the NON_GEN type for the non-generational
   version of Shenandoah currently, and will in the future accommodate
   Generational Shenandoah.

-------------

Changes: https://git.openjdk.org/jdk/pull/17815/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17815&range=07
  Stats: 124 lines in 9 files changed: 71 ins; 3 del; 50 mod
  Patch: https://git.openjdk.org/jdk/pull/17815.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17815/head:pull/17815

PR: https://git.openjdk.org/jdk/pull/17815

From fw at deneb.enyo.de Sun Feb 25 18:56:34 2024
From: fw at deneb.enyo.de (Florian Weimer)
Date: Sun, 25 Feb 2024 19:56:34 +0100
Subject: Load reference barriers and safepoints
Message-ID: <878r38s78d.fsf@mid.deneb.enyo.de>

Can load reference barriers reach a safepoint? Or, a slightly
different question: does HotSpot need to keep an accurate stack map at
the point of the barrier?

I'm just curious; I have no particular reason to ask.
From kdnilsen at openjdk.org Mon Feb 26 00:26:18 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 26 Feb 2024 00:26:18 GMT
Subject: RFR: 8326626: GenShen: Remove dead code associated with non-elastic TLABS
Message-ID:

Remove extraneous remnant of code that is no longer needed because all LABs are not elastic.

-------------

Commit messages:
 - Remove dead inelastic plab code
 - Revert "Remove dead code for inelastic plabs"
 - Remove dead code for inelastic plabs

Changes: https://git.openjdk.org/shenandoah/pull/400/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=400&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8326626
  Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah/pull/400.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/400/head:pull/400

PR: https://git.openjdk.org/shenandoah/pull/400

From maoliang.ml at alibaba-inc.com Mon Feb 26 02:24:24 2024
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 26 Feb 2024 10:24:24 +0800
Subject: Re: Load reference barriers and safepoints
In-Reply-To: <878r38s78d.fsf@mid.deneb.enyo.de>
References: <878r38s78d.fsf@mid.deneb.enyo.de>
Message-ID:

No. Load reference barriers don't have an oopmap. The slow-path runtime call of the LRB is a leaf call, which has no oopmap and cannot suspend at a safepoint.

Thanks,
Liang

------------------------------------------------------------------
From:Florian Weimer
Send Time:2024 Feb. 26 (Mon.) 02:56
To:shenandoah-dev
Subject:Load reference barriers and safepoints

Can load reference barriers reach a safepoint? Or, a slightly
different question: does HotSpot need to keep an accurate stack map at
the point of the barrier?

I'm just curious; I have no particular reason to ask.
From kdnilsen at openjdk.org Mon Feb 26 19:15:07 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 26 Feb 2024 19:15:07 GMT
Subject: Integrated: 8325670: GenShen: Allow old to expand at end of each GC
In-Reply-To:
References:
Message-ID:

On Mon, 12 Feb 2024 17:36:45 GMT, Kelvin Nilsen wrote:

> At the end of GC, we set aside collector reserves to satisfy anticipated needs of the next GC.
>
> This PR reverts a change that accidentally prevents old-gen from being enlarged by this action. The observed failure condition was that mixed evacuations were not able to be performed, because old-gen was not large enough to receive the results of the desired evacuations.

This pull request has now been integrated.

Changeset: 2f1fa6db
Author: Kelvin Nilsen
URL: https://git.openjdk.org/shenandoah/commit/2f1fa6dbe2b6dcea15603ac42087af83961622f9
Stats: 8 lines in 1 file changed: 5 ins; 0 del; 3 mod

8325670: GenShen: Allow old to expand at end of each GC

Reviewed-by: ysr

-------------

PR: https://git.openjdk.org/shenandoah/pull/394

From fw at deneb.enyo.de Mon Feb 26 19:47:14 2024
From: fw at deneb.enyo.de (Florian Weimer)
Date: Mon, 26 Feb 2024 20:47:14 +0100
Subject: Load reference barriers and safepoints
In-Reply-To: (Liang Mao's message of "Mon, 26 Feb 2024 10:24:24 +0800")
References: <878r38s78d.fsf@mid.deneb.enyo.de>
Message-ID: <87plwjypml.fsf@mid.deneb.enyo.de>

* Liang Mao:

> No. Load reference barriers don't have an oopmap. The slow-path
> runtime call of the LRB is a leaf call, which has no oopmap and
> cannot suspend at a safepoint.

Thanks. What happens on evacuation failure? Is that impossible for
mutator threads, or will the thread throw a pre-allocated exception?
From wkemper at openjdk.org Mon Feb 26 20:08:32 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 26 Feb 2024 20:08:32 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-23+11 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 93 commits: - Merge remote-tracking branch 'shenandoah/master' into merge-jdk-23+11 - 8326461: tools/jlink/CheckExecutable.java fails as .debuginfo files are not executable Reviewed-by: shade, alanb - 8325870: Zap end padding bits for ArrayOops in non-release builds Reviewed-by: stefank, ayang - 8326414: Serial: Inline SerialHeap::create_rem_set Reviewed-by: kbarrett - 8323695: RenderPerf (2D) enhancements (23.12) Reviewed-by: avu, prr - 8324243: Compilation failures in java.desktop module with gcc 14 Reviewed-by: jwaters, ihse, kbarrett, prr - 8325342: Remove unneeded exceptions in compare.sh Reviewed-by: erikj - 8326158: Javadoc for java.time.DayOfWeek#minus(long) Reviewed-by: iris, lancea - 8326351: Update the Zlib version in open/src/java.base/share/legal/zlib.md to 1.3.1 Reviewed-by: iris, naoto, jpai - 8326235: RISC-V: Size CodeCache for short calls encoding Reviewed-by: fyang, tonyp - ... 
and 83 more: https://git.openjdk.org/shenandoah/compare/2f1fa6db...507cd6a7 ------------- Changes: https://git.openjdk.org/shenandoah/pull/399/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=399&range=01 Stats: 12284 lines in 327 files changed: 6368 ins; 3630 del; 2286 mod Patch: https://git.openjdk.org/shenandoah/pull/399.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/399/head:pull/399 PR: https://git.openjdk.org/shenandoah/pull/399 From wkemper at openjdk.org Mon Feb 26 20:08:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 26 Feb 2024 20:08:58 GMT Subject: RFR: 8326626: GenShen: Remove dead code associated with non-elastic TLABS In-Reply-To: References: Message-ID: <-Ekor7ACJQsa8vPcxaCweacbtumX_RJL72b0dmJDBVQ=.b9d5ff24-07bf-41b5-a3d6-616b3fa94736@github.com> On Mon, 26 Feb 2024 00:21:55 GMT, Kelvin Nilsen wrote: > Remove extraneous remnant of code that is no longer needed because all LABs are not elastic. Good catch! Build failures look strange. ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/400#pullrequestreview-1901893823 PR Comment: https://git.openjdk.org/shenandoah/pull/400#issuecomment-1965157739 From wkemper at openjdk.org Mon Feb 26 21:40:56 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 26 Feb 2024 21:40:56 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp In-Reply-To: References: Message-ID: On Mon, 19 Feb 2024 17:40:32 GMT, Y. Srinivas Ramakrishna wrote: >> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 183: > >> 181: // b. Cancel all concurrent marks, if in progress >> 182: if (heap->is_concurrent_mark_in_progress()) { >> 183: // TODO: Send cancel_concurrent_mark upstream? Does it really not have it already? 
>
> We have:
>
> void ShenandoahHeap::cancel_concurrent_mark() {
>   _young_generation->cancel_marking();
>   _old_generation->cancel_marking();
>   _global_generation->cancel_marking();
>
>   ShenandoahBarrierSet::satb_mark_queue_set().abandon_partial_marking();
> }
>
>
> whereas upstream has:
>
>   // b. Cancel concurrent mark, if in progress
>   if (heap->is_concurrent_mark_in_progress()) {
>     ShenandoahConcurrentGC::cancel();
>     heap->set_concurrent_mark_in_progress(false);
>   }
>
>
> Should probably be reconciled and upstreamed if not here, then in a separate but linked CR.

We changed the behavior of cancellation somewhat for the generational mode. We essentially overloaded 'cancellation' to mean 'suspension' for old gen marking. I agree it needs to be reconciled, but I think it's outside the scope of this PR.

-------------

PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1503332619

From wkemper at openjdk.org Mon Feb 26 21:53:57 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 26 Feb 2024 21:53:57 GMT
Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp
In-Reply-To:
References:
Message-ID:

On Mon, 19 Feb 2024 18:21:36 GMT, Y. Srinivas Ramakrishna wrote:

>> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files.
>
> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 460:
>
>> 458:
>> 459: template
>> 460: void ShenandoahPrepareForCompactionTask::prepare_for_compaction(ClosureType& cl,
>
> Looking at this method, it looks like it actually belongs as a public method in the closure, with that closure's methods invoked here becoming private to the closure. Makes for a narrower public API all around and keeps the loop in what appears to me in its more natural place.

Hmm, I agree from a design perspective. However, we have two versions of the closure.
For them to share the implementation of `prepare_for_compaction` as you suggest, I believe we'd need to introduce a common base class (or parent one closure to another) and make some of these methods virtual. Or were you thinking of making `prepare_for_compaction` a static member of a common base class and keeping the same signature? Or just duplicating the implementation of `prepare_for_compaction` in each closure? I'm not sure any of those is simpler than what we have here.

> src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 126:
>
>> 124:   size_t available() const override;
>> 125:   size_t available_with_reserve() const;
>> 126:   size_t used_and_wasted() const {
>
> Nit: technically `used_or_wasted()`, since space that is used is not wasted.

How about `used_including_waste`?

-------------

PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1503346149
PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1503347497

From wkemper at openjdk.org Mon Feb 26 22:34:35 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 26 Feb 2024 22:34:35 GMT
Subject: RFR: Merge openjdk/jdk:master [v3]
In-Reply-To:
References:
Message-ID: <7TwRTmLTrfCErw4wW54RNSZ0nrJ_pGfRcAwNK7ElzME=.4f82e6c5-0e20-4653-8d6d-ecc4c0f502cc@github.com>

> Merges tag jdk-23+11

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Fix zero build

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/399/files
  - new: https://git.openjdk.org/shenandoah/pull/399/files/507cd6a7..b88095ac

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=399&range=02
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=399&range=01-02

  Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah/pull/399.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/399/head:pull/399

PR: https://git.openjdk.org/shenandoah/pull/399

From
kdnilsen at openjdk.org Mon Feb 26 22:36:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 26 Feb 2024 22:36:54 GMT Subject: RFR: Merge openjdk/jdk:master [v3] In-Reply-To: <7TwRTmLTrfCErw4wW54RNSZ0nrJ_pGfRcAwNK7ElzME=.4f82e6c5-0e20-4653-8d6d-ecc4c0f502cc@github.com> References: <7TwRTmLTrfCErw4wW54RNSZ0nrJ_pGfRcAwNK7ElzME=.4f82e6c5-0e20-4653-8d6d-ecc4c0f502cc@github.com> Message-ID: On Mon, 26 Feb 2024 22:34:35 GMT, William Kemper wrote: >> Merges tag jdk-23+11 > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix zero build Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/399#pullrequestreview-1902177420 From wkemper at openjdk.org Mon Feb 26 23:29:18 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 26 Feb 2024 23:29:18 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v2] In-Reply-To: References: Message-ID: <5MmbPyhhvix0vS4NTAMHFMSboGbUcmtAaqGWzKQ1mso=.8f876068-aae0-49ef-81a5-78988cb478fc@github.com> > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve names and comments ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/398/files - new: https://git.openjdk.org/shenandoah/pull/398/files/850b011a..cfed36ea Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=00-01 Stats: 53 lines in 6 files changed: 27 ins; 7 del; 19 mod Patch: https://git.openjdk.org/shenandoah/pull/398.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/398/head:pull/398 PR: https://git.openjdk.org/shenandoah/pull/398 From wkemper at openjdk.org Tue Feb 27 00:22:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 Feb 2024 00:22:08 GMT Subject: Integrated: 8324067: GenShen: Isolate regulator thread to generational mode In-Reply-To: References: Message-ID: On Fri, 9 Feb 2024 19:47:14 GMT, William Kemper wrote: > Shenandoah's generational mode uses a second thread to evaluate heuristics. This is necessary so that the heuristics may interrupt a control thread which is running an old cycle in order to run a young cycle. > > The changes here move the regulator thread into `ShenandoahGenerationalHeap`. The generational version of the control thread is also now instantiated only by the generational heap. The upstream version of the control thread has more or less been restored. To summarize: > * An abstract base class called `ShenandoahController` has been introduced as the base class for the original and generational control threads. It has just one virtual method and it is not on a fast path. Much of the common code has been pulled up into this class. > * The respective control threads no longer need to check what mode they are in. They also no longer need to select which global generation they need to use. 
The regulator thread is now only used by the generational mode so it no longer supports running only global cycles. This pull request has now been integrated. Changeset: fcf8a8c7 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/fcf8a8c7e11d02272e4ee07179664bce2f56b2cb Stats: 2127 lines in 15 files changed: 1282 ins; 727 del; 118 mod 8324067: GenShen: Isolate regulator thread to generational mode Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/391 From ysr at openjdk.org Tue Feb 27 01:14:59 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 27 Feb 2024 01:14:59 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v2] In-Reply-To: <5MmbPyhhvix0vS4NTAMHFMSboGbUcmtAaqGWzKQ1mso=.8f876068-aae0-49ef-81a5-78988cb478fc@github.com> References: <5MmbPyhhvix0vS4NTAMHFMSboGbUcmtAaqGWzKQ1mso=.8f876068-aae0-49ef-81a5-78988cb478fc@github.com> Message-ID: On Mon, 26 Feb 2024 23:29:18 GMT, William Kemper wrote: >> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve names and comments Changes look great; thanks for the clearer renaming as well as the great documentation! ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/398#pullrequestreview-1902335730 From ysr at openjdk.org Tue Feb 27 01:40:04 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna)
Date: Tue, 27 Feb 2024 01:40:04 GMT
Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 26 Feb 2024 21:49:56 GMT, William Kemper wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 460:
>>
>>> 458:
>>> 459: template
>>> 460: void ShenandoahPrepareForCompactionTask::prepare_for_compaction(ClosureType& cl,
>>
>> Looking at this method, it looks like it actually belongs as a public method in the closure, with that closure's methods invoked here becoming private to the closure. Makes for a narrower public API all around and keeps the loop in what appears to me in its more natural place.
>
> Hmm, I agree from a design perspective. However, we have two versions of the closure. For them to share the implementation of `prepare_for_compaction` as you suggest, I believe we'd need to introduce a common base class (or parent one closure to another) and make some of these methods virtual. Or were you thinking of making `prepare_for_compaction` a static member of a common base class and keeping the same signature? Or just duplicating the implementation of `prepare_for_compaction` in each closure? I'm not sure any of those is simpler than what we have here.

Yes, I think you are right. When I was writing that comment, I was thinking that the loop could just be duplicated into each of the two closures. I guess maybe that doesn't make it any simpler and, moreover, leads to some avoidable duplication. I'll leave it up to you as to which makes more sense to you.

Reviewed and approved!
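
The trade-off discussed in this thread can be illustrated with a minimal sketch. It is not the HotSpot code: only the names `prepare_for_compaction` and `ClosureType` come from the quoted diff, while `Region`, `NonGenClosure`, `GenClosure`, `do_region` and `finish` are hypothetical stand-ins. It shows how a function template lets two unrelated closure types share one loop without a common base class or virtual methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; not the HotSpot type.
struct Region {
  std::size_t live_bytes;
  bool is_old;
};

// Hypothetical closure for the non-generational case: one running total.
class NonGenClosure {
  std::size_t planned_ = 0;
  bool finished_ = false;
public:
  void do_region(const Region& r) { planned_ += r.live_bytes; }
  void finish() {
    assert(!finished_ && "finish() must be called once");
    finished_ = true;
  }
  std::size_t planned() const { return planned_; }
  bool finished() const { return finished_; }
};

// Hypothetical closure for the generational case: young and old tracked apart.
class GenClosure {
  std::size_t young_ = 0;
  std::size_t old_ = 0;
  bool finished_ = false;
public:
  void do_region(const Region& r) { (r.is_old ? old_ : young_) += r.live_bytes; }
  void finish() {
    assert(!finished_ && "finish() must be called once");
    finished_ = true;
  }
  std::size_t young() const { return young_; }
  std::size_t old_gen() const { return old_; }
  bool finished() const { return finished_; }
};

// The shared loop. Any type that provides do_region() and finish() can be
// passed in; the two closures need no common base class and no virtual calls.
template <typename ClosureType>
void prepare_for_compaction(ClosureType& cl, const std::vector<Region>& regions) {
  for (const Region& r : regions) {
    cl.do_region(r);
  }
  cl.finish();
}
```

Moving the loop into each closure would narrow the public surface of the closures, but the loop body would then exist twice unless a shared base class were introduced, which is the cost weighed above.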
-------------

PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1503523836

From maoliang.ml at alibaba-inc.com Tue Feb 27 03:14:02 2024
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 27 Feb 2024 11:14:02 +0800
Subject: Re: Load reference barriers and safepoints
In-Reply-To: <87plwjypml.fsf@mid.deneb.enyo.de>
References: <878r38s78d.fsf@mid.deneb.enyo.de> , <87plwjypml.fsf@mid.deneb.enyo.de>
Message-ID: <70082f5d-d036-4c82-a26a-577b1f7eb4dd.maoliang.ml@alibaba-inc.com>

That's a good question. The Java thread won't throw any exception, since evacuation failure is an internal implementation detail that should not affect Java behavior.

We cannot trigger a GC immediately from the slow-path call when evacuation fails, so we have to return from the LRB leaf call, returning the from-space oop. But that opens a race: thread A could return a to-space oop from a successful evacuation while thread B returns a from-space oop from a failed evacuation, which breaks the to-space invariant.

The solution is an evacuation failure protocol, described in detail in shenandoahEvacOOMHandler.hpp. The core trick is that a Java thread that encounters an evacuation failure blocks further evacuations from starting, waits for the other in-flight evacuations to finish, and then resolves the oop.

Thanks,
Liang

------------------------------------------------------------------
From:Florian Weimer
Send Time:2024 Feb. 27 (Tue.) 03:47
To:"MAO, Liang"
Cc:shenandoah-dev
Subject:Re: Load reference barriers and safepoints

* Liang Mao:

> No. Load reference barriers don't have an oopmap. The slow-path
> runtime call of the LRB is a leaf call, which has no oopmap and
> cannot suspend at a safepoint.

Thanks. What happens on evacuation failure? Is that impossible for
mutator threads, or will the thread throw a pre-allocated exception?
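
The real protocol lives in shenandoahEvacOOMHandler.hpp. What follows is only a simplified sketch of the idea as described in the message above, not HotSpot's implementation: all names (`EvacOomGate`, `try_enter`, `leave`, `report_oom`) are hypothetical, and the single-word layout (one "failure seen" bit plus a count of threads currently evacuating) is an assumption made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical gate: the top bit records that some thread failed to
// evacuate; the remaining bits count threads currently inside evacuation.
class EvacOomGate {
  static constexpr std::uint32_t kOomBit = 1u << 31;
  std::atomic<std::uint32_t> state_{0};

  // Spin until no thread is inside an evacuation any more.
  void wait_until_no_evacuators() const {
    while ((state_.load(std::memory_order_acquire) & ~kOomBit) != 0) {
      std::this_thread::yield();
    }
  }

public:
  // Returns true if the caller may evacuate; it must then call leave() or
  // report_oom(). Returns false once a failure was reported: the caller has
  // waited for in-flight evacuations and may only resolve forwarding pointers.
  bool try_enter() {
    std::uint32_t cur = state_.load(std::memory_order_acquire);
    for (;;) {
      if (cur & kOomBit) {
        wait_until_no_evacuators();
        return false;
      }
      if (state_.compare_exchange_weak(cur, cur + 1,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return true;
      }
    }
  }

  // Normal exit after a successful evacuation.
  void leave() {
    std::uint32_t prev = state_.fetch_sub(1, std::memory_order_acq_rel);
    assert((prev & ~kOomBit) != 0 && "leave() without matching try_enter()");
    (void)prev;
  }

  // Exit taken instead of leave() by a thread whose evacuation failed:
  // set the failure bit, drop our own count, then wait for the others.
  void report_oom() {
    std::uint32_t cur = state_.load(std::memory_order_acquire);
    for (;;) {
      assert((cur & ~kOomBit) != 0 && "report_oom() without try_enter()");
      std::uint32_t next = (cur - 1) | kOomBit;
      if (state_.compare_exchange_weak(cur, next,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        break;
      }
    }
    wait_until_no_evacuators();
  }

  bool oom_seen() const {
    return (state_.load(std::memory_order_acquire) & kOomBit) != 0;
  }

  std::uint32_t evacuators() const {
    return state_.load(std::memory_order_acquire) & ~kOomBit;
  }
};
```

Once `report_oom()` has returned, every later `try_enter()` returns false, so under these assumptions no thread can create a new to-space copy after another thread has already handed out a from-space reference.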
From shade at openjdk.org Tue Feb 27 10:44:01 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 27 Feb 2024 10:44:01 GMT
Subject: RFR: 8326626: GenShen: Remove dead code associated with non-elastic TLABS
In-Reply-To:
References:
Message-ID:

On Mon, 26 Feb 2024 00:21:55 GMT, Kelvin Nilsen wrote:

> Remove extraneous remnant of code that is no longer needed because all LABs are not elastic.

Marked as reviewed by shade (Committer).

-------------

PR Review: https://git.openjdk.org/shenandoah/pull/400#pullrequestreview-1903153220

From kdnilsen at openjdk.org Tue Feb 27 16:01:09 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 27 Feb 2024 16:01:09 GMT
Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v2]
In-Reply-To: <5MmbPyhhvix0vS4NTAMHFMSboGbUcmtAaqGWzKQ1mso=.8f876068-aae0-49ef-81a5-78988cb478fc@github.com>
References: <5MmbPyhhvix0vS4NTAMHFMSboGbUcmtAaqGWzKQ1mso=.8f876068-aae0-49ef-81a5-78988cb478fc@github.com>
Message-ID: <9YWCdPAFA2KrhRHrlQZ_87QT5TpHZ1aybAL4CUXNluQ=.745e7864-8d0e-4874-9389-650489fc3bbe@github.com>

On Mon, 26 Feb 2024 23:29:18 GMT, William Kemper wrote:

>> This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improve names and comments

Thanks for this work and thanks Ramki for thorough review.

-------------

Marked as reviewed by kdnilsen (Committer).
PR Review: https://git.openjdk.org/shenandoah/pull/398#pullrequestreview-1903972471

From wkemper at openjdk.org Tue Feb 27 17:25:03 2024
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 27 Feb 2024 17:25:03 GMT
Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v2]
In-Reply-To: References: Message-ID: <8U81k7nZYC55lzopIE39u8hrHOKtdY6HPwoE0KQSUS4=.d398972a-c981-4f44-85fe-a9e5da31bc75@github.com>

On Tue, 27 Feb 2024 01:37:00 GMT, Y. Srinivas Ramakrishna wrote:

>> Hmm, I agree from a design perspective. However, we have two versions of the closure. For them to share the implementation of `prepare_for_compaction` as you suggest, I believe we'd need to introduce a common base class (or parent one closure to another) and make some of these methods virtual. Or were you thinking of making `prepare_for_compaction` a static member of a common base class and keeping the same signature? Or just duplicating the implementation of `prepare_for_compaction` in each closure? Not sure if any of those are simpler than what we have here.
>
> Yes, I think you are right. When I was writing that comment, I was thinking that the loop could just be duplicated into each of the two closures. I guess maybe that doesn't particularly make it any simpler, and moreover it leads to some avoidable duplication. I'll leave it up to you as to which makes more sense to you.
>
> Reviewed and approved!

I'll leave it for now. This form also reduces some of the delta with upstream (that is, with fewer changes upstream).
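For readers following the design question in this exchange, here is a minimal sketch of how one loop can serve two unrelated closure types through a function template, with no common base class and no virtual methods. Only the names `prepare_for_compaction` and `finish` come from the thread; the `Region` type, both closure types, their members, and the function signature are assumptions made up for illustration and do not reflect the actual HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region.
struct Region {
  std::size_t live_bytes;
};

// Hypothetical closure #1: sums the live bytes that will be compacted.
struct PlainPrepareClosure {
  std::size_t planned_bytes = 0;
  bool finished = false;
  void do_region(const Region& r) { planned_bytes += r.live_bytes; }
  void finish() { finished = true; }
};

// Hypothetical closure #2: counts the regions that hold any live data.
struct CountingPrepareClosure {
  std::size_t non_empty_regions = 0;
  bool finished = false;
  void do_region(const Region& r) {
    if (r.live_bytes > 0) {
      ++non_empty_regions;
    }
  }
  void finish() { finished = true; }
};

// The shared loop. Any type providing do_region() and finish() works, and
// the calls are resolved at compile time for each closure type.
template <typename ClosureType>
void prepare_for_compaction(ClosureType& cl, const std::vector<Region>& regions) {
  for (const Region& r : regions) {
    cl.do_region(r);
  }
  cl.finish();
}
```

The trade-off in this sketch is that the loop body is instantiated once per closure type, in exchange for keeping the two closures independent of each other.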
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/398#discussion_r1504670723 From wkemper at openjdk.org Tue Feb 27 19:09:17 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 Feb 2024 19:09:17 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v3] In-Reply-To: References: Message-ID: > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge remote-tracking branch 'shenandoah/master' into isolate-full-gc - Improve names and comments - Fix zero build - Initialize member - Fix typo, remove TODOs - Small cleanup - Move some smaller chunks of code out shFullGC - Fix warnings - Move some big chunks of code out of shFullGC ------------- Changes: https://git.openjdk.org/shenandoah/pull/398/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=02 Stats: 1187 lines in 11 files changed: 650 ins; 496 del; 41 mod Patch: https://git.openjdk.org/shenandoah/pull/398.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/398/head:pull/398 PR: https://git.openjdk.org/shenandoah/pull/398 From wkemper at openjdk.org Tue Feb 27 19:15:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 Feb 2024 19:15:08 GMT Subject: RFR: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes [v2] In-Reply-To: References: Message-ID: <5ASaVMyhHNOVYY7rsd2BVCCDJmpeWbL5MBRYdjawQLk=.e159a808-0542-4a1a-a3f5-ea9e61e0413d@github.com> > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation closure is renamed to `finish` > * The prepare for compaction loop is extracted to a 
template method, parameterized on closure type

William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge remote-tracking branch 'jdk/master' into prepare-isolate-full-gc
 - Factor out epilogue and template function to prepare for compaction

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/17894/files
 - new: https://git.openjdk.org/jdk/pull/17894/files/c631bde4..afc10227

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17894&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17894&range=00-01

Stats: 33400 lines in 1435 files changed: 14700 ins; 12499 del; 6201 mod
Patch: https://git.openjdk.org/jdk/pull/17894.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17894/head:pull/17894

PR: https://git.openjdk.org/jdk/pull/17894

From kdnilsen at openjdk.org Tue Feb 27 20:58:05 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 27 Feb 2024 20:58:05 GMT
Subject: Integrated: 8326626: GenShen: Remove dead code associated with non-elastic TLABS
In-Reply-To: References: Message-ID:

On Mon, 26 Feb 2024 00:21:55 GMT, Kelvin Nilsen wrote:

> Remove extraneous remnant of code that is no longer needed because all LABs are now elastic.

This pull request has now been integrated.
Changeset: 9fde64ec Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/9fde64ecf7f4954dbd4028a57cdeb400de8b0268 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod 8326626: GenShen: Remove dead code associated with non-elastic TLABS Reviewed-by: wkemper, shade ------------- PR: https://git.openjdk.org/shenandoah/pull/400 From wkemper at openjdk.org Tue Feb 27 23:07:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 Feb 2024 23:07:10 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v4] In-Reply-To: References: Message-ID: > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. William Kemper has updated the pull request incrementally with one additional commit since the last revision: More comments, more naming improvements ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/398/files - new: https://git.openjdk.org/shenandoah/pull/398/files/82317543..62790ff2 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=03 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=02-03 Stats: 23 lines in 6 files changed: 11 ins; 1 del; 11 mod Patch: https://git.openjdk.org/shenandoah/pull/398.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/398/head:pull/398 PR: https://git.openjdk.org/shenandoah/pull/398 From wkemper at openjdk.org Tue Feb 27 23:08:54 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 Feb 2024 23:08:54 GMT Subject: Integrated: 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 19:49:07 GMT, William Kemper wrote: > Changes to prepare for generational mode: > * A `phase5_epilogue` method is added to run the final steps of the gc > * Prepare for mark operation is run by multiple worker threads > * `finish_region` method of compacting preparation 
closure is renamed to `finish` > * The prepare for compaction loop is extracted to a template method, parameterized on closure type This pull request has now been integrated. Changeset: 33f23827 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/33f23827676dc9ff12bd5c0891170cd813b93b96 Stats: 76 lines in 2 files changed: 33 ins; 14 del; 29 mod 8325807: Shenandoah: Refactor full gc in preparation for generational mode changes Reviewed-by: shade, ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/17894 From wkemper at openjdk.org Wed Feb 28 18:03:33 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 28 Feb 2024 18:03:33 GMT Subject: RFR: 8325808: GenShen: Move generational mode code out of shFullGC.cpp [v5] In-Reply-To: References: Message-ID: > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Backout incomplete refactoring ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/398/files - new: https://git.openjdk.org/shenandoah/pull/398/files/62790ff2..625dd1e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=04 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=398&range=03-04 Stats: 5 lines in 1 file changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/398.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/398/head:pull/398 PR: https://git.openjdk.org/shenandoah/pull/398 From wkemper at openjdk.org Thu Feb 29 14:24:21 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 29 Feb 2024 14:24:21 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master Message-ID: Merges tag jdk-21.0.4+0 ------------- Commit messages: - 8323664: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java still fails with JNI warning on some Windows configurations - 8325496: Make 
TrimNativeHeapInterval a product switch - 8321151: JDK-8294427 breaks Windows L&F on all older Windows versions - 8325876: crashes in docker container tests on Linuxppc64le Power8 machines - 8325470: [AIX] use fclose after fopen in read_psinfo - 8320303: Allow PassFailJFrame to accept single window creator - 8326000: Remove obsolete comments for class sun.security.ssl.SunJSSE - 8314835: gtest wrappers should be marked as flagless - 8325254: CKA_TOKEN private and secret keys are not necessarily sensitive - 8311893: Interactive component with ARIA role 'tabpanel' does not have a programmatically associated name - ... and 34 more: https://git.openjdk.org/shenandoah-jdk21u/compare/57956950...36b5ac46 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/23/files Stats: 3168 lines in 175 files changed: 2422 ins; 194 del; 552 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/23.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/23/head:pull/23 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/23 From wkemper at openjdk.org Thu Feb 29 17:38:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 29 Feb 2024 17:38:08 GMT Subject: Integrated: 8325808: GenShen: Move generational mode code out of shFullGC.cpp In-Reply-To: References: Message-ID: On Fri, 16 Feb 2024 19:42:54 GMT, William Kemper wrote: > This change reduces the differences from the upstream branch by moving large chunks of generational mode code into separate files. This pull request has now been integrated. 
Changeset: b7c68eb3 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/b7c68eb36eccd8b202d2c1a86c804c066e0bb8b8 Stats: 1209 lines in 13 files changed: 664 ins; 497 del; 48 mod 8325808: GenShen: Move generational mode code out of shFullGC.cpp Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/398 From kdnilsen at openjdk.org Thu Feb 29 19:10:21 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 29 Feb 2024 19:10:21 GMT Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v14] In-Reply-To: References: Message-ID: > Several objectives: > 1. Reduce humongous allocation failures by segregating regular regions from humongous regions > 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB > 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations > 4. Treat collector reserves as available for Mutator allocations after evacuation completes > 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah > > We have compared performance of existing FreeSet implementation with the proposed PR over a broad set of performance workloads and see that the impact is mostly neutral. > > Comparing 105235.0 metrics from control, 220638.0 from experiment. 
> Compare: 0.589s > Most impacted benchmarks | Most impacted metrics > ------------------------------------------------------------------------------------------------------- > Shenandoah/jython | cwr_total > > > Only in experiment | Only in control > ------------------------------------------------------------------------------------------------------- > crypto.signverify/trigger_failure | crypto.rsa/cmr_thread_roots > extremem-large-31g/adjust_pointers | scimark.sparse.small/concurrent_thread_roots > extremem-large-31g/calculate_addresses | xml.transform/concurrent_thread_roots > crypto.signverify/class_unloading_rendezvous | mpegaudio/concurrent_weak_roots > serial/cmr_total | crypto.rsa/ctr_thread_roots > > Shenandoah > ------------------------------------------------------------------------------------------------------- > +5.64% jython/cwr_total p=0.00037 > Control: 1.928ms (+/-272.40us) 170 > Test: 2.037ms (+/-322.73us) 344 Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 60 commits: - Remove instrumentation and cleanup magic numbers Two places, I had used 63 when I should have used _bits_per_array_element -1. - Address 32-bit compile issues - Fix 32-bit size formatting in log messages - Fix white space - Merge remote-tracking branch 'origin/master' into restructure-free-set - Merge branch 'openjdk:master' into master - Two bug fixes for better performance 1. Collector reserve size is based on memory available within regions rather than the region size (oops). 2. If an attempt to allocate within a region fails and the region has already provided the percentage presumed by ShenandoahEvacWaste, retire this region. This is motivated by observations that otherwise, we end up with large numbers of regions that have only a small amount of memory within them (e.g. 
4k) and every allocation request has to wade through all of these regions before it eventually finds a region that has a sufficiently large amount of available memory. In the original Shenandoah free-set implementation, the behavior was to retire a region the first time an allocation within that region fails, regardless of whether the region has already reached the ShenandoahEvacWaste threshold.
 - Fix off-by-one error in is_forward_consecutive_ones()
 - Fix whitespace
 - Bug fixes and performance improvements
   1. Correct off-by-one error in count of trailing ones
   2. Speed up search for contiguous regions (for humongous allocations) by sliding window instead of initiating new search each time
   3. Bias regular region allocations to favor regions that are already partially consumed
   4. Fix bug in move_regions_from_collector_to_mutator which caused some non-empty regions to be ignored.
 - ... and 50 more: https://git.openjdk.org/jdk/compare/be2b92bd...1aa5a3e6

-------------

Changes: https://git.openjdk.org/jdk/pull/17561/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17561&range=13
Stats: 1847 lines in 5 files changed: 1488 ins; 168 del; 191 mod
Patch: https://git.openjdk.org/jdk/pull/17561.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17561/head:pull/17561

PR: https://git.openjdk.org/jdk/pull/17561

From kdnilsen at openjdk.org Thu Feb 29 19:10:21 2024
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 29 Feb 2024 19:10:21 GMT
Subject: RFR: 8324649: Shenandoah: refactor implementation of free set [v13]
In-Reply-To: References: Message-ID:

On Tue, 20 Feb 2024 23:36:10 GMT, Kelvin Nilsen wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. 
Track range of empty regions in addition to range of available regions in order to expedite humongous allocations >> 4. Treat collector reserves as available for Mutator allocations after evacuation completes >> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah >> >> We have compared performance of existing FreeSet implementation with the proposed PR over a broad set of performance workloads and see that the impact is mostly neutral. >> >> Comparing 105235.0 metrics from control, 220638.0 from experiment. >> Compare: 0.589s >> Most impacted benchmarks | Most impacted metrics >> ------------------------------------------------------------------------------------------------------- >> Shenandoah/jython | cwr_total >> >> >> Only in experiment | Only in control >> ------------------------------------------------------------------------------------------------------- >> crypto.signverify/trigger_failure | crypto.rsa/cmr_thread_roots >> extremem-large-31g/adjust_pointers | scimark.sparse.small/concurrent_thread_roots >> extremem-large-31g/calculate_addresses | xml.transform/concurrent_thread_roots >> crypto.signverify/class_unloading_rendezvous | mpegaudio/concurrent_weak_roots >> serial/cmr_total | crypto.rsa/ctr_thread_roots >> >> Shenandoah >> ------------------------------------------------------------------------------------------------------- >> +5.64% jython/cwr_total p=0.00037 >> Control: 1.928ms (+/-272.40us) 170 >> Test: 2.037ms (+/-322.73us) 344 > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix an error in search for contiguous regions I'm working on some performance regressions identified on certain tests. Converting to draft until these are resolved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17561#issuecomment-1957350110
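The commit list for this PR mentions speeding up the search for contiguous regions "by sliding window instead of initiating new search each time". As a rough illustration of that general technique, the sketch below finds the first run of `count` empty regions in a single pass. It is an assumption-laden toy: `find_contiguous_empty`, `kNotFound`, and the `std::vector<bool>` representation are invented here, and the PR's own code (which works on its own bitmap structures) is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "no suitable run exists".
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Returns the index of the first run of `count` consecutive empty regions,
// or kNotFound. Each region is examined exactly once: when a non-empty
// region is seen, the candidate window restarts just past it rather than
// rescanning from the next index.
std::size_t find_contiguous_empty(const std::vector<bool>& is_empty,
                                  std::size_t count) {
  if (count == 0 || count > is_empty.size()) {
    return kNotFound;
  }
  std::size_t start = 0;  // first index of the current candidate window
  for (std::size_t i = 0; i < is_empty.size(); ++i) {
    if (!is_empty[i]) {
      start = i + 1;                      // window cannot include region i
    } else if (i + 1 - start == count) {  // window [start, i] is long enough
      return start;
    }
  }
  return kNotFound;
}
```

A naive search that restarts at every index costs on the order of regions times run length in the worst case, while this form stays linear in the number of regions, which is the kind of saving the commit message appears to describe.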