From tschatzl at openjdk.org Mon Dec 2 06:35:41 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 06:35:41 GMT Subject: RFR: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References:

Message-ID: <6CkyYuRw1-i8jedbYK5P_3hbiD8ijvUMvx5k1DIc71g=.2bc72623-c9c7-4b35-8b4e-0fc4d3fa7a14@github.com> On Fri, 29 Nov 2024 13:53:19 GMT, Stefan Johansson wrote: >> Hi all, >> >> please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called at very arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then faiiling). >> >> Testing: local testing, after the change the ResourceMark crash goes away, gha >> >> Thanks, >> Thomas > > Marked as reviewed by sjohanss (Reviewer). Thanks @kstefanj @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22455#issuecomment-2510688071 From tschatzl at openjdk.org Mon Dec 2 06:35:42 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 06:35:42 GMT Subject: Integrated: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:36:16 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called at very arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then faiiling). > > Testing: local testing, after the change the ResourceMark crash goes away, gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: f5ebda43 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/f5ebda43709984214a25e23926860fea2ba5819a Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8345173: BlockLocationPrinter::print_location misses a ResourceMark Reviewed-by: sjohanss, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/22455 From tschatzl at openjdk.org Mon Dec 2 08:57:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 08:57:37 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References:

Message-ID: <9ElRBvXwzJLiC1KpFLPvS7CkGkhhN3QYaynNYM2P1f4=.16f0bb3b-316f-439c-b22f-d28fe3fb7891@github.com> On Wed, 20 Nov 2024 19:23:34 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review Sorry for the late reply. Some more comments need update, other than that it seems fine. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 342: > 340: // regions in retained collection set candidates Retained collection set candidates are aged out, ie. > 341: // made to regular old regions without remembered sets after a few attempts to save computation costs > 342: // of keeping them candidates for very long living pinned regions. Suggestion: // The current mechanism for evacuating pinned old regions is as below: // * pinned regions in the marking collection set candidate list (available during mixed gc) are evacuated like // pinned young regions to avoid the complexity of dealing with pinned regions that are part of a // collection group sharing a single cardset. These regions will be partially evacuated and added to the // retained collection set by the evacuation failure handling mechanism. // * evacuating pinned regions out of retained collection set candidates would also just take up time // with no actual space freed in old gen. Better to concentrate on others. So we skip over pinned // regions in retained collection set candidates. Retained collection set candidates are aged out, ie. // made to regular old regions without remembered sets after a few attempts to save computation costs // of keeping them candidates for very long living pinned regions. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 522: > 520: bool fits_in_remaining_time = predicted_time_ms <= time_remaining_ms; > 521: > 522: G1CollectionSetCandidateInfo* ci = group->at(0); // we only have one region in the group Suggestion: G1CollectionSetCandidateInfo* ci = group->at(0); // We only have one region in the group. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 65: > 63: // All regions in the group share a G1CardSet instance, which tracks remembered set entries for the > 64: // regions in the group. We do not have track to cross-region references for regions that are in the > 65: // same group. Suggestion: // G1CSetCandidateGroup groups candidate regions that will be selected for evacuation at the same time. // Grouping occurs both for candidates from marking or regions retained during evacuation failure, but a group // can not contain regions from both types of regions. // // Humongous objects are excluded from the candidate groups because regions associated with these // objects are never selected for evacuation. // // All regions in the group share a G1CardSet instance, which tracks remembered set entries for the // regions in the group. We do not have track to cross-region references for regions that are in the // same group saving memory. src/hotspot/share/gc/g1/g1_globals.hpp line 283: > 281: "The maximum number of old CSet regions in a collection group. " \ > 282: "These will be evacuated in the same GC pause. The first group " \ > 283: "may exceed this limit depending on G1MixedGCCountTarget.") \ Maybe this is better, not sure. We should explain why the "first group" is special. Suggestion: product(uint, G1OldCSetGroupSize, 5, EXPERIMENTAL, \ "The maximum number of old CSet regions in a collection group. " \ "All regions in a group will be evacuated in the same GC pause. The first group calculated after marking from marking candidates " \ "may exceed this limit as it is calculated based on G1MixedGCCountTarget.") \ ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2472141691 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865490259 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865503030 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865494143 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865498063 From tschatzl at openjdk.org Mon Dec 2 10:00:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 10:00:39 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 I think I understood the changes, I added this understanding, hopefully making it easier for other reviewers. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2142: > 2140: // Found the first non-empty region in the same space. > 2141: src_region_idx = sd.region(src_region_ptr); > 2142: closure.set_source(sd.region_to_addr(src_region_idx)); Just to make sure I understand: the only change here is the removal of the condition `src_region_addr > closure.source()` because at worst we can set the same value into `closure._source` anyway, and the additional check is kind of superfluous. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2169: > 2167: src_space_id = SpaceId(space_id); > 2168: src_space_top = top; > 2169: closure.set_source(region_start_addr); The reason for removing the search for the first live word is because all callers will scan the bitmap anyway? ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2472240795 PR Review Comment: https://git.openjdk.org/jdk/pull/22441#discussion_r1865554912 PR Review Comment: https://git.openjdk.org/jdk/pull/22441#discussion_r1865555787 From ayang at openjdk.org Mon Dec 2 10:33:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 10:33:43 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: <1ZMnG9zpv7e3f-5g098hXWzqZm9V1miTNNld91Xxb5A=.9e2e2a2d-aa1e-4b3f-8d4d-5a61a9d6184a@github.com> On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivial using MIN2 to replace `>=` and `||` for better readability. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22444#issuecomment-2511151109 From ayang at openjdk.org Mon Dec 2 10:33:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 10:33:43 GMT Subject: Integrated: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivial using MIN2 to replace `>=` and `||` for better readability. This pull request has now been integrated. Changeset: 0b0f83c0 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/0b0f83c01e30587ca2e23b46493bdc7fcb21559f Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe Reviewed-by: tschatzl, mli ------------- PR: https://git.openjdk.org/jdk/pull/22444 From rkennke at openjdk.org Mon Dec 2 11:15:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 2 Dec 2024 11:15:12 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers Message-ID: See bug for crash details. The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. Testing: - [x] hotspot_gc_shenandoah +UCOH ------------- Commit messages: - 8345293: Fix generational Shenandoah with compact headers Changes: https://git.openjdk.org/jdk/pull/22477/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22477&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345293 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22477/head:pull/22477 PR: https://git.openjdk.org/jdk/pull/22477 From aboldtch at openjdk.org Mon Dec 2 11:21:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Dec 2024 11:21:22 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: > This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. > > As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. > > There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Always trigger OC, even when old_garbage is 0 - Merge tag 'jdk-24+26' into JDK-8344414 Added tag jdk-24+26 for changeset 8485cb1c - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22228/files - new: https://git.openjdk.org/jdk/pull/22228/files/3bc3ff4b..98c0acb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22228&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22228&range=00-01 Stats: 54447 lines in 1247 files changed: 27258 ins; 21097 del; 6092 mod Patch: https://git.openjdk.org/jdk/pull/22228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22228/head:pull/22228 PR: https://git.openjdk.org/jdk/pull/22228 From ayang at openjdk.org Mon Dec 2 11:43:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 11:43:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References:

Message-ID: On Fri, 29 Nov 2024 16:09:15 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - num_allocators => max_allocators > - fix comment typo > - use struct/union instead of constants Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2472494759 From shade at openjdk.org Mon Dec 2 11:58:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Dec 2024 11:58:39 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Looks good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2472499004 From iwalulya at openjdk.org Mon Dec 2 12:06:31 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 2 Dec 2024 12:06:31 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1_globals.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/4aa4d6b2..fbff7d78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=01-02 Stats: 13 lines in 3 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From stuefe at openjdk.org Mon Dec 2 13:52:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 2 Dec 2024 13:52:37 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: <4oavGU_b5J322sriy37B9AC_zoIC6mvfADPFSRpcDYs=.24381e90-ebd6-47ff-9ee7-5e098f9c1aab@github.com> On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2472754459 From ysr at openjdk.org Mon Dec 2 15:47:42 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 2 Dec 2024 15:47:42 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2473077721 From kbarrett at openjdk.org Mon Dec 2 16:01:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Dec 2024 16:01:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References:

Message-ID: On Mon, 2 Dec 2024 11:23:53 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - num_allocators => max_allocators >> - fix comment typo >> - use struct/union instead of constants > > src/hotspot/share/gc/shared/partialArrayState.hpp line 181: > >> 179: // allocator counters as a single unit for atomic manipulation. >> 180: using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t); >> 181: using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t); > > Given the max-value has type `uint`, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT? I assumed 16bits of worker threads was quite sufficient for a 32bit platform. And I misremembered and thought 32bit platforms couldn't be relied upon for a 64bit atomic add and maybe other 64bit operations. And this code is definitely not super performance critical. So yeah, I could drop the platform-conditional definition of Counter. I don't think it makes much difference to the code. I guess the type aliases could be dropped and just use bare uint32/64_t. Not sure that's actually an improvement. > src/hotspot/share/gc/shared/partialArrayState.hpp line 189: > >> 187: // allocators. The counters are atomic to permit concurrent construction, >> 188: // and to permit concurrent destruction. They are an atomic unit to detect >> 189: // and reject mixing the two phases, without concern for questions of > > It's nice that this library can detect and reject misuse (such as mixing two phases), but I?m not sure why so much effort was spent preventing this. None of the existing users of the library are expected to mix phases in the near future. Could we instead document that mixing two phases is not permitted, and if someone chooses to do so, they do so at their own risk? So far, only 2 of the nearly a dozen(?) potential clients are using this. I'm not sure that none of them are going to have workers that do some of their setup after being started. Hence the desire to support concurrency. And if that, then I feel better about it if there's some usage validation. But maybe it would be better to just throw a lock at the problem. And if it turns out none of the use-cases end up needing that concurrency, then I won't object to a little bit of code simplification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1866111712 PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1866111801 From eosterlund at openjdk.org Mon Dec 2 21:29:40 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Dec 2024 21:29:40 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Even better. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22228#pullrequestreview-2473964964 From wkemper at openjdk.org Mon Dec 2 22:51:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 2 Dec 2024 22:51:50 GMT Subject: RFR: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode Message-ID: The incremental update mode has been removed and is no longer supported. ------------- Commit messages: - Remove reference to incremental update mode Changes: https://git.openjdk.org/jdk/pull/22502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345346 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22502/head:pull/22502 PR: https://git.openjdk.org/jdk/pull/22502 From ysr at openjdk.org Mon Dec 2 22:57:45 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 2 Dec 2024 22:57:45 GMT Subject: RFR: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode In-Reply-To: References: Message-ID: <3YqrRBP2ACAzhlHZRXaKMmB-awPFRuiHXyDs66fpQOw=.37cfe055-529b-4f4b-b4d3-ad0c98ce0926@github.com> On Mon, 2 Dec 2024 22:46:25 GMT, William Kemper wrote: > The incremental update mode has been removed and is no longer supported. Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22502#pullrequestreview-2474111019 From wkemper at openjdk.org Mon Dec 2 22:57:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 2 Dec 2024 22:57:46 GMT Subject: Integrated: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 22:46:25 GMT, William Kemper wrote: > The incremental update mode has been removed and is no longer supported. This pull request has now been integrated. Changeset: 1997e89d Author: William Kemper URL: https://git.openjdk.org/jdk/commit/1997e89ddf9fba7c6eea6c96bd0b5426576d4460 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/22502 From ysr at openjdk.org Tue Dec 3 02:47:08 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 02:47:08 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks Message-ID: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Fix documentation comment. I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. ------------- Commit messages: - Fix up documentation comment. Changes: https://git.openjdk.org/jdk/pull/22507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344593 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From tschatzl at openjdk.org Tue Dec 3 09:01:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 3 Dec 2024 09:01:39 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References:

Message-ID: <2pLpq6Qd4lHfxQeSEvIIZZGlejS8qn9boGzO6s5MoXU=.bad59007-1f23-4d77-95a9-2f2a0bfc29aa@github.com> On Mon, 2 Dec 2024 12:06:31 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: > > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1_globals.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2474979494 From ayang at openjdk.org Tue Dec 3 09:24:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 09:24:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References:

Message-ID: On Mon, 2 Dec 2024 15:58:50 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/partialArrayState.hpp line 181: >> >>> 179: // allocator counters as a single unit for atomic manipulation. >>> 180: using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t); >>> 181: using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t); >> >> Given the max-value has type `uint`, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT? > > I assumed 16bits of worker threads was quite sufficient for a 32bit platform. > And I misremembered and thought 32bit platforms couldn't be relied upon for a > 64bit atomic add and maybe other 64bit operations. And this code is definitely > not super performance critical. So yeah, I could drop the platform-conditional > definition of Counter. I don't think it makes much difference to the code. I > guess the type aliases could be dropped and just use bare uint32/64_t. Not > sure that's actually an improvement. I think unifying 32 and 64 bit system is an improvement -- being able to reason with concrete types. As for type aliases, it's rather subjective; I find `uint*_t` more familiar, but up to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867335846 From stefank at openjdk.org Tue Dec 3 09:50:39 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 3 Dec 2024 09:50:39 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22228#pullrequestreview-2475122868 From aboldtch at openjdk.org Tue Dec 3 10:45:45 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Dec 2024 10:45:45 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22228#issuecomment-2514181776 From aboldtch at openjdk.org Tue Dec 3 10:45:46 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Dec 2024 10:45:46 GMT Subject: Integrated: 8344414: ZGC: Another division by zero in rule_major_allocation_rate In-Reply-To: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Tue, 19 Nov 2024 07:18:20 GMT, Axel Boldt-Christmas wrote: > This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. > > As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. > > There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. This pull request has now been integrated. Changeset: 63af2f42 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/63af2f42b7abe9504897d7c3f3b4cc0b57123694 Stats: 28 lines in 3 files changed: 11 ins; 1 del; 16 mod 8344414: ZGC: Another division by zero in rule_major_allocation_rate Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.org/jdk/pull/22228 From ayang at openjdk.org Tue Dec 3 13:55:49 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 13:55:49 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References:

Message-ID: On Mon, 2 Dec 2024 12:06:31 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: > > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1_globals.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> src/hotspot/share/gc/g1/g1CardSet.cpp line 788: > 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); > 787: assert(r->rem_set()->card_set() != this, "must be"); > 788: #endif Since this introduces local vars, can they be grouped in a `{}` scope? src/hotspot/share/gc/g1/g1CollectionSet.cpp line 358: > 356: } > 357: > 358: uint num_optional_regions = _optional_groups.num_regions(); Seems unused. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 516: > 514: > 515: for (G1CSetCandidateGroup* group : *retained_groups) { > 516: assert(group->length() == 1, "Retained groups should have only 1 region"); Should this property be documented in where `_retain_groups` is defined, if it is an invariant? src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 320: > 318: } > 319: > 320: _from_marking_groups.append(current); I wonder if this part can be written somehow to eliminate some "duplicate" code, so that the following occur only once. _from_marking_groups.append(current); current = new G1CSetCandidateGroup(G1CollectedHeap::heap()->card_set_config()); num_added_to_group = 0; src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 74: > 72: G1CardSet _card_set; > 73: > 74: // Missing comment? src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 113: > 111: void clear(); > 112: > 113: void abandon(); It's not obvious how the two APIs differ, and which one to use in a certain scenario. Some docs would be nice. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 172: > 170: }; > 171: > 172: // Tracks all collection set candidates, i.e. regions that could/should be evacuated soon. Seems outdated now that fields are group list. src/hotspot/share/gc/g1/g1_globals.hpp line 285: > 283: "may exceed this limit as it is calculated based on G1MixedGCCountTarget.") \ > 284: range(1, 256) \ > 285: \ Better wrap text to align ``. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867639848 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867744730 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867758198 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867699763 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867657960 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867718224 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867760746 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867636495 From kbarrett at openjdk.org Tue Dec 3 13:55:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 13:55:59 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: Message-ID: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove phase invariant checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/f1a1be24..2eb1814e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=02-03 Stats: 67 lines in 2 files changed: 1 ins; 50 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Tue Dec 3 13:56:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 13:56:00 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References:

Message-ID: <_dQ3A6UZTdGWF9cEaFFGL5z7FfxAX1C7bOhJ4grX3ro=.b3a1e3d7-7ee2-4c26-992c-aae217aa97be@github.com> On Mon, 2 Dec 2024 11:40:44 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - num_allocators => max_allocators >> - fix comment typo >> - use struct/union instead of constants > > Some minor comments/suggestions. Since y'all (especially @albertnetymk ) seem to really dislike the phase checking, here's a version without. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2514620559 From ayang at openjdk.org Tue Dec 3 13:59:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 13:59:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks Thank you for the simplification. ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2475755016 From tschatzl at openjdk.org Tue Dec 3 14:21:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 3 Dec 2024 14:21:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks still good. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2475825896 From zgu at openjdk.org Tue Dec 3 14:25:47 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 3 Dec 2024 14:25:47 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks src/hotspot/share/gc/shared/partialArrayState.cpp line 110: > 108: _max_allocators(max_allocators), > 109: _registered_allocators(0), > 110: _released_allocators(0) `_released_allocators` is a debug only variable, should fail in release build ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867815956 From kbarrett at openjdk.org Tue Dec 3 14:43:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 14:43:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 14:23:14 GMT, Zhengyu Gu wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove phase invariant checks > > src/hotspot/share/gc/shared/partialArrayState.cpp line 110: > >> 108: _max_allocators(max_allocators), >> 109: _registered_allocators(0), >> 110: _released_allocators(0) > > `_released_allocators` is a debug only variable, should fail in release build Well spotted. It seems I haven't done a release build since my final touch-up to make it debug-only. I'll push an update once it's been through our CI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867847941 From zgu at openjdk.org Tue Dec 3 15:17:41 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 3 Dec 2024 15:17:41 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks src/hotspot/share/gc/shared/partialArrayState.cpp line 96: > 94: > 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { > 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); Could you explain why `release` order is needed here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867917303 From kbarrett at openjdk.org Tue Dec 3 15:53:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 15:53:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 15:15:10 GMT, Zhengyu Gu wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove phase invariant checks > > src/hotspot/share/gc/shared/partialArrayState.cpp line 96: > >> 94: >> 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { >> 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); > > Could you explain why `release` order is needed here? This is part of the usual reference counting dance. Except, where did the acquire disappear to? There should be an acquire on the refcount == 0 branch! Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of tests were run without noticing that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867980602 From kbarrett at openjdk.org Tue Dec 3 16:15:10 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:15:10 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp Message-ID: Please review this trivial removal of an unnecessary and improperly placed include of ``. Testing: mach5 tier1 ------------- Commit messages: - remove unneeded include Changes: https://git.openjdk.org/jdk/pull/22519/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22519&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345397 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22519/head:pull/22519 PR: https://git.openjdk.org/jdk/pull/22519 From shade at openjdk.org Tue Dec 3 16:16:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Dec 2024 16:16:40 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 02:41:26 GMT, Y. Srinivas Ramakrishna wrote: > Fix documentation comment. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. I still don't quite understand if we need to implement `CardTableBarrierSet::on_slowpath_allocation_exit`. I see `SharedRuntime::on_slowpath_allocation_exit` is called from different places in VM. Are those really subsumed by safepoints? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2514996606 From shade at openjdk.org Tue Dec 3 16:23:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Dec 2024 16:23:38 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 16:09:29 GMT, Kim Barrett wrote: > Please review this trivial removal of an unnecessary and improperly placed > include of ``. > > Testing: mach5 tier1 Good and trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22519#pullrequestreview-2476182562 From kbarrett at openjdk.org Tue Dec 3 16:33:48 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:33:48 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References:

Message-ID: <-6E7jPhk_wnalVUpCu-CP0wRGQdcPwjSePJTbFpnC9c=.830dc03e-7a3c-43ca-8e99-66be2b81797d@github.com> On Tue, 3 Dec 2024 16:21:01 GMT, Aleksey Shipilev wrote: >> Please review this trivial removal of an unnecessary and improperly placed >> include of ``. >> >> Testing: mach5 tier1 > > Good and trivial. Thanks for reviewing @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/22519#issuecomment-2515033509 From kbarrett at openjdk.org Tue Dec 3 16:33:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:33:49 GMT Subject: Integrated: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References: Message-ID: <_phjJjMuPSpvy_JJabyDGhqeAY__6Cd75D_-KwWsAbg=.48f6982f-34a8-4ad2-824e-192eef0e1865@github.com> On Tue, 3 Dec 2024 16:09:29 GMT, Kim Barrett wrote: > Please review this trivial removal of an unnecessary and improperly placed > include of ``. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: e1910f2d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/e1910f2d19fce5cc78058154c7ddaaa8718973dc Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8345397: Remove from g1HeapRegionRemSet.cpp Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/22519 From rkennke at openjdk.org Tue Dec 3 16:48:42 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 3 Dec 2024 16:48:42 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22477#issuecomment-2515074738 From rkennke at openjdk.org Tue Dec 3 16:48:42 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 3 Dec 2024 16:48:42 GMT Subject: Integrated: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: <-DdF0QuKZhADfsHN75TUl4hsiMfUva11bgDbxSGUpe8=.d31cc97b-d3cb-4cb6-9e3d-6f4bdec92f01@github.com> On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH This pull request has now been integrated. Changeset: e9f6ba05 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/e9f6ba05264ecb2f1ca3983ea503778f301bf280 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8345293: Fix generational Shenandoah with compact headers Reviewed-by: shade, stuefe, ysr ------------- PR: https://git.openjdk.org/jdk/pull/22477 From ysr at openjdk.org Tue Dec 3 17:28:41 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 17:28:41 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 16:13:49 GMT, Aleksey Shipilev wrote: > I still don't quite understand if we need to implement `CardTableBarrierSet::on_slowpath_allocation_exit`. I see `SharedRuntime::on_slowpath_allocation_exit` is called from different places in VM. Are those really subsumed by safepoints? Slow path allocations in GenShen also happen only in young regions, never directly in the old generation, and do not need card-marks. I was hoping to convey that in the comment. Please let me know if I misunderstood your concern, and am missing a different mechanism through which initializing writes may happen in the old generation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515165468 From iwalulya at openjdk.org Tue Dec 3 19:56:23 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 3 Dec 2024 19:56:23 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Albert Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/fbff7d78..e687b0cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=02-03 Stats: 71 lines in 4 files changed: 23 ins; 24 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Tue Dec 3 19:59:50 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 3 Dec 2024 19:59:50 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References:

Message-ID: On Tue, 3 Dec 2024 12:29:01 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1_globals.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > > src/hotspot/share/gc/g1/g1CardSet.cpp line 788: > >> 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); >> 787: assert(r->rem_set()->card_set() != this, "must be"); >> 788: #endif > > Since this introduces local vars, can they be grouped in a `{}` scope? It's possible, but I have not seen this done in the hotspot code. > src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 320: > >> 318: } >> 319: >> 320: _from_marking_groups.append(current); > > I wonder if this part can be written somehow to eliminate some "duplicate" code, so that the following occur only once. > > > _from_marking_groups.append(current); > current = new G1CSetCandidateGroup(G1CollectedHeap::heap()->card_set_config()); > num_added_to_group = 0; Suggestions are welcome, I failed to find a way to handle the corner case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1868289268 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1868289641 From ysr at openjdk.org Tue Dec 3 21:02:37 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 21:02:37 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> On Tue, 3 Dec 2024 02:41:26 GMT, Y. Srinivas Ramakrishna wrote: > Fix documentation comment. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. I chatted w/Aleksey and will take a slightly more conservative and future-proof approach to this. Withdrawing this PR to draft until I have made those changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515540383 From kbarrett at openjdk.org Tue Dec 3 21:51:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 21:51:01 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v5] In-Reply-To: References: Message-ID: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - rescue lost acquire - Merge branch 'master' into new-pas-alloc - fix ref to debug-only member - remove phase invariant checks - num_allocators => max_allocators - fix comment typo - use struct/union instead of constants - simplify pas allocator destruction and manager phase tracking - parallel youngen uses new PAS - g1 uses refactored PAS - ... and 1 more: https://git.openjdk.org/jdk/compare/b0801928...5716bb5a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/2eb1814e..5716bb5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=03-04 Stats: 116618 lines in 1650 files changed: 82592 ins; 25661 del; 8365 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Tue Dec 3 21:56:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 21:56:52 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - revert removal of orderAccess include - remove unused include of checkedCast.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/5716bb5a..4fc0b5dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From wkemper at openjdk.org Tue Dec 3 22:24:17 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 3 Dec 2024 22:24:17 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v10] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge jdk/master - Use count of regions uncommitted to compute uncommit delta - Decouple polling interval from uncommit time out - Log uncommitted delta and capacity - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Restore logging format, show change in committed heap, rather than usage - Allow commits initially - Use idiomatic name for CADR class - Improve comments - Do not notify uncommit thread when uncommit is forbidden - ... and 11 more: https://git.openjdk.org/jdk/compare/05ee562a...e70d874e ------------- Changes: https://git.openjdk.org/jdk/pull/22019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=09 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From ysr at openjdk.org Wed Dec 4 01:29:24 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 01:29:24 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: - virtual -> override missed in previous delta. Fix zero build (ReduceInitialCardMarks is defined only in JVMCI/Compiler2) - virtual -> override in derived class ShenandoahBarrierSet. - Refine previous change and future-proof ReduceInitialCardMarks for GenShen. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/3bcd441f..23b8103d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=00-01 Stats: 19 lines in 3 files changed: 13 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From ysr at openjdk.org Wed Dec 4 01:35:41 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 01:35:41 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> Message-ID: On Tue, 3 Dec 2024 20:59:56 GMT, Y. Srinivas Ramakrishna wrote: > I chatted w/Aleksey and will take a slightly more conservative and future-proof approach to this. Withdrawing this PR to draft until I have made those changes. Made a few changes; testing is in progress but PR is open again for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515969743 From cslucas at openjdk.org Wed Dec 4 02:07:37 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 4 Dec 2024 02:07:37 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Wed, 4 Dec 2024 01:29:24 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: > > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. LGTM ------------- Marked as reviewed by cslucas (Author). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2477104700 From shade at openjdk.org Wed Dec 4 09:51:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:51:40 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> On Wed, 4 Dec 2024 01:29:24 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: > > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. Yes, good. Let's see if we catch any failure with this assert. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp line 93: > 91: void ShenandoahBarrierSet::on_slowpath_allocation_exit(JavaThread* thread, oop new_obj) { > 92: #if COMPILER2_OR_JVMCI > 93: assert(!(ReduceInitialCardMarks && ShenandoahCardBarrier) || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), Not sure why first two are grouped, looks more understandable if written like this? Your call. Suggestion: assert(!ReduceInitialCardMarks || !ShenandoahCardBarrier || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2477889118 PR Review Comment: https://git.openjdk.org/jdk/pull/22507#discussion_r1869089707 From shade at openjdk.org Wed Dec 4 11:17:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 11:17:43 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v10] In-Reply-To: References:

Message-ID: <1XixmNqCS5lLRkkel0t8O9bDHJ7itL2zOy968aNNFsk=.742dca37-11b2-4338-8686-fabbc4ffc5c2@github.com> On Tue, 3 Dec 2024 22:24:17 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge jdk/master > - Use count of regions uncommitted to compute uncommit delta > - Decouple polling interval from uncommit time out > - Log uncommitted delta and capacity > - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread > - Restore logging format, show change in committed heap, rather than usage > - Allow commits initially > - Use idiomatic name for CADR class > - Improve comments > - Do not notify uncommit thread when uncommit is forbidden > - ... and 11 more: https://git.openjdk.org/jdk/compare/05ee562a...e70d874e Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2478196184 From zgu at openjdk.org Wed Dec 4 13:43:40 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 4 Dec 2024 13:43:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com>

Message-ID: On Tue, 3 Dec 2024 15:51:19 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/partialArrayState.cpp line 96: >> >>> 94: >>> 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { >>> 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); >> >> Could you explain why `release` order is needed here? > > This is part of the usual reference counting dance. Except, where did the > acquire disappear to? There should be an acquire on the refcount == 0 branch! > Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of > tests were run without noticing that. Make sense. My next question is that, if `PartialArrayState` is ever crossed thread boundaries, it is through job stealing via task queues. Can we depend on barriers of task queue to ensure the memory safety? because we don't need any additional barriers for objects popped/stole from task queues in other places. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1869520669 From kbarrett at openjdk.org Wed Dec 4 15:19:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Dec 2024 15:19:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com>

Message-ID: On Wed, 4 Dec 2024 13:40:55 GMT, Zhengyu Gu wrote: >> This is part of the usual reference counting dance. Except, where did the >> acquire disappear to? There should be an acquire on the refcount == 0 branch! >> Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of >> tests were run without noticing that. > > Make sense. My next question is that, if `PartialArrayState` is ever crossed thread boundaries, it is through job stealing via task queues. Can we depend on barriers of task queue to ensure the memory safety? because we don't need any additional barriers for objects popped/stole from task queues in other places. Yes, it's needed. The purpose of this release/acquire pair is to ensure there is a happens-before chain between the use of a State and it's cleanup/reuse. Transfers through the taskqueue don't help with that. In other places, we have operations on one side of the taskqueue that need to be ordered wrto operations on the other side of the taskqueue. We don't have that here. Consider two threads which have both obtained access to a State. (At least one of them must have obtained it via stealing from another thread.) Assume these are the last two references to the State (it's refcount == 2), and no further tasks for it will be needed. These two threads will use the State (getting source/destination, claiming a chunk), and then release the State, decrementing its refcount. So one of them will decrement to 0. We need to ensure the cleanup that follows that can't corrupt the use by the accesses made by the other thread, by ensuring those accesses happen-before the cleanup. There is no intervening taskqueue manipulation in this scenario. The operations we need to order are all on the same side of the taskqueue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1869759960 From xpeng at openjdk.org Wed Dec 4 16:04:06 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 16:04:06 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup Message-ID: Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: TIP: [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms Parallelized: [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` ### Additional test - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah ------------- Commit messages: - Remove _trash_regions - Completely remove heap lock from recycling trashed regions - Void reordering - Rename recycling to _recycling - Remove comments - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free Changes: https://git.openjdk.org/jdk/pull/22538/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345423 Stats: 211 lines in 13 files changed: 83 ins; 55 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From ysr at openjdk.org Wed Dec 4 17:54:44 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 17:54:44 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> Message-ID: On Wed, 4 Dec 2024 09:48:20 GMT, Aleksey Shipilev wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: >> >> - virtual -> override missed in previous delta. >> Fix zero build (ReduceInitialCardMarks is defined only in >> JVMCI/Compiler2) >> - virtual -> override in derived class ShenandoahBarrierSet. >> - Refine previous change and future-proof ReduceInitialCardMarks for >> GenShen. > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp line 93: > >> 91: void ShenandoahBarrierSet::on_slowpath_allocation_exit(JavaThread* thread, oop new_obj) { >> 92: #if COMPILER2_OR_JVMCI >> 93: assert(!(ReduceInitialCardMarks && ShenandoahCardBarrier) || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), > > Not sure why first two are grouped, looks more understandable if written like this? Your call. > > Suggestion: > > assert(!ReduceInitialCardMarks || !ShenandoahCardBarrier || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), The deMorganization of the first disjunct does make it easier to read as you state. My reason to write it in its first form was because I tend to think of `(not A) or B` as `A implies B` (and written in the first form because C lacks an `implies` operator). I'll rewrite as you suggest before I push this. Thanks for the review! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22507#discussion_r1870023768 From xpeng at openjdk.org Wed Dec 4 19:06:55 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 19:06:55 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: References: Message-ID: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into parallel-cleanup - Remove _trash_regions - Completely remove heap lock from recycling trashed regions - Void reordering - Rename recycling to _recycling - Remove comments - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/f3b8dff4..7f7b370a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=00-01 Stats: 10324 lines in 396 files changed: 5350 ins; 3376 del; 1598 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From wkemper at openjdk.org Wed Dec 4 19:06:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 19:06:58 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 19:02:21 GMT, Xiaolong Peng wrote: >> Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into parallel-cleanup > - Remove _trash_regions > - Completely remove heap lock from recycling trashed regions > - Void reordering > - Rename recycling to _recycling > - Remove comments > - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free Changes look good. Left some nits. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1049: > 1047: > 1048: void ShenandoahConcurrentGC::op_cleanup_early() { > 1049: ShenandoahWorkerScope scope(ShenandoahHeap::heap()->workers(), Can we align these arguments with the first argument after the `(`? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1262: > 1260: public: > 1261: ShenandoahRecycleTrashedRegionTask() : > 1262: WorkerTask("Shenandoah Recycle trashed region.") {} Should be "Shenandoah Recycle Trashed Regions" (no period, title case). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1265: > 1263: > 1264: void work(uint worker_id) { > 1265: const ShenandoahHeap* heap = ShenandoahHeap::heap(); `heap` looks unused here. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 890: > 888: size_t ShenandoahGeneration::decrement_affiliated_region_count() { > 889: // Assertions only hold true for Java threads since they call this method under heap lock. > 890: bool const is_java_thread = Thread::current()->is_Java_thread(); Prefer not to check `Thread::current` to gate assertions. Could we use an `#ifdef ASSERT` block here? Could this be `decrease_affiliated_region_count(1)` instead? or should we have a separate `decrement_under_lock` method? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: > 590: shenandoah_assert_heaplocked(); > 591: if (is_trash() && _recycling.try_set()) { > 592: if (is_trash()) { Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. ------------- Changes requested by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2479579023 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870112931 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870083766 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870085143 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870092090 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870100238 From zgu at openjdk.org Wed Dec 4 19:56:39 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 4 Dec 2024 19:56:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References:

Message-ID: On Wed, 4 Dec 2024 15:16:30 GMT, Kim Barrett wrote: >> Make sense. My next question is that, if `PartialArrayState` is ever crossed thread boundaries, it is through job stealing via task queues. Can we depend on barriers of task queue to ensure the memory safety? because we don't need any additional barriers for objects popped/stole from task queues in other places. > > Yes, it's needed. The purpose of this release/acquire pair is to ensure there > is a happens-before chain between the use of a State and it's cleanup/reuse. > Transfers through the taskqueue don't help with that. > > In other places, we have operations on one side of the taskqueue that need to > be ordered wrto operations on the other side of the taskqueue. We don't have > that here. > > Consider two threads which have both obtained access to a State. (At least > one of them must have obtained it via stealing from another thread.) Assume > these are the last two references to the State (it's refcount == 2), and no > further tasks for it will be needed. These two threads will use the State > (getting source/destination, claiming a chunk), and then release the State, > decrementing its refcount. So one of them will decrement to 0. We need to > ensure the cleanup that follows that can't corrupt the use by the accesses > made by the other thread, by ensuring those accesses happen-before the > cleanup. There is no intervening taskqueue manipulation in this scenario. > The operations we need to order are all on the same side of the taskqueue. I see. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1870196252 From wkemper at openjdk.org Wed Dec 4 20:52:55 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 20:52:55 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v11] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Merge jdk/master - Use count of regions uncommitted to compute uncommit delta - Decouple polling interval from uncommit time out - Log uncommitted delta and capacity - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Restore logging format, show change in committed heap, rather than usage - Allow commits initially - Use idiomatic name for CADR class - Improve comments - ... and 12 more: https://git.openjdk.org/jdk/compare/1a73c76d...c39be0f9 ------------- Changes: https://git.openjdk.org/jdk/pull/22019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=10 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Wed Dec 4 20:52:56 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 20:52:56 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v8] In-Reply-To: References:

Message-ID: On Tue, 26 Nov 2024 10:16:58 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 75: >> >>> 73: MonitorLocker locker(&_stop_lock, Mutex::_no_safepoint_check_flag); >>> 74: if (!_stop_requested.is_set()) { >>> 75: locker.wait((int64_t)shrink_period); >> >> I tried to test this on some of my toy examples, and realized this particular line may end up as `locker.wait(0)`, which means "wait indefinitely, until notified". This breaks periodic commits. The old code rode on control thread doing `MAX2(1, ...)`, so we never feed `0` into `wait`. I am also confused about units. The comment above says `shrink_period` is in seconds, but `locker.wait` accepts milliseconds? > > It sounds like this line should be: > > > locker.wait(MAX2(1, shrink_period * 1000)); Sorry, I missed your comments here. I noticed the same and have refactored this code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1870257677 From xpeng at openjdk.org Wed Dec 4 21:26:38 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:26:38 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:54:13 GMT, William Kemper wrote: > Changes look good. Left some nits. Thanks for looking into it, I'll fix them and update the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2518591374 From xpeng at openjdk.org Wed Dec 4 21:41:02 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:41:02 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v3] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/7f7b370a..50e633f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=01-02 Stats: 9 lines in 3 files changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Wed Dec 4 21:43:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:43:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: Message-ID: <0zXiV5vIfQnOWKstBgeUMlbjqem_BoQyzqt3laUw030=.665d49f2-019c-4f98-b276-8aafe1494513@github.com> > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix naming issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/50e633f2..404f7f98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Wed Dec 4 21:53:40 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:53:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:42:42 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix naming issue > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: > >> 590: shenandoah_assert_heaplocked(); >> 591: if (is_trash() && _recycling.try_set()) { >> 592: if (is_trash()) { > > Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. This method is only called by mutators with heap lock, is_trash is not tested before calling the method, it might be worth to test before calling _recycling.try_set(), otherwise mutator will mostly(fast path) behaves like: 1. _recycling.try_set() -> true (always try to perform CAS, it is slower, we want to void it in fast path). 2. is_trash() -> false and skip the recycling. But we want to fast path for mutator to be like: `is_trash() -> false && _recycling.is_set() -> false`. I have remove the `is_trash` from the code path executed by concurrent cleanup in the new version, that one is not needed since `is_trash` is tested in `ShenandoahRecycleTrashedRegionsTask` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870327496 From xpeng at openjdk.org Wed Dec 4 22:08:39 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 22:08:39 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:36:28 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix naming issue > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 890: > >> 888: size_t ShenandoahGeneration::decrement_affiliated_region_count() { >> 889: // Assertions only hold true for Java threads since they call this method under heap lock. >> 890: bool const is_java_thread = Thread::current()->is_Java_thread(); > > Prefer not to check `Thread::current` to gate assertions. Could we use an `#ifdef ASSERT` block here? Could this be `decrease_affiliated_region_count(1)` instead? or should we have a separate `decrement_under_lock` method? It will be weird if I only change `decrement_affiliated_region_count` to `decrement_affiliated_region_count_under_lock` in this file, while all the other `decrement_x` / `decrease_x` /`increment_x` / `increase_x` methods in files follow the same convention. It is probably better to add new one like `decrement_affiliated_region_count_without_lock` and not change the existing methods' behavior. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870342598 From xpeng at openjdk.org Wed Dec 4 22:16:53 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 22:16:53 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v5] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add decrement_affiliated_region_count_without_lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/404f7f98..75cb902c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=03-04 Stats: 17 lines in 3 files changed: 6 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From ysr at openjdk.org Wed Dec 4 23:45:07 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 23:45:07 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v3] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into ricm - virtual -> override missed in previous delta. Fix zero build (ReduceInitialCardMarks is defined only in JVMCI/Compiler2) - virtual -> override in derived class ShenandoahBarrierSet. - Refine previous change and future-proof ReduceInitialCardMarks for GenShen. - Fix up documentation comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/23b8103d..76ab8f3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=01-02 Stats: 10622 lines in 392 files changed: 5385 ins; 3453 del; 1784 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From xpeng at openjdk.org Thu Dec 5 08:58:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 08:58:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Ensure atomicity when access region state - Bug fix and move is_trash test into try_recycle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/75cb902c..11941c57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=04-05 Stats: 29 lines in 2 files changed: 6 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From iwalulya at openjdk.org Thu Dec 5 11:03:40 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 5 Dec 2024 11:03:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References:

Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp Thanks for all the reviews and discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2520968876 From kbarrett at openjdk.org Thu Dec 5 17:50:46 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 5 Dec 2024 17:50:46 GMT Subject: Integrated: 8344665: Refactor PartialArrayState allocation for reuse In-Reply-To: References: Message-ID: <1htnBYY8OrM-SL5dfvr33utwqvCS1xGtGSV_cIiNXRY=.0201fe46-446c-4f69-b162-f42b46603c0c@github.com> On Wed, 20 Nov 2024 23:40:41 GMT, Kim Barrett wrote: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 This pull request has now been integrated. Changeset: dbf48a53 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/dbf48a53eca74380b279ce6be3bab2a6a248f7f2 Stats: 265 lines in 14 files changed: 136 ins; 54 del; 75 mod 8344665: Refactor PartialArrayState allocation for reuse Reviewed-by: tschatzl, ayang, iwalulya, zgu ------------- PR: https://git.openjdk.org/jdk/pull/22287 From wkemper at openjdk.org Thu Dec 5 17:58:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 17:58:46 GMT Subject: Integrated: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 17:31:58 GMT, William Kemper wrote: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. This pull request has now been integrated. Changeset: bedb68ab Author: William Kemper URL: https://git.openjdk.org/jdk/commit/bedb68aba126c6400ce9f2182105b5294ff42021 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread Reviewed-by: shade, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Thu Dec 5 18:22:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 18:22:46 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com>

Message-ID: On Wed, 4 Dec 2024 21:50:26 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: >> >>> 590: shenandoah_assert_heaplocked(); >>> 591: if (is_trash() && _recycling.try_set()) { >>> 592: if (is_trash()) { >> >> Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. > > This method is only called by mutators with heap lock, is_trash is not tested before calling the method, it is worth to test before calling _recycling.try_set(), otherwise mutator will mostly(fast path) behaves like: > 1. _recycling.try_set() -> true (always try to perform CAS, it is slower, we want to void it in fast path). > 2. is_trash() -> false and skip the recycling. > 3. _recycling.unset() (Should be also avoided in fast path) > > But we want to fast path for mutator to be like: `is_trash() -> false && _recycling.is_set() -> false`. > > > I have removed `is_trash` test from the code path executed by concurrent cleanup in the new version, that one is not needed since `is_trash` is tested in `ShenandoahRecycleTrashedRegionsTask` Okay, I get it. The second test on line 593 is necessary because the gc workers don't hold the lock and could _in theory_ recycle the region between the first `is_trash` check on 592 and the `_recycling.try_set`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871889122 From wkemper at openjdk.org Thu Dec 5 18:25:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 18:25:39 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References:

Message-ID: On Thu, 5 Dec 2024 08:58:52 GMT, Xiaolong Peng wrote: >> Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Ensure atomicity when access region state > - Bug fix and move is_trash test into try_recycle Changes requested by wkemper (Committer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 629: > 627: _recycling.unset(); > 628: } else { > 629: while (_recycling.is_set()) { Why are adding this? Won't this make the calling worker thread wait on another worker to recycle the region? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 385: > 383: void print_on(outputStream* st) const; > 384: > 385: void recycle_under_lock(); Should be `try_recycle_under_lock` for consistency. ------------- PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2482551965 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871890730 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871891590 From xpeng at openjdk.org Thu Dec 5 18:41:47 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:41:47 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References:

Message-ID: On Thu, 5 Dec 2024 18:21:59 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Ensure atomicity when access region state >> - Bug fix and move is_trash test into try_recycle > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 629: > >> 627: _recycling.unset(); >> 628: } else { >> 629: while (_recycling.is_set()) { > > Why are adding this? Won't this make the calling worker thread wait on another worker to recycle the region? hmm, didn't include this intentionally, forgot to remove it from commit, sorry I'll remove it, thanks for catching it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871911601 From xpeng at openjdk.org Thu Dec 5 18:47:55 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:47:55 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v7] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Renaming and remove unnecessary code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/11941c57..368c6aae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=05-06 Stats: 10 lines in 4 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Thu Dec 5 18:50:41 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:50:41 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References:

Message-ID: On Thu, 5 Dec 2024 18:22:46 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Ensure atomicity when access region state >> - Bug fix and move is_trash test into try_recycle > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 385: > >> 383: void print_on(outputStream* st) const; >> 384: >> 385: void recycle_under_lock(); > > Should be `try_recycle_under_lock` for consistency. Thanks! I have renamed it for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871923713 From ysr at openjdk.org Thu Dec 5 19:50:26 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:26 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v4] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Demorganization of clause in assert per review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/76ab8f3b..040b7e36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From shade at openjdk.org Thu Dec 5 19:50:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 5 Dec 2024 19:50:26 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v4] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <1Oe4Zice-3cTcLULF41aFpWTEYtdSWvtKGPjU5vS-OI=.8ca333b0-e3be-48ae-bae5-3b4e7302b5aa@github.com> On Thu, 5 Dec 2024 19:47:00 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Demorganization of clause in assert per review feedback Still fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2482712946 From ysr at openjdk.org Thu Dec 5 19:50:28 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:28 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v3] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <0UllfVoUy2tqPKQQHI1wozH1uw0X60ziff5--HiONJg=.57e4f59c-0e59-41d4-accc-38532eb215a8@github.com> On Wed, 4 Dec 2024 23:45:07 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into ricm > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. > - Fix up documentation comment. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2521256371 From ysr at openjdk.org Thu Dec 5 19:50:28 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:28 GMT Subject: Integrated: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 02:41:26 GMT, Y. Srinivas Ramakrishna wrote: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. This pull request has now been integrated. Changeset: a97dca52 Author: Y. Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/a97dca52c9257121fc96613a4b591920c1c3e31a Stats: 28 lines in 3 files changed: 18 ins; 0 del; 10 mod 8344593: GenShen: Review of ReduceInitialCardMarks Reviewed-by: shade, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/22507 From stefank at openjdk.org Fri Dec 6 10:21:46 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 6 Dec 2024 10:21:46 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code Message-ID: The Serial and Parallel GCs create a ReservedSpace for the total heap and then splits it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates an ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. Tested with tier1-3 ------------- Commit messages: - 8345659: Fix broken alignment after ReservedSpace splitting in GC code Changes: https://git.openjdk.org/jdk/pull/22602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345659 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22602/head:pull/22602 PR: https://git.openjdk.org/jdk/pull/22602 From ayang at openjdk.org Mon Dec 9 09:07:41 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 09:07:41 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References:

Message-ID: On Tue, 3 Dec 2024 19:56:54 GMT, Ivan Walulya wrote: >> src/hotspot/share/gc/g1/g1CardSet.cpp line 788: >> >>> 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); >>> 787: assert(r->rem_set()->card_set() != this, "must be"); >>> 788: #endif >> >> Since this introduces local vars, can they be grouped in a `{}` scope? > > It's possible, but I have not seen this done in the hotspot code. With a quick search, I can find some in runtime code, though not universal. Regardless, it's better encapsulation, IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1875612042 From ayang at openjdk.org Mon Dec 9 09:07:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 09:07:45 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: References:

Message-ID: <1NPhKMIwfLaQe7sm34Mi6aDIfXavGGCGniWkUgOlgbs=.51f1c252-9616-4985-993b-415e1f5d90e8@github.com> On Tue, 3 Dec 2024 19:56:23 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Albert Review src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3055: > 3053: } > 3054: > 3055: void G1CollectedHeap::prepare_group_cardsets_for_scan () { Pre-existing: extra space. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 655: > 653: G1HeapRegion* r = ci._r; > 654: r->uninstall_group_cardset(); > 655: r->rem_set()->set_state_complete(); Why changing the remset state here? I'd expect it's already complete; otherwise, how can it be added to cset? src/hotspot/share/gc/g1/g1CollectionSet.inline.hpp line 32: > 30: > 31: template > 32: inline void G1CollectionSet::merge_cardsets_for_collection_groups(G1CollectedHeap* g1h, CardOrRangeVisitor& cl, uint worker_id, uint num_workers) { The first arg seems unused. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 38: > 36: { } > 37: > 38: void G1CSetCandidateGroup::add(G1HeapRegion* hr) { I believe this method is only for retained regions; if so, one can make that explicit by naming it sth like `add_region_region`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 45: > 43: void G1CSetCandidateGroup::add(G1CollectionSetCandidateInfo& hr_info) { > 44: G1HeapRegion* hr = hr_info._r; > 45: assert(!hr->is_young(), "should be flagged as survivor region"); Can one assert region is Old here? src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 180: > 178: void G1CSetCandidateGroupList::prepare_for_scan() { > 179: for (G1CSetCandidateGroup* gr : _groups) { > 180: gr->card_set()->reset_table_scanner(); This is a group card set, so why not calling `reset_table_scanner_for_groups`? src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 345: > 343: G1CSetCandidateGroupList other_marking_groups; > 344: G1CSetCandidateGroupList other_retained_groups; > 345: Extra blank line. src/hotspot/share/gc/g1/g1HeapRegion.cpp line 144: > 142: if (is_young() || is_free()) { > 143: return -1.0; > 144: } I don't get why young-regions are treated specially. Also, it's weird that "free" region needs to have a gc-efficiency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874099261 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874028649 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873380805 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873276256 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873280032 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874100872 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873300822 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873254478 From sjohanss at openjdk.org Mon Dec 9 09:18:37 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Dec 2024 09:18:37 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 12:04:20 GMT, Albert Mingkun Yang wrote: > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). > > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. Looks good, thanks for doing the fix. Just a small comment on the comment. src/hotspot/share/gc/shared/genArguments.cpp line 41: > 39: > 40: // If InitialHeapSize or MinHeapSize is not set on cmdline, this variable, > 41: // together with NewSize, are used to derive them. Suggestion: // together with NewSize, is used to derive them. ------------- Marked as reviewed by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22575#pullrequestreview-2488189045 PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1875622629 From ayang at openjdk.org Mon Dec 9 10:27:53 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 10:27:53 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References: Message-ID: > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). > > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/shared/genArguments.cpp Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22575/files - new: https://git.openjdk.org/jdk/pull/22575/files/3a28b9c5..c3600d5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22575/head:pull/22575 PR: https://git.openjdk.org/jdk/pull/22575 From sjohanss at openjdk.org Mon Dec 9 11:52:38 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Dec 2024 11:52:38 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <1MuJ-CNF894oRU90Aadcm5PUC1dbKP2hJp38oDycp4M=.2656a056-f497-4048-9905-5e4653780361@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 Marked as reviewed by sjohanss (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2488566791 From xpeng at openjdk.org Mon Dec 9 20:42:21 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 9 Dec 2024 20:42:21 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: Message-ID: > Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use parallel_heap_region_iterate to walk the regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/368c6aae..4507656e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=06-07 Stats: 13 lines in 1 file changed: 1 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From wkemper at openjdk.org Mon Dec 9 22:13:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 9 Dec 2024 22:13:38 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References:

Message-ID: <7dozpUw8xDi1lZPEjbaKvAwsaQJrM6piABij7_hwXzI=.33525ec5-f57d-40cd-b312-c0eed413034b@github.com> On Mon, 9 Dec 2024 20:42:21 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use parallel_heap_region_iterate to walk the regions Thanks for the updates. It looks good to me! ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2490222353 From zgu at openjdk.org Tue Dec 10 00:19:46 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 10 Dec 2024 00:19:46 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <9qPBcD4Tupk4Q_w5KqbryVVukyoCxaU7GgcVUfDH60M=.ac5fbc93-a783-4003-b24b-78863efc5c13@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2490452909 From xpeng at openjdk.org Tue Dec 10 01:18:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 01:18:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References:

Message-ID: On Mon, 9 Dec 2024 20:42:21 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use parallel_heap_region_iterate to walk the regions @kdnilsen @ysramakrishna @shipilev I'm gonna need more reviews for the change, thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2529963162 From ayang at openjdk.org Tue Dec 10 08:31:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Dec 2024 08:31:45 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <7xCEvXMFQ_QlFFRNmzfpM8Zpo84v4JdoqA6HRtER5NM=.8c6972d5-47e8-4dc3-ba9f-f663727589b1@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22441#issuecomment-2530787590 From ayang at openjdk.org Tue Dec 10 08:31:46 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Dec 2024 08:31:46 GMT Subject: Integrated: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <0kgbjdw_0-tnv06LPwXnaJ4QtCs9ALu7gig0-0eZt1w=.efe14207-8e9f-40b9-90b0-d1f9905aed9c@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 This pull request has now been integrated. Changeset: 7e73c436 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/7e73c436ef5cc035304347bf64ae8e2b4ce45ab1 Stats: 10 lines in 1 file changed: 0 ins; 7 del; 3 mod 8345217: Parallel: Refactor PSParallelCompact::next_src_region Reviewed-by: tschatzl, sjohanss, zgu ------------- PR: https://git.openjdk.org/jdk/pull/22441 From tschatzl at openjdk.org Tue Dec 10 11:06:38 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Dec 2024 11:06:38 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References:

Message-ID: On Mon, 9 Dec 2024 10:27:53 GMT, Albert Mingkun Yang wrote: >> This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). >> >> Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/shared/genArguments.cpp > > Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> src/hotspot/share/gc/shared/genArguments.cpp line 43: > 41: // together with NewSize, is used to derive them. > 42: // Using the same value when it was a configurable flag to avoid breakage. > 43: // See more in JDK-8345323 I do not like referrals to the bug tracker in the code, and/or referring to some code the past ("when it was configurable"). Better explicitly state the problem with heap sizing and large pages and file a follow-up RFE (not mentioning it here). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1877886973 From tschatzl at openjdk.org Tue Dec 10 11:10:38 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Dec 2024 11:10:38 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References:

Message-ID: On Tue, 10 Dec 2024 11:04:09 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/gc/shared/genArguments.cpp >> >> Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> > > src/hotspot/share/gc/shared/genArguments.cpp line 43: > >> 41: // together with NewSize, is used to derive them. >> 42: // Using the same value when it was a configurable flag to avoid breakage. >> 43: // See more in JDK-8345323 > > I do not like referrals to the bug tracker in the code, and/or referring to some code the past ("when it was configurable"). > Better explicitly state the problem with heap sizing and large pages and file a follow-up RFE (not mentioning it here). I.e. something like "If the default value of OldSize is too small, then ..., leading to the generations not aligned and not being able to allocate large pages" or so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1877892382 From kbarrett at openjdk.org Tue Dec 10 16:42:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 10 Dec 2024 16:42:49 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n Message-ID: Please review this change to zUtils.cpp to use a for-loop to fill a block of memory rather than using the std::fill_n algorithm. Use of is currently not permitted in HotSpot. Testing: mach5 tier1 ------------- Commit messages: - zUtils remove Changes: https://git.openjdk.org/jdk/pull/22667/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22667&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337995 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22667/head:pull/22667 PR: https://git.openjdk.org/jdk/pull/22667 From ysr at openjdk.org Tue Dec 10 18:13:46 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 10 Dec 2024 18:13:46 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: