From tschatzl at openjdk.org Mon Dec 2 06:35:41 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 06:35:41 GMT Subject: RFR: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References: Message-ID: <6CkyYuRw1-i8jedbYK5P_3hbiD8ijvUMvx5k1DIc71g=.2bc72623-c9c7-4b35-8b4e-0fc4d3fa7a14@github.com> On Fri, 29 Nov 2024 13:53:19 GMT, Stefan Johansson wrote: >> Hi all, >> >> please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called from arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then failing). >> >> Testing: local testing, after the change the ResourceMark crash goes away, gha >> >> Thanks, >> Thomas > > Marked as reviewed by sjohanss (Reviewer). Thanks @kstefanj @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22455#issuecomment-2510688071 From tschatzl at openjdk.org Mon Dec 2 06:35:42 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 06:35:42 GMT Subject: Integrated: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:36:16 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called from arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then failing). > > Testing: local testing, after the change the ResourceMark crash goes away, gha > > Thanks, > Thomas This pull request has now been integrated.
Changeset: f5ebda43 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/f5ebda43709984214a25e23926860fea2ba5819a Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8345173: BlockLocationPrinter::print_location misses a ResourceMark Reviewed-by: sjohanss, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/22455 From tschatzl at openjdk.org Mon Dec 2 08:57:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 08:57:37 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 16:09:15 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. 
>> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - num_allocators => max_allocators > - fix comment typo > - use struct/union instead of constants Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2472094579 From tschatzl at openjdk.org Mon Dec 2 09:31:41 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 09:31:41 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v2] In-Reply-To: References: Message-ID: <9ElRBvXwzJLiC1KpFLPvS7CkGkhhN3QYaynNYM2P1f4=.16f0bb3b-316f-439c-b22f-d28fe3fb7891@github.com> On Wed, 20 Nov 2024 19:23:34 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. 
This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review Sorry for the late reply. 
Some more comments need updating; other than that it seems fine. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 342: > 340: // regions in retained collection set candidates Retained collection set candidates are aged out, ie. > 341: // made to regular old regions without remembered sets after a few attempts to save computation costs > 342: // of keeping them candidates for very long living pinned regions. Suggestion: // The current mechanism for evacuating pinned old regions is as below: // * pinned regions in the marking collection set candidate list (available during mixed gc) are evacuated like // pinned young regions to avoid the complexity of dealing with pinned regions that are part of a // collection group sharing a single cardset. These regions will be partially evacuated and added to the // retained collection set by the evacuation failure handling mechanism. // * evacuating pinned regions out of retained collection set candidates would also just take up time // with no actual space freed in old gen. Better to concentrate on others. So we skip over pinned // regions in retained collection set candidates. Retained collection set candidates are aged out, i.e. // made to regular old regions without remembered sets after a few attempts, to save the computation costs // of keeping them candidates for very-long-living pinned regions. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 522: > 520: bool fits_in_remaining_time = predicted_time_ms <= time_remaining_ms; > 521: > 522: G1CollectionSetCandidateInfo* ci = group->at(0); // we only have one region in the group Suggestion: G1CollectionSetCandidateInfo* ci = group->at(0); // We only have one region in the group. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 65: > 63: // All regions in the group share a G1CardSet instance, which tracks remembered set entries for the > 64: // regions in the group. We do not have track to cross-region references for regions that are in the > 65: // same group.
Suggestion: // G1CSetCandidateGroup groups candidate regions that will be selected for evacuation at the same time. // Grouping occurs both for candidates from marking and for regions retained during evacuation failure, but a group // cannot contain regions of both types. // // Humongous objects are excluded from the candidate groups because regions associated with these // objects are never selected for evacuation. // // All regions in the group share a G1CardSet instance, which tracks remembered set entries for the // regions in the group. We do not have to track cross-region references for regions that are in the // same group, saving memory. src/hotspot/share/gc/g1/g1_globals.hpp line 283: > 281: "The maximum number of old CSet regions in a collection group. " \ > 282: "These will be evacuated in the same GC pause. The first group " \ > 283: "may exceed this limit depending on G1MixedGCCountTarget.") \ Maybe this is better, not sure. We should explain why the "first group" is special. Suggestion: product(uint, G1OldCSetGroupSize, 5, EXPERIMENTAL, \ "The maximum number of old CSet regions in a collection group. " \ "All regions in a group will be evacuated in the same GC pause. The first group selected from the marking candidates " \ "may exceed this limit as its size is calculated based on G1MixedGCCountTarget.") \ ------------- Changes requested by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2472141691 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865490259 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865503030 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865494143 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1865498063 From tschatzl at openjdk.org Mon Dec 2 10:00:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 2 Dec 2024 10:00:39 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simply removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 I think I understood the changes; I added this understanding to the review, hopefully making it easier for other reviewers. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2142: > 2140: // Found the first non-empty region in the same space. > 2141: src_region_idx = sd.region(src_region_ptr); > 2142: closure.set_source(sd.region_to_addr(src_region_idx)); Just to make sure I understand: the only change here is the removal of the condition `src_region_addr > closure.source()` because at worst we can set the same value into `closure._source` anyway, and the additional check is kind of superfluous. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2169: > 2167: src_space_id = SpaceId(space_id); > 2168: src_space_top = top; > 2169: closure.set_source(region_start_addr); The reason for removing the search for the first live word is because all callers will scan the bitmap anyway? ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2472240795 PR Review Comment: https://git.openjdk.org/jdk/pull/22441#discussion_r1865554912 PR Review Comment: https://git.openjdk.org/jdk/pull/22441#discussion_r1865555787 From ayang at openjdk.org Mon Dec 2 10:33:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 10:33:43 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: <1ZMnG9zpv7e3f-5g098hXWzqZm9V1miTNNld91Xxb5A=.9e2e2a2d-aa1e-4b3f-8d4d-5a61a9d6184a@github.com> On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivially using MIN2 to replace `>=` and `||` for better readability. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22444#issuecomment-2511151109 From ayang at openjdk.org Mon Dec 2 10:33:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 10:33:43 GMT Subject: Integrated: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivially using MIN2 to replace `>=` and `||` for better readability. This pull request has now been integrated. Changeset: 0b0f83c0 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/0b0f83c01e30587ca2e23b46493bdc7fcb21559f Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe Reviewed-by: tschatzl, mli ------------- PR: https://git.openjdk.org/jdk/pull/22444 From rkennke at openjdk.org Mon Dec 2 11:15:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 2 Dec 2024 11:15:12 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers Message-ID: See bug for crash details. The problem is in the code that gets the object age out of the mark-word.
That code has a special case for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. Testing: - [x] hotspot_gc_shenandoah +UCOH ------------- Commit messages: - 8345293: Fix generational Shenandoah with compact headers Changes: https://git.openjdk.org/jdk/pull/22477/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22477&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345293 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22477/head:pull/22477 PR: https://git.openjdk.org/jdk/pull/22477 From aboldtch at openjdk.org Mon Dec 2 11:21:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Dec 2024 11:21:22 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: > This specific issue was known since #20888, as was a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue would be benign under IEEE 754, were it not for the C++ standard making division by zero UB. > > As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero.
This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. > > There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Always trigger OC, even when old_garbage is 0 - Merge tag 'jdk-24+26' into JDK-8344414 Added tag jdk-24+26 for changeset 8485cb1c - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22228/files - new: https://git.openjdk.org/jdk/pull/22228/files/3bc3ff4b..98c0acb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22228&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22228&range=00-01 Stats: 54447 lines in 1247 files changed: 27258 ins; 21097 del; 6092 mod Patch: https://git.openjdk.org/jdk/pull/22228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22228/head:pull/22228 PR: https://git.openjdk.org/jdk/pull/22228 From ayang at openjdk.org Mon Dec 2 11:43:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Dec 2024 11:43:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 16:09:15 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. 
>> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - num_allocators => max_allocators > - fix comment typo > - use struct/union instead of constants Some minor comments/suggestions. src/hotspot/share/gc/shared/partialArrayState.hpp line 181: > 179: // allocator counters as a single unit for atomic manipulation. 
> 180: using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t); > 181: using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t); Given that the max value has type `uint`, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT? src/hotspot/share/gc/shared/partialArrayState.hpp line 189: > 187: // allocators. The counters are atomic to permit concurrent construction, > 188: // and to permit concurrent destruction. It's nice that this library can detect and reject misuse (such as mixing two phases), but I'm not sure why so much effort was spent preventing this. None of the existing users of the library are expected to mix phases in the near future. Could we instead document that mixing two phases is not permitted, and if someone chooses to do so, they do so at their own risk? ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2472436639 PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1865681792 PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1865687546 From iwalulya at openjdk.org Mon Dec 2 11:56:41 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 2 Dec 2024 11:56:41 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 16:09:15 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling.
That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - num_allocators => max_allocators > - fix comment typo > - use struct/union instead of constants Marked as reviewed by iwalulya (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2472494759 From shade at openjdk.org Mon Dec 2 11:58:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Dec 2024 11:58:39 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special cases for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Looks good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2472499004 From iwalulya at openjdk.org Mon Dec 2 12:06:31 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 2 Dec 2024 12:06:31 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. 
Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
> > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1_globals.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/4aa4d6b2..fbff7d78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=01-02 Stats: 13 lines in 3 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From stuefe at openjdk.org Mon Dec 2 13:52:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 2 Dec 2024 13:52:37 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: <4oavGU_b5J322sriy37B9AC_zoIC6mvfADPFSRpcDYs=.24381e90-ebd6-47ff-9ee7-5e098f9c1aab@github.com> On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. 
> > The problem is in the code that gets the object age out of the mark-word. That code has a special case for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2472754459 From ysr at openjdk.org Mon Dec 2 15:47:42 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 2 Dec 2024 15:47:42 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special case for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Marked as reviewed by ysr (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22477#pullrequestreview-2473077721 From kbarrett at openjdk.org Mon Dec 2 16:01:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Dec 2024 16:01:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:23:53 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - num_allocators => max_allocators >> - fix comment typo >> - use struct/union instead of constants > > src/hotspot/share/gc/shared/partialArrayState.hpp line 181: > >> 179: // allocator counters as a single unit for atomic manipulation. >> 180: using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t); >> 181: using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t); > > Given that the max value has type `uint`, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT? I assumed 16 bits of worker threads was quite sufficient for a 32-bit platform. And I misremembered and thought 32-bit platforms couldn't be relied upon for a 64-bit atomic add and maybe other 64-bit operations. And this code is definitely not super performance critical. So yeah, I could drop the platform-conditional definition of Counter. I don't think it makes much difference to the code. I guess the type aliases could be dropped and just use bare uint32/64_t. Not sure that's actually an improvement. > src/hotspot/share/gc/shared/partialArrayState.hpp line 189: >> 187: // allocators. The counters are atomic to permit concurrent construction, >> 188: // and to permit concurrent destruction.
They are an atomic unit to detect >> 189: // and reject mixing the two phases, without concern for questions of > It's nice that this library can detect and reject misuse (such as mixing two phases), but I'm not sure why so much effort was spent preventing this. None of the existing users of the library are expected to mix phases in the near future. Could we instead document that mixing two phases is not permitted, and if someone chooses to do so, they do so at their own risk? So far, only 2 of the nearly a dozen(?) potential clients are using this. I'm not sure that none of them are going to have workers that do some of their setup after being started. Hence the desire to support concurrency. And if that, then I feel better about it if there's some usage validation. But maybe it would be better to just throw a lock at the problem. And if it turns out none of the use-cases end up needing that concurrency, then I won't object to a little bit of code simplification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1866111712 PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1866111801 From eosterlund at openjdk.org Mon Dec 2 21:29:40 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 2 Dec 2024 21:29:40 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. 
This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Even better. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22228#pullrequestreview-2473964964 From wkemper at openjdk.org Mon Dec 2 22:51:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 2 Dec 2024 22:51:50 GMT Subject: RFR: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode Message-ID: The incremental update mode has been removed and is no longer supported. ------------- Commit messages: - Remove reference to incremental update mode Changes: https://git.openjdk.org/jdk/pull/22502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345346 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22502/head:pull/22502 PR: https://git.openjdk.org/jdk/pull/22502 From ysr at openjdk.org Mon Dec 2 22:57:45 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 2 Dec 2024 22:57:45 GMT Subject: RFR: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode In-Reply-To: References: Message-ID: <3YqrRBP2ACAzhlHZRXaKMmB-awPFRuiHXyDs66fpQOw=.37cfe055-529b-4f4b-b4d3-ad0c98ce0926@github.com> On Mon, 2 Dec 2024 22:46:25 GMT, William Kemper wrote: > The incremental update mode has been removed and is no longer supported. Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22502#pullrequestreview-2474111019 From wkemper at openjdk.org Mon Dec 2 22:57:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 2 Dec 2024 22:57:46 GMT Subject: Integrated: 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 22:46:25 GMT, William Kemper wrote: > The incremental update mode has been removed and is no longer supported. This pull request has now been integrated. Changeset: 1997e89d Author: William Kemper URL: https://git.openjdk.org/jdk/commit/1997e89ddf9fba7c6eea6c96bd0b5426576d4460 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8345346: Shenandoah: Description of ShenandoahGCMode still refers to incremental update mode Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/22502 From ysr at openjdk.org Tue Dec 3 02:47:08 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 02:47:08 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks Message-ID: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Fix documentation comment. I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. ------------- Commit messages: - Fix up documentation comment. 
Changes: https://git.openjdk.org/jdk/pull/22507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344593 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From tschatzl at openjdk.org Tue Dec 3 09:01:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 3 Dec 2024 09:01:39 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: <2pLpq6Qd4lHfxQeSEvIIZZGlejS8qn9boGzO6s5MoXU=.bad59007-1f23-4d77-95a9-2f2a0bfc29aa@github.com> On Mon, 2 Dec 2024 12:06:31 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. 
Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: > > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1_globals.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl 
<59967451+tschatzl at users.noreply.github.com> Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2474979494 From ayang at openjdk.org Tue Dec 3 09:24:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 09:24:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 15:58:50 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/partialArrayState.hpp line 181: >> >>> 179: // allocator counters as a single unit for atomic manipulation. >>> 180: using CounterValues = LP64_ONLY(uint64_t) NOT_LP64(uint32_t); >>> 181: using Counter = LP64_ONLY(uint32_t) NOT_LP64(uint16_t); >> >> Given the max-value has type `uint`, using the larger type on both 32/64 bit systems should be simpler and it should not cause any noticeable perf regression, since registering/releasing allocators should be infrequent. WDYT? > > I assumed 16bits of worker threads was quite sufficient for a 32bit platform. > And I misremembered and thought 32bit platforms couldn't be relied upon for a > 64bit atomic add and maybe other 64bit operations. And this code is definitely > not super performance critical. So yeah, I could drop the platform-conditional > definition of Counter. I don't think it makes much difference to the code. I > guess the type aliases could be dropped and just use bare uint32/64_t. Not > sure that's actually an improvement. I think unifying 32 and 64 bit system is an improvement -- being able to reason with concrete types. As for type aliases, it's rather subjective; I find `uint*_t` more familiar, but up to you. 
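[Editor's note] The struct/union idea under discussion — packing the two allocator counters into one word so both can be read and updated as a single atomic unit — can be sketched like this. Names are invented for illustration and are not the actual PartialArrayStateManager fields:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Two 32-bit counters packed into one 64-bit value so that a single atomic
// operation observes or updates both, with no torn reads between them.
struct Counters {
  std::atomic<uint64_t> packed{0};  // high half: registered, low half: released

  void register_allocator() { packed.fetch_add(uint64_t(1) << 32); }
  void release_allocator()  { packed.fetch_add(1); }

  // One load yields both counters at a consistent instant.
  void snapshot(uint32_t& registered, uint32_t& released) const {
    uint64_t v = packed.load();
    registered = static_cast<uint32_t>(v >> 32);
    released   = static_cast<uint32_t>(v);
  }
};
```

This is also why the 32-bit variant would shrink each counter to 16 bits: the pair must fit in a word the platform can update atomically.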
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867335846 From stefank at openjdk.org Tue Dec 3 09:50:39 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 3 Dec 2024 09:50:39 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Marked as reviewed by stefank (Reviewer). 
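[Editor's note] The small-offset technique from the patch description can be illustrated with a toy sketch. The epsilon value and function names below are invented — this is not ZGC's actual code:

```cpp
#include <cassert>
#include <cmath>

// Nudge a possibly-zero average away from zero so later divisions stay
// finite, while leaving any realistically nonzero input effectively unchanged.
inline double nonzero(double avg) {
  const double epsilon = 1e-9;  // illustrative magnitude only
  return avg + epsilon;
}

// Example consumer: a ratio that would divide by zero (UB in C++) when the
// sampled average allocation rate is exactly zero.
inline double time_per_byte(double gc_time, double alloc_rate_avg) {
  return gc_time / nonzero(alloc_rate_avg);  // finite even when avg == 0
}
```

The remaining `NaN` case mentioned in the description needs separate handling (a short circuit returning the analytical result), since an epsilon offset alone does not prevent 0/0-style expressions elsewhere.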
------------- PR Review: https://git.openjdk.org/jdk/pull/22228#pullrequestreview-2475122868 From aboldtch at openjdk.org Tue Dec 3 10:45:45 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Dec 2024 10:45:45 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Mon, 2 Dec 2024 11:21:22 GMT, Axel Boldt-Christmas wrote: >> This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. >> >> As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. >> >> There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Always trigger OC, even when old_garbage is 0 > - Merge tag 'jdk-24+26' into JDK-8344414 > > Added tag jdk-24+26 for changeset 8485cb1c > - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22228#issuecomment-2514181776 From aboldtch at openjdk.org Tue Dec 3 10:45:46 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Dec 2024 10:45:46 GMT Subject: Integrated: 8344414: ZGC: Another division by zero in rule_major_allocation_rate In-Reply-To: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: On Tue, 19 Nov 2024 07:18:20 GMT, Axel Boldt-Christmas wrote: > This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. > > As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. > > There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. This pull request has now been integrated. 
Changeset: 63af2f42 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/63af2f42b7abe9504897d7c3f3b4cc0b57123694 Stats: 28 lines in 3 files changed: 11 ins; 1 del; 16 mod 8344414: ZGC: Another division by zero in rule_major_allocation_rate Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.org/jdk/pull/22228 From ayang at openjdk.org Tue Dec 3 13:55:49 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 13:55:49 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 12:06:31 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. 
This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: > > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1_globals.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> src/hotspot/share/gc/g1/g1CardSet.cpp line 788: > 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); > 
787: assert(r->rem_set()->card_set() != this, "must be"); > 788: #endif Since this introduces local vars, can they be grouped in a `{}` scope? src/hotspot/share/gc/g1/g1CollectionSet.cpp line 358: > 356: } > 357: > 358: uint num_optional_regions = _optional_groups.num_regions(); Seems unused. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 516: > 514: > 515: for (G1CSetCandidateGroup* group : *retained_groups) { > 516: assert(group->length() == 1, "Retained groups should have only 1 region"); Should this property be documented in where `_retain_groups` is defined, if it is an invariant? src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 320: > 318: } > 319: > 320: _from_marking_groups.append(current); I wonder if this part can be written somehow to eliminate some "duplicate" code, so that the following occur only once. _from_marking_groups.append(current); current = new G1CSetCandidateGroup(G1CollectedHeap::heap()->card_set_config()); num_added_to_group = 0; src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 74: > 72: G1CardSet _card_set; > 73: > 74: // Missing comment? src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 113: > 111: void clear(); > 112: > 113: void abandon(); It's not obvious how the two APIs differ, and which one to use in a certain scenario. Some docs would be nice. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 172: > 170: }; > 171: > 172: // Tracks all collection set candidates, i.e. regions that could/should be evacuated soon. Seems outdated now that fields are group list. src/hotspot/share/gc/g1/g1_globals.hpp line 285: > 283: "may exceed this limit as it is calculated based on G1MixedGCCountTarget.") \ > 284: range(1, 256) \ > 285: \ Better wrap text to align ``. 
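[Editor's note] The collection-group idea this PR reviews — regions sharing one card set need no remembered-set entries among themselves, because they are always evacuated together — can be shown with a toy model. The data structures below are invented for illustration and are not the G1 implementation:

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy model: a reference from region `from` into region `to` only needs a
// remembered-set entry when the regions belong to different groups; regions
// in the same group are evacuated at the same time, so intra-group entries
// would never be consulted.
struct Groups {
  std::vector<int> group_of;                  // region index -> group id
  std::set<std::pair<int, int>> remset;       // recorded (from, to) entries

  void record_reference(int from, int to) {
    if (group_of[from] != group_of[to]) {
      remset.insert({from, to});
    }
  }
};
```

The memory saving in the PR follows directly: the larger the groups, the more cross-region references fall inside a group and are never recorded, at the cost of having to evacuate a whole group together.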
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867639848 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867744730 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867758198 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867699763 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867657960 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867718224 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867760746 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1867636495 From kbarrett at openjdk.org Tue Dec 3 13:55:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 13:55:59 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: Message-ID: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. 
Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove phase invariant checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/f1a1be24..2eb1814e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=02-03 Stats: 67 lines in 2 files changed: 1 ins; 50 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Tue Dec 3 13:56:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 13:56:00 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: <_dQ3A6UZTdGWF9cEaFFGL5z7FfxAX1C7bOhJ4grX3ro=.b3a1e3d7-7ee2-4c26-992c-aae217aa97be@github.com> On Mon, 2 Dec 2024 11:40:44 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - 
num_allocators => max_allocators >> - fix comment typo >> - use struct/union instead of constants > > Some minor comments/suggestions. Since y'all (especially @albertnetymk ) seem to really dislike the phase checking, here's a version without. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2514620559 From ayang at openjdk.org Tue Dec 3 13:59:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 3 Dec 2024 13:59:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. 
It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks Thank you for the simplification. ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2475755016 From tschatzl at openjdk.org Tue Dec 3 14:21:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 3 Dec 2024 14:21:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. 
>> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks still good. ------------- Marked as reviewed by tschatzl (Reviewer). 
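[Editor's note] The manager/allocator split summarized in the quoted PR description can be sketched roughly as follows. Names and the API are invented, and the real code uses HotSpot Arenas rather than `std::vector` — this only shows the ownership shape (one manager owning per-worker arenas, short-lived per-worker allocators, and a reset between collections):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One manager provides the memory-management context for a group of
// per-worker allocators; reset() lets the next collection recycle the memory.
class Manager {
 public:
  explicit Manager(unsigned max_workers) : arenas_(max_workers) {}
  std::vector<char>& arena_for(unsigned worker) { return arenas_[worker]; }
  void reset() {
    for (auto& a : arenas_) a.clear();  // recycle between collections
  }
 private:
  std::vector<std::vector<char>> arenas_;  // one arena per worker
};

// Lightweight per-worker allocator bound to its worker's arena; it can be
// created and destroyed per collection while the manager persists.
class Allocator {
 public:
  Allocator(Manager& m, unsigned worker) : arena_(m.arena_for(worker)) {}
  char* allocate(std::size_t n) {
    std::size_t off = arena_.size();
    arena_.resize(off + n);               // bump-style growth (sketch only)
    return arena_.data() + off;
  }
 private:
  std::vector<char>& arena_;
};
```

This also makes the G1-vs-ParallelGC difference in the description concrete: G1 deletes the allocators and calls `reset()` each GC, while ParallelGC keeps its allocators alive and therefore never resets the manager.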
PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2475825896 From zgu at openjdk.org Tue Dec 3 14:25:47 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 3 Dec 2024 14:25:47 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. 
Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks src/hotspot/share/gc/shared/partialArrayState.cpp line 110: > 108: _max_allocators(max_allocators), > 109: _registered_allocators(0), > 110: _released_allocators(0) `_released_allocators` is a debug-only variable; this should fail in the release build ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867815956 From kbarrett at openjdk.org Tue Dec 3 14:43:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 14:43:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 14:23:14 GMT, Zhengyu Gu wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove phase invariant checks > > src/hotspot/share/gc/shared/partialArrayState.cpp line 110: > >> 108: _max_allocators(max_allocators), >> 109: _registered_allocators(0), >> 110: _released_allocators(0) > > `_released_allocators` is a debug-only variable; this should fail in the release build Well spotted. It seems I haven't done a release build since my final touch-up to make it debug-only. I'll push an update once it's been through our CI. 
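(To illustrate the build failure being discussed — with invented names, not the actual partialArrayState.cpp code — a member that exists only in debug builds must be declared, initialized, and used only under the same guard, or the release build fails to compile. Here `ASSERT` is defined to simulate a debug build.)

```cpp
#include <cassert>

// Simulate a debug build for this sketch; removing this define simulates
// a release build, where any unguarded use of _released fails to compile.
#define ASSERT 1

#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif

class Manager {
  int _registered = 0;
#ifdef ASSERT
  int _released = 0;           // absent entirely in release builds
#endif
public:
  void register_one() { _registered++; }
  void release_one() {
    DEBUG_ONLY(_released++;)   // guarded use: compiles in both build flavors
    // _released++;            // unguarded: would break the release build
  }
  int registered() const { return _registered; }
#ifdef ASSERT
  int released() const { return _released; }
#endif
};
```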
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867847941 From zgu at openjdk.org Tue Dec 3 15:17:41 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 3 Dec 2024 15:17:41 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 13:55:59 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. 
Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove phase invariant checks src/hotspot/share/gc/shared/partialArrayState.cpp line 96: > 94: > 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { > 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); Could you explain why `release` order is needed here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867917303 From kbarrett at openjdk.org Tue Dec 3 15:53:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 15:53:43 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 15:15:10 GMT, Zhengyu Gu wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove phase invariant checks > > src/hotspot/share/gc/shared/partialArrayState.cpp line 96: > >> 94: >> 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { >> 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); > > Could you explain why `release` order is needed here? This is part of the usual reference counting dance. Except, where did the acquire disappear to? There should be an acquire on the refcount == 0 branch! Looks like I accidentally deleted it. Sigh. 
Not too surprisingly, lots of tests were run without noticing that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1867980602 From kbarrett at openjdk.org Tue Dec 3 16:15:10 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:15:10 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp Message-ID: Please review this trivial removal of an unnecessary and improperly placed include of ``. Testing: mach5 tier1 ------------- Commit messages: - remove unneeded include Changes: https://git.openjdk.org/jdk/pull/22519/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22519&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345397 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22519/head:pull/22519 PR: https://git.openjdk.org/jdk/pull/22519 From shade at openjdk.org Tue Dec 3 16:16:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Dec 2024 16:16:40 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 02:41:26 GMT, Y. Srinivas Ramakrishna wrote: > Fix documentation comment. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. I still don't quite understand if we need to implement `CardTableBarrierSet::on_slowpath_allocation_exit`. I see `SharedRuntime::on_slowpath_allocation_exit` is called from different places in VM. Are those really subsumed by safepoints? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2514996606 From shade at openjdk.org Tue Dec 3 16:23:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Dec 2024 16:23:38 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 16:09:29 GMT, Kim Barrett wrote: > Please review this trivial removal of an unnecessary and improperly placed > include of ``. > > Testing: mach5 tier1 Good and trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22519#pullrequestreview-2476182562 From kbarrett at openjdk.org Tue Dec 3 16:33:48 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:33:48 GMT Subject: RFR: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References: Message-ID: <-6E7jPhk_wnalVUpCu-CP0wRGQdcPwjSePJTbFpnC9c=.830dc03e-7a3c-43ca-8e99-66be2b81797d@github.com> On Tue, 3 Dec 2024 16:21:01 GMT, Aleksey Shipilev wrote: >> Please review this trivial removal of an unnecessary and improperly placed >> include of ``. >> >> Testing: mach5 tier1 > > Good and trivial. Thanks for reviewing @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/22519#issuecomment-2515033509 From kbarrett at openjdk.org Tue Dec 3 16:33:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 16:33:49 GMT Subject: Integrated: 8345397: Remove from g1HeapRegionRemSet.cpp In-Reply-To: References: Message-ID: <_phjJjMuPSpvy_JJabyDGhqeAY__6Cd75D_-KwWsAbg=.48f6982f-34a8-4ad2-824e-192eef0e1865@github.com> On Tue, 3 Dec 2024 16:09:29 GMT, Kim Barrett wrote: > Please review this trivial removal of an unnecessary and improperly placed > include of ``. > > Testing: mach5 tier1 This pull request has now been integrated. 
Changeset: e1910f2d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/e1910f2d19fce5cc78058154c7ddaaa8718973dc Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8345397: Remove from g1HeapRegionRemSet.cpp Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/22519 From rkennke at openjdk.org Tue Dec 3 16:48:42 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 3 Dec 2024 16:48:42 GMT Subject: RFR: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special case for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22477#issuecomment-2515074738 From rkennke at openjdk.org Tue Dec 3 16:48:42 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 3 Dec 2024 16:48:42 GMT Subject: Integrated: 8345293: Fix generational Shenandoah with compact headers In-Reply-To: References: Message-ID: <-DdF0QuKZhADfsHN75TUl4hsiMfUva11bgDbxSGUpe8=.d31cc97b-d3cb-4cb6-9e3d-6f4bdec92f01@github.com> On Mon, 2 Dec 2024 11:09:37 GMT, Roman Kennke wrote: > See bug for crash details. > > The problem is in the code that gets the object age out of the mark-word. That code has a special case for when an object is monitor locked, in which case it fetches the displaced header out of the monitor and extracts the age from there. 
However, with compact headers, we're running with ObjectMonitorTable, and decoding the monitor-locked mark-word crashes. > > The fix is simple: when we are running with ObjectMonitorTable, the mark-word never gets overloaded by locking, so we can return the age straight out of the mark-word. > > Testing: > - [x] hotspot_gc_shenandoah +UCOH This pull request has now been integrated. Changeset: e9f6ba05 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/e9f6ba05264ecb2f1ca3983ea503778f301bf280 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8345293: Fix generational Shenandoah with compact headers Reviewed-by: shade, stuefe, ysr ------------- PR: https://git.openjdk.org/jdk/pull/22477 From ysr at openjdk.org Tue Dec 3 17:28:41 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 17:28:41 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 16:13:49 GMT, Aleksey Shipilev wrote: > I still don't quite understand if we need to implement `CardTableBarrierSet::on_slowpath_allocation_exit`. I see `SharedRuntime::on_slowpath_allocation_exit` is called from different places in VM. Are those really subsumed by safepoints? Slow path allocations in GenShen also happen only in young regions, never directly in the old generation, and do not need card-marks. I was hoping to convey that in the comment. Please let me know if I misunderstood your concern, and am missing a different mechanism through which initializing writes may happen in the old generation. 
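(A hedged sketch of the argument above, with invented addresses and names rather than the Shenandoah sources: if every slow-path allocation lands in the young generation, the initial card mark can be elided under ReduceInitialCardMarks, since card marks exist to track pointers into the young generation and a young object itself needs none.)

```cpp
#include <cassert>
#include <cstdint>

// Invented heap layout for illustration only.
constexpr std::uintptr_t kYoungStart = 0x1000, kYoungEnd = 0x2000;

inline bool is_in_young(std::uintptr_t addr) {
  return addr >= kYoungStart && addr < kYoungEnd;
}

// Mirrors the shape of a slow-path-allocation hook: with deferred initial
// card marks enabled, a freshly allocated object is asserted to be in the
// young generation, so no card is dirtied for its initializing writes.
inline bool needs_card_mark(std::uintptr_t new_obj,
                            bool reduce_initial_card_marks) {
  if (reduce_initial_card_marks) {
    assert(is_in_young(new_obj) && "slow-path allocation must be young");
    return false;
  }
  return !is_in_young(new_obj);  // only old-gen objects would need marking
}
```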
------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515165468 From iwalulya at openjdk.org Tue Dec 3 19:56:23 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 3 Dec 2024 19:56:23 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. 
Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Albert Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/fbff7d78..e687b0cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=02-03 Stats: 71 lines in 4 files changed: 23 ins; 24 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Tue Dec 3 19:59:50 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 3 Dec 2024 19:59:50 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 12:29:01 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update 
src/hotspot/share/gc/g1/g1CollectionSet.cpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1_globals.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> >> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp >> >> Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > > src/hotspot/share/gc/g1/g1CardSet.cpp line 788: > >> 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); >> 787: assert(r->rem_set()->card_set() != this, "must be"); >> 788: #endif > > Since this introduces local vars, can they be grouped in a `{}` scope? It's possible, but I have not seen this done in the hotspot code. > src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 320: > >> 318: } >> 319: >> 320: _from_marking_groups.append(current); > > I wonder if this part can be written somehow to eliminate some "duplicate" code, so that the following occur only once. > > > _from_marking_groups.append(current); > current = new G1CSetCandidateGroup(G1CollectedHeap::heap()->card_set_config()); > num_added_to_group = 0; Suggestions are welcome, I failed to find a way to handle the corner case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1868289268 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1868289641 From ysr at openjdk.org Tue Dec 3 21:02:37 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Tue, 3 Dec 2024 21:02:37 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> On Tue, 3 Dec 2024 02:41:26 GMT, Y. Srinivas Ramakrishna wrote: > Fix documentation comment. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. I chatted w/Aleksey and will take a slightly more conservative and future-proof approach to this. Withdrawing this PR to draft until I have made those changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515540383 From kbarrett at openjdk.org Tue Dec 3 21:51:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 21:51:01 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v5] In-Reply-To: References: Message-ID: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. 
Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - rescue lost acquire - Merge branch 'master' into new-pas-alloc - fix ref to debug-only member - remove phase invariant checks - num_allocators => max_allocators - fix comment typo - use struct/union instead of constants - simplify pas allocator destruction and manager phase tracking - parallel youngen uses new PAS - g1 uses refactored PAS - ... 
and 1 more: https://git.openjdk.org/jdk/compare/b0801928...5716bb5a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/2eb1814e..5716bb5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=03-04 Stats: 116618 lines in 1650 files changed: 82592 ins; 25661 del; 8365 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Tue Dec 3 21:56:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Dec 2024 21:56:52 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. 
It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - revert removal of orderAccess include - remove unused include of checkedCast.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/5716bb5a..4fc0b5dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From wkemper at openjdk.org Tue Dec 3 22:24:17 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 3 Dec 2024 22:24:17 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v10] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: - Merge jdk/master - Use count of regions uncommitted to compute uncommit delta - Decouple polling interval from uncommit time out - Log uncommitted delta and capacity - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Restore logging format, show change in committed heap, rather than usage - Allow commits initially - Use idiomatic name for CADR class - Improve comments - Do not notify uncommit thread when uncommit is forbidden - ... and 11 more: https://git.openjdk.org/jdk/compare/05ee562a...e70d874e ------------- Changes: https://git.openjdk.org/jdk/pull/22019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=09 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From ysr at openjdk.org Wed Dec 4 01:29:24 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 01:29:24 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: - virtual -> override missed in previous delta. Fix zero build (ReduceInitialCardMarks is defined only in JVMCI/Compiler2) - virtual -> override in derived class ShenandoahBarrierSet. 
- Refine previous change and future-proof ReduceInitialCardMarks for GenShen. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/3bcd441f..23b8103d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=00-01 Stats: 19 lines in 3 files changed: 13 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From ysr at openjdk.org Wed Dec 4 01:35:41 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 01:35:41 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> <-kdMFM2kLRG0GOt9nD3Y4IR0mGBlVuAxM3vNnRI-R8U=.9a7400c2-94b8-45f4-a0d6-f940f30bc9f5@github.com> Message-ID: On Tue, 3 Dec 2024 20:59:56 GMT, Y. Srinivas Ramakrishna wrote: > I chatted w/Aleksey and will take a slightly more conservative and future-proof approach to this. Withdrawing this PR to draft until I have made those changes. Made a few changes; testing is in progress but PR is open again for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2515969743 From cslucas at openjdk.org Wed Dec 4 02:07:37 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 4 Dec 2024 02:07:37 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Wed, 4 Dec 2024 01:29:24 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. 
>> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: > > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. LGTM ------------- Marked as reviewed by cslucas (Author). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2477104700 From shade at openjdk.org Wed Dec 4 09:51:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:51:40 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> On Wed, 4 Dec 2024 01:29:24 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: > > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. Yes, good. Let's see if we catch any failure with this assert. 
src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp line 93: > 91: void ShenandoahBarrierSet::on_slowpath_allocation_exit(JavaThread* thread, oop new_obj) { > 92: #if COMPILER2_OR_JVMCI > 93: assert(!(ReduceInitialCardMarks && ShenandoahCardBarrier) || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), Not sure why first two are grouped, looks more understandable if written like this? Your call. Suggestion: assert(!ReduceInitialCardMarks || !ShenandoahCardBarrier || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2477889118 PR Review Comment: https://git.openjdk.org/jdk/pull/22507#discussion_r1869089707 From shade at openjdk.org Wed Dec 4 11:17:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 11:17:43 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v10] In-Reply-To: References: Message-ID: <1XixmNqCS5lLRkkel0t8O9bDHJ7itL2zOy968aNNFsk=.742dca37-11b2-4338-8686-fabbc4ffc5c2@github.com> On Tue, 3 Dec 2024 22:24:17 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge jdk/master > - Use count of regions uncommitted to compute uncommit delta > - Decouple polling interval from uncommit time out > - Log uncommitted delta and capacity > - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread > - Restore logging format, show change in committed heap, rather than usage > - Allow commits initially > - Use idiomatic name for CADR class > - Improve comments > - Do not notify uncommit thread when uncommit is forbidden > - ... and 11 more: https://git.openjdk.org/jdk/compare/05ee562a...e70d874e Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2478196184 From zgu at openjdk.org Wed Dec 4 13:43:40 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 4 Dec 2024 13:43:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Tue, 3 Dec 2024 15:51:19 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/partialArrayState.cpp line 96: >> >>> 94: >>> 95: void PartialArrayStateAllocator::release(PartialArrayState* state) { >>> 96: size_t refcount = Atomic::sub(&state->_refcount, size_t(1), memory_order_release); >> >> Could you explain why `release` order is needed here? > > This is part of the usual reference counting dance. Except, where did the > acquire disappear to? There should be an acquire on the refcount == 0 branch! > Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of > tests were run without noticing that. Makes sense. My next question is: if a `PartialArrayState` ever crosses thread boundaries, it does so through job stealing via task queues. Can we depend on the task queues' barriers to ensure memory safety, since we don't need any additional barriers for objects popped/stolen from task queues in other places?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1869520669 From kbarrett at openjdk.org Wed Dec 4 15:19:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Dec 2024 15:19:42 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Wed, 4 Dec 2024 13:40:55 GMT, Zhengyu Gu wrote: >> This is part of the usual reference counting dance. Except, where did the >> acquire disappear to? There should be an acquire on the refcount == 0 branch! >> Looks like I accidentally deleted it. Sigh. Not too surprisingly, lots of >> tests were run without noticing that. > > Makes sense. My next question is: if a `PartialArrayState` ever crosses thread boundaries, it does so through job stealing via task queues. Can we depend on the task queues' barriers to ensure memory safety, since we don't need any additional barriers for objects popped/stolen from task queues in other places? Yes, it's needed. The purpose of this release/acquire pair is to ensure there is a happens-before chain between the use of a State and its cleanup/reuse. Transfers through the taskqueue don't help with that. In other places, we have operations on one side of the taskqueue that need to be ordered with respect to operations on the other side of the taskqueue. We don't have that here. Consider two threads which have both obtained access to a State. (At least one of them must have obtained it via stealing from another thread.) Assume these are the last two references to the State (its refcount == 2), and no further tasks for it will be needed. These two threads will use the State (getting source/destination, claiming a chunk), and then release the State, decrementing its refcount. So one of them will decrement to 0.
We need to ensure that the cleanup that follows can't corrupt the accesses made by the other thread, by ensuring those accesses happen-before the cleanup. There is no intervening taskqueue manipulation in this scenario. The operations we need to order are all on the same side of the taskqueue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1869759960 From xpeng at openjdk.org Wed Dec 4 16:04:06 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 16:04:06 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup Message-ID: Concurrent cleanup after a Shenandoah collection cycle is currently executed by a single thread (the Shenandoah control thread), since recycling trashed regions requires the heap lock even though it can be done without it. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock-free. With this change, the time to execute concurrent cleanup has improved by more than 10x; throughput/allocation rate is also improved significantly: TIP: [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms Parallelized: [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms [30.510s][info][gc] GC(1560)
Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` ### Additional test - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah ------------- Commit messages: - Remove _trash_regions - Completely remove heap lock from recycling trashed regions - Void reordering - Rename recycling to _recycling - Remove comments - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free Changes: https://git.openjdk.org/jdk/pull/22538/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345423 Stats: 211 lines in 13 files changed: 83 ins; 55 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From ysr at openjdk.org Wed Dec 4 17:54:44 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 17:54:44 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v2] In-Reply-To: <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> <_aZAH2XY5f1s57YkntosDO02T6OIyfk1CsK1BGbvRns=.887cb130-66c5-42ea-a631-b71136dff7f2@github.com> Message-ID: On Wed, 4 Dec 2024 09:48:20 GMT, Aleksey Shipilev wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with three additional commits since the last revision: >> >> - virtual -> override missed in previous delta. >> Fix zero build (ReduceInitialCardMarks is defined only in >> JVMCI/Compiler2) >> - virtual -> override in derived class ShenandoahBarrierSet. >> - Refine previous change and future-proof ReduceInitialCardMarks for >> GenShen. > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp line 93: > >> 91: void ShenandoahBarrierSet::on_slowpath_allocation_exit(JavaThread* thread, oop new_obj) { >> 92: #if COMPILER2_OR_JVMCI >> 93: assert(!(ReduceInitialCardMarks && ShenandoahCardBarrier) || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), > > Not sure why first two are grouped, looks more understandable if written like this? Your call. > > Suggestion: > > assert(!ReduceInitialCardMarks || !ShenandoahCardBarrier || ShenandoahGenerationalHeap::heap()->is_in_young(new_obj), The deMorganization of the first disjunct does make it easier to read as you state. My reason to write it in its first form was because I tend to think of `(not A) or B` as `A implies B` (and written in the first form because C lacks an `implies` operator). I'll rewrite as you suggest before I push this. Thanks for the review! 
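As a side note, the equivalence behind that rewrite is easy to check exhaustively. The sketch below is ours, not code from the PR; the function and parameter names merely stand in for the three conditions in the assert.

```cpp
#include <cassert>

// The "A implies B" spelling of the assert predicate: !(A && B) || C.
inline bool implies_form(bool reduce, bool card, bool young) {
    return !(reduce && card) || young;
}

// The suggested De Morgan spelling: !A || !B || C.
inline bool demorgan_form(bool reduce, bool card, bool young) {
    return !reduce || !card || young;
}
```

Both forms agree on all eight combinations of the three flags, so the choice is purely one of readability.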
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22507#discussion_r1870023768 From xpeng at openjdk.org Wed Dec 4 19:06:55 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 19:06:55 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: References: Message-ID: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > 
[30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into parallel-cleanup - Remove _trash_regions - Completely remove heap lock from recycling trashed regions - Void reordering - Rename recycling to _recycling - Remove comments - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/f3b8dff4..7f7b370a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=00-01 Stats: 10324 lines in 396 files changed: 5350 ins; 3376 del; 1598 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From wkemper at openjdk.org Wed Dec 4 19:06:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 19:06:58 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 19:02:21 GMT, Xiaolong Peng wrote: >> Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. 
>> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational 
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'openjdk:master' into parallel-cleanup > - Remove _trash_regions > - Completely remove heap lock from recycling trashed regions > - Void reordering > - Rename recycling to _recycling > - Remove comments > - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free Changes look good. Left some nits. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1049: > 1047: > 1048: void ShenandoahConcurrentGC::op_cleanup_early() { > 1049: ShenandoahWorkerScope scope(ShenandoahHeap::heap()->workers(), Can we align these arguments with the first argument after the `(`? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1262: > 1260: public: > 1261: ShenandoahRecycleTrashedRegionTask() : > 1262: WorkerTask("Shenandoah Recycle trashed region.") {} Should be "Shenandoah Recycle Trashed Regions" (no period, title case). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1265: > 1263: > 1264: void work(uint worker_id) { > 1265: const ShenandoahHeap* heap = ShenandoahHeap::heap(); `heap` looks unused here. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 890: > 888: size_t ShenandoahGeneration::decrement_affiliated_region_count() { > 889: // Assertions only hold true for Java threads since they call this method under heap lock. > 890: bool const is_java_thread = Thread::current()->is_Java_thread(); Prefer not to check `Thread::current` to gate assertions. Could we use an `#ifdef ASSERT` block here? Could this be `decrease_affiliated_region_count(1)` instead? 
or should we have a separate `decrement_under_lock` method? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: > 590: shenandoah_assert_heaplocked(); > 591: if (is_trash() && _recycling.try_set()) { > 592: if (is_trash()) { Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. ------------- Changes requested by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2479579023 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870112931 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870083766 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870085143 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870092090 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870100238 From zgu at openjdk.org Wed Dec 4 19:56:39 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 4 Dec 2024 19:56:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. 
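The claim/re-check protocol under discussion can be modeled with standard atomics to show why the claim flag must be released on every branch out of the claimed scope. This is only an illustrative sketch; the type and member names are hypothetical, and a plain atomic exchange stands in for the shared-flag `try_set` and the heap lock.

```cpp
#include <atomic>

// Toy model of the recycle protocol under review (names hypothetical):
// cheap check, claim via atomic exchange, re-check after winning the
// claim, and release of the claim flag on both branches.
struct ToyRegion {
    std::atomic<bool> recycling{false};
    std::atomic<bool> trash{true};
    int recycle_count = 0;

    void try_recycle() {
        if (trash.load() && !recycling.exchange(true)) {
            if (trash.load()) {        // re-check after claiming
                recycle_count++;
                trash.store(false);
            }
            recycling.store(false);    // released even if re-check failed
        }
    }
};
```

If the release were nested inside the second check instead, a losing re-check would leave `recycling` set forever, which is the hazard the review comment points out.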
>> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp LGTM ------------- Marked as reviewed by zgu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2479769682 From zgu at openjdk.org Wed Dec 4 19:56:40 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 4 Dec 2024 19:56:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v4] In-Reply-To: References: <-VaAIurh8eek7fcs5zlHpHZ7PZw64jaC2EU9PgYw6kA=.92fb8806-75d0-4b76-854e-7521662dc32c@github.com> Message-ID: On Wed, 4 Dec 2024 15:16:30 GMT, Kim Barrett wrote: >> Makes sense. My next question is: if a `PartialArrayState` ever crosses thread boundaries, it does so through job stealing via task queues. Can we depend on the task queues' barriers to ensure memory safety, since we don't need any additional barriers for objects popped/stolen from task queues in other places? > > Yes, it's needed. The purpose of this release/acquire pair is to ensure there > is a happens-before chain between the use of a State and its cleanup/reuse. > Transfers through the taskqueue don't help with that. > > In other places, we have operations on one side of the taskqueue that need to > be ordered with respect to operations on the other side of the taskqueue. We don't have > that here. > > Consider two threads which have both obtained access to a State. (At least > one of them must have obtained it via stealing from another thread.) Assume > these are the last two references to the State (its refcount == 2), and no > further tasks for it will be needed. These two threads will use the State > (getting source/destination, claiming a chunk), and then release the State, > decrementing its refcount. So one of them will decrement to 0. We need to > ensure that the cleanup that follows can't corrupt the accesses > made by the other thread, by ensuring those accesses happen-before the > cleanup. There is no intervening taskqueue manipulation in this scenario. > The operations we need to order are all on the same side of the taskqueue. I see.
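The release/acquire protocol described in this thread can be modeled outside HotSpot with standard C++ atomics. This is an illustrative sketch only, not the actual `PartialArrayState` code; the type and member names below are made up. Each releaser decrements with release ordering, and the one whose decrement reaches zero issues an acquire fence, so every other thread's last use of the state happens-before the cleanup.

```cpp
#include <atomic>
#include <cstddef>

// Toy model of the refcounted-state protocol (names hypothetical).
struct ToyState {
    std::atomic<size_t> refcount;
    bool cleaned = false;

    explicit ToyState(size_t n) : refcount(n) {}

    // Returns true if this caller observed the count drop to zero and
    // therefore performed the cleanup.
    bool release() {
        // fetch_sub returns the value *before* the decrement.
        if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
            // Pairs with the releases above: all prior uses by other
            // threads now happen-before the cleanup.
            std::atomic_thread_fence(std::memory_order_acquire);
            cleaned = true;  // safe to reclaim/reuse here
            return true;
        }
        return false;
    }
};
```

With two outstanding references, exactly one releaser takes the cleanup branch, regardless of which thread obtained its reference by stealing; no taskqueue transfer is involved in that ordering.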
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1870196252 From wkemper at openjdk.org Wed Dec 4 20:52:55 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 20:52:55 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v11] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Merge jdk/master - Use count of regions uncommitted to compute uncommit delta - Decouple polling interval from uncommit time out - Log uncommitted delta and capacity - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Restore logging format, show change in committed heap, rather than usage - Allow commits initially - Use idiomatic name for CADR class - Improve comments - ... 
and 12 more: https://git.openjdk.org/jdk/compare/1a73c76d...c39be0f9 ------------- Changes: https://git.openjdk.org/jdk/pull/22019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=10 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Wed Dec 4 20:52:56 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Dec 2024 20:52:56 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v8] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 10:16:58 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 75: >> >>> 73: MonitorLocker locker(&_stop_lock, Mutex::_no_safepoint_check_flag); >>> 74: if (!_stop_requested.is_set()) { >>> 75: locker.wait((int64_t)shrink_period); >> >> I tried to test this on some of my toy examples, and realized this particular line may end up as `locker.wait(0)`, which means "wait indefinitely, until notified". This breaks periodic commits. The old code rode on control thread doing `MAX2(1, ...)`, so we never feed `0` into `wait`. I am also confused about units. The comment above says `shrink_period` is in seconds, but `locker.wait` accepts milliseconds? > > It sounds like this line should be: > > > locker.wait(MAX2(1, shrink_period * 1000)); Sorry, I missed your comments here. I noticed the same and have refactored this code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1870257677 From xpeng at openjdk.org Wed Dec 4 21:26:38 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:26:38 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:54:13 GMT, William Kemper wrote: > Changes look good. Left some nits. Thanks for looking into it, I'll fix them and update the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2518591374 From xpeng at openjdk.org Wed Dec 4 21:41:02 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:41:02 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v3] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. 
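The fix discussed above combines a unit conversion (a period in seconds fed to a `wait` that expects milliseconds) with a clamp, since a zero timeout means "wait indefinitely, until notified". A minimal sketch of that conversion, with a helper name of our own choosing:

```cpp
#include <algorithm>
#include <cstdint>

// Convert a shrink period given in seconds to the millisecond timeout a
// monitor wait expects, clamping so that a zero period can never turn
// into an indefinite wait.
inline int64_t shrink_wait_millis(int64_t shrink_period_seconds) {
    return std::max<int64_t>(int64_t(1), shrink_period_seconds * 1000);
}
```

This mirrors the `MAX2(1, shrink_period * 1000)` shape suggested in the review, and the clamp to 1 ms reproduces what the control thread's old `MAX2(1, ...)` guaranteed.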
> > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB 
-Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/7f7b370a..50e633f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=01-02 Stats: 9 lines in 3 files changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Wed Dec 4 21:43:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:43:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: Message-ID: <0zXiV5vIfQnOWKstBgeUMlbjqem_BoQyzqt3laUw030=.665d49f2-019c-4f98-b276-8aafe1494513@github.com> > Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. 
> > With the change the time to execute Concurrent cleanup has been improved by 10+ times, and throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB
-Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix naming issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/50e633f2..404f7f98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Wed Dec 4 21:53:40 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 21:53:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:42:42 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix naming issue > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: > >> 590: shenandoah_assert_heaplocked(); >> 591: if (is_trash() && _recycling.try_set()) { >> 592: if (is_trash()) { > > Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. This method is only called by mutators holding the heap lock, and `is_trash` is not tested before calling the method; it is worth testing it before calling `_recycling.try_set()`, otherwise the mutator fast path would mostly behave like:
1. `_recycling.try_set()` -> true (always attempts the CAS, which is slower; we want to avoid it in the fast path).
2. `is_trash()` -> false, so the recycling is skipped.

But we want the fast path for the mutator to be: `is_trash() -> false && _recycling.is_set() -> false`. I have removed the `is_trash` test from the code path executed by concurrent cleanup in the new version; it is not needed there since `is_trash` is tested in `ShenandoahRecycleTrashedRegionsTask` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870327496 From xpeng at openjdk.org Wed Dec 4 22:08:39 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 22:08:39 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v4] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 18:36:28 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix naming issue > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 890: > >> 888: size_t ShenandoahGeneration::decrement_affiliated_region_count() { >> 889: // Assertions only hold true for Java threads since they call this method under heap lock. >> 890: bool const is_java_thread = Thread::current()->is_Java_thread(); > > Prefer not to check `Thread::current` to gate assertions. Could we use an `#ifdef ASSERT` block here? Could this be `decrease_affiliated_region_count(1)` instead? Or should we have a separate `decrement_under_lock` method? It will be weird if I only change `decrement_affiliated_region_count` to `decrement_affiliated_region_count_under_lock` in this file, while all the other `decrement_x` / `decrease_x` / `increment_x` / `increase_x` methods in these files follow the same convention.
It is probably better to add a new one like `decrement_affiliated_region_count_without_lock` and not change the existing methods' behavior. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870342598 From xpeng at openjdk.org Wed Dec 4 22:16:53 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 4 Dec 2024 22:16:53 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v5] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after a Shenandoah collection cycle; it is currently executed by a single thread (the Shenandoah control thread), since recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time to execute Concurrent cleanup has been improved by 10+ times, and throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup
(Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add decrement_affiliated_region_count_without_lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/404f7f98..75cb902c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=03-04 Stats: 17 lines in 3 files changed: 6 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From ysr at openjdk.org Wed Dec 4 23:45:07 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 4 Dec 2024 23:45:07 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v3] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into ricm - virtual -> override missed in previous delta. Fix zero build (ReduceInitialCardMarks is defined only in JVMCI/Compiler2) - virtual -> override in derived class ShenandoahBarrierSet. - Refine previous change and future-proof ReduceInitialCardMarks for GenShen. - Fix up documentation comment. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/23b8103d..76ab8f3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=01-02 Stats: 10622 lines in 392 files changed: 5385 ins; 3453 del; 1784 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From xpeng at openjdk.org Thu Dec 5 08:58:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 08:58:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: Message-ID: > Parallelize concurrent cleanup after a Shenandoah collection cycle; it is currently executed by a single thread (the Shenandoah control thread), since recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free.
> > With the change the time to execute Concurrent cleanup has been improved by 10+ times, and throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB
-Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Ensure atomicity when access region state - Bug fix and move is_trash test into try_recycle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/75cb902c..11941c57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=04-05 Stats: 29 lines in 2 files changed: 6 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From iwalulya at openjdk.org Thu Dec 5 11:03:40 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 5 Dec 2024 11:03:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. 
Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2481331068 From ayang at openjdk.org Thu Dec 5 12:09:11 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 5 Dec 2024 12:09:11 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully Message-ID: This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. 
This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. ------------- Commit messages: - pgc-old-size-value Changes: https://git.openjdk.org/jdk/pull/22575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345323 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22575/head:pull/22575 PR: https://git.openjdk.org/jdk/pull/22575 From ayang at openjdk.org Thu Dec 5 13:49:39 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 5 Dec 2024 13:49:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. 
Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2481755650 From tschatzl at openjdk.org Thu Dec 5 13:52:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 5 Dec 2024 13:52:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. 
>> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22287#pullrequestreview-2481763621 From kbarrett at openjdk.org Thu Dec 5 17:23:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 5 Dec 2024 17:23:51 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 21:56:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. 
So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - revert removal of orderAccess include > - remove unused include of checkedCast.hpp Thanks for all the reviews and discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2520968876 From kbarrett at openjdk.org Thu Dec 5 17:50:46 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 5 Dec 2024 17:50:46 GMT Subject: Integrated: 8344665: Refactor PartialArrayState allocation for reuse In-Reply-To: References: Message-ID: <1htnBYY8OrM-SL5dfvr33utwqvCS1xGtGSV_cIiNXRY=.0201fe46-446c-4f69-b162-f42b46603c0c@github.com> On Wed, 20 Nov 2024 23:40:41 GMT, Kim Barrett wrote: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. 
Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 This pull request has now been integrated. Changeset: dbf48a53 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/dbf48a53eca74380b279ce6be3bab2a6a248f7f2 Stats: 265 lines in 14 files changed: 136 ins; 54 del; 75 mod 8344665: Refactor PartialArrayState allocation for reuse Reviewed-by: tschatzl, ayang, iwalulya, zgu ------------- PR: https://git.openjdk.org/jdk/pull/22287 From wkemper at openjdk.org Thu Dec 5 17:58:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 17:58:46 GMT Subject: Integrated: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 17:31:58 GMT, William Kemper wrote: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. This pull request has now been integrated. 
Changeset: bedb68ab Author: William Kemper URL: https://git.openjdk.org/jdk/commit/bedb68aba126c6400ce9f2182105b5294ff42021 Stats: 514 lines in 9 files changed: 387 ins; 94 del; 33 mod 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread Reviewed-by: shade, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Thu Dec 5 18:22:46 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 18:22:46 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: <0yobXRSIWciKg1EbfQWBMX5Fhl5P7BI5TWm58IZGB_4=.34ab465c-3d30-4eed-be39-8f81c9aa226a@github.com> Message-ID: On Wed, 4 Dec 2024 21:50:26 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592: >> >>> 590: shenandoah_assert_heaplocked(); >>> 591: if (is_trash() && _recycling.try_set()) { >>> 592: if (is_trash()) { >> >> Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`. > > This method is only called by mutators holding the heap lock, and `is_trash` is not tested before calling the method; it is worth testing it before calling `_recycling.try_set()`, otherwise the mutator fast path would mostly behave like: > 1. `_recycling.try_set()` -> true (always attempts the CAS, which is slower; we want to avoid it in the fast path). > 2. `is_trash()` -> false, so the recycling is skipped. > 3. `_recycling.unset()` (should also be avoided in the fast path) > > But we want the fast path for the mutator to be: `is_trash() -> false && _recycling.is_set() -> false`.
> > > I have removed the `is_trash` test from the code path executed by concurrent cleanup in the new version; it is not needed there since `is_trash` is tested in `ShenandoahRecycleTrashedRegionsTask` Okay, I get it. The second test on line 593 is necessary because the gc workers don't hold the lock and could _in theory_ recycle the region between the first `is_trash` check on 592 and the `_recycling.try_set`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871889122 From wkemper at openjdk.org Thu Dec 5 18:25:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Dec 2024 18:25:39 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 08:58:52 GMT, Xiaolong Peng wrote: >> Parallelize concurrent cleanup after a Shenandoah collection cycle; it is currently executed by a single thread (the Shenandoah control thread), since recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free.
>> >> With the change, the time to execute Concurrent cleanup has been improved by 10+ times; throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Ensure atomicity when access region state > - Bug fix and move is_trash test into try_recycle Changes requested by wkemper (Committer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 629: > 627: _recycling.unset(); > 628: } else { > 629: while (_recycling.is_set()) { Why are we adding this? Won't this make the calling worker thread wait on another worker to recycle the region? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 385: > 383: void print_on(outputStream* st) const; > 384: > 385: void recycle_under_lock(); Should be `try_recycle_under_lock` for consistency. ------------- PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2482551965 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871890730 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871891590 From xpeng at openjdk.org Thu Dec 5 18:41:47 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:41:47 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 18:21:59 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Ensure atomicity when access region state >> - Bug fix and move is_trash test into try_recycle > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 629: > >> 627: _recycling.unset(); >> 628: } else { >> 629: while (_recycling.is_set()) { > > Why are we adding this? Won't this make the calling worker thread wait on another worker to recycle the region?
Hmm, I didn't include this intentionally; I forgot to remove it from the commit. Sorry, I'll remove it. Thanks for catching it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871911601 From xpeng at openjdk.org Thu Dec 5 18:47:55 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:47:55 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v7] In-Reply-To: References: Message-ID: > Concurrent cleanup after a Shenandoah collection cycle is executed by a single thread (the Shenandoah control thread), since recycling trashed regions currently requires the heap lock even though it can be done without it. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making the recycling of trashed regions lock free. > > With the change, the time to execute Concurrent cleanup has been improved by 10+ times; throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young)
2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Renaming and remove unnecessary code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/11941c57..368c6aae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=05-06 Stats: 10 lines in 4 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From xpeng at openjdk.org Thu Dec 5 18:50:41 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 5 Dec 2024 18:50:41 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v6] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 18:22:46 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Ensure atomicity when 
access region state >> - Bug fix and move is_trash test into try_recycle > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 385: > >> 383: void print_on(outputStream* st) const; >> 384: >> 385: void recycle_under_lock(); > > Should be `try_recycle_under_lock` for consistency. Thanks! I have renamed it for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1871923713 From ysr at openjdk.org Thu Dec 5 19:50:26 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:26 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v4] In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. Y. 
Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Demorganization of clause in assert per review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22507/files - new: https://git.openjdk.org/jdk/pull/22507/files/76ab8f3b..040b7e36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22507&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22507/head:pull/22507 PR: https://git.openjdk.org/jdk/pull/22507 From shade at openjdk.org Thu Dec 5 19:50:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 5 Dec 2024 19:50:26 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v4] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <1Oe4Zice-3cTcLULF41aFpWTEYtdSWvtKGPjU5vS-OI=.8ca333b0-e3be-48ae-bae5-3b4e7302b5aa@github.com> On Thu, 5 Dec 2024 19:47:00 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Demorganization of clause in assert per review feedback Still fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22507#pullrequestreview-2482712946 From ysr at openjdk.org Thu Dec 5 19:50:28 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:28 GMT Subject: RFR: 8344593: GenShen: Review of ReduceInitialCardMarks [v3] In-Reply-To: References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: <0UllfVoUy2tqPKQQHI1wozH1uw0X60ziff5--HiONJg=.57e4f59c-0e59-41d4-accc-38532eb215a8@github.com> On Wed, 4 Dec 2024 23:45:07 GMT, Y. Srinivas Ramakrishna wrote: >> Fix documentation comment, and add an assertion check upon slowpath allocation. >> >> I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into ricm > - virtual -> override missed in previous delta. > Fix zero build (ReduceInitialCardMarks is defined only in > JVMCI/Compiler2) > - virtual -> override in derived class ShenandoahBarrierSet. > - Refine previous change and future-proof ReduceInitialCardMarks for > GenShen. > - Fix up documentation comment. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22507#issuecomment-2521256371 From ysr at openjdk.org Thu Dec 5 19:50:28 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Dec 2024 19:50:28 GMT Subject: Integrated: 8344593: GenShen: Review of ReduceInitialCardMarks In-Reply-To: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> References: <-VSOzYldcT3fuR13S0xOXlf9e1bnXaTXl-bvGqcsuFw=.f04f82b7-a195-4906-bb1e-ec005a8f53d8@github.com> Message-ID: On Tue, 3 Dec 2024 02:41:26 GMT, Y. 
Srinivas Ramakrishna wrote: > Fix documentation comment, and add an assertion check upon slowpath allocation. > > I also checked the impact of +/-ReduceInitialCardMarks on GenShen using SPECjbb and didn't see any difference. We've left it enabled by default because less card marking is better in this case. This pull request has now been integrated. Changeset: a97dca52 Author: Y. Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/a97dca52c9257121fc96613a4b591920c1c3e31a Stats: 28 lines in 3 files changed: 18 ins; 0 del; 10 mod 8344593: GenShen: Review of ReduceInitialCardMarks Reviewed-by: shade, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/22507 From stefank at openjdk.org Fri Dec 6 10:21:46 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 6 Dec 2024 10:21:46 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code Message-ID: The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces, we need to fix this.
Tested with tier1-3 ------------- Commit messages: - 8345659: Fix broken alignment after ReservedSpace splitting in GC code Changes: https://git.openjdk.org/jdk/pull/22602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345659 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22602/head:pull/22602 PR: https://git.openjdk.org/jdk/pull/22602 From ayang at openjdk.org Mon Dec 9 09:07:41 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 09:07:41 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v3] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 19:56:54 GMT, Ivan Walulya wrote: >> src/hotspot/share/gc/g1/g1CardSet.cpp line 788: >> >>> 786: G1HeapRegion* r = G1CollectedHeap::heap()->region_at(region_idx); >>> 787: assert(r->rem_set()->card_set() != this, "must be"); >>> 788: #endif >> >> Since this introduces local vars, can they be grouped in a `{}` scope? > > It's possible, but I have not seen this done in the hotspot code. With a quick search, I can find some in runtime code, though not universal. Regardless, it's better encapsulation, IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1875612042 From ayang at openjdk.org Mon Dec 9 09:07:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 09:07:45 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: References: Message-ID: <1NPhKMIwfLaQe7sm34Mi6aDIfXavGGCGniWkUgOlgbs=.51f1c252-9616-4985-993b-415e1f5d90e8@github.com> On Tue, 3 Dec 2024 19:56:23 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. 
Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
>> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Albert Review src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3055: > 3053: } > 3054: > 3055: void G1CollectedHeap::prepare_group_cardsets_for_scan () { Pre-existing: extra space. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 655: > 653: G1HeapRegion* r = ci._r; > 654: r->uninstall_group_cardset(); > 655: r->rem_set()->set_state_complete(); Why changing the remset state here? I'd expect it's already complete; otherwise, how can it be added to cset? src/hotspot/share/gc/g1/g1CollectionSet.inline.hpp line 32: > 30: > 31: template > 32: inline void G1CollectionSet::merge_cardsets_for_collection_groups(G1CollectedHeap* g1h, CardOrRangeVisitor& cl, uint worker_id, uint num_workers) { The first arg seems unused. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 38: > 36: { } > 37: > 38: void G1CSetCandidateGroup::add(G1HeapRegion* hr) { I believe this method is only for retained regions; if so, one can make that explicit by naming it sth like `add_region_region`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 45: > 43: void G1CSetCandidateGroup::add(G1CollectionSetCandidateInfo& hr_info) { > 44: G1HeapRegion* hr = hr_info._r; > 45: assert(!hr->is_young(), "should be flagged as survivor region"); Can one assert region is Old here? 
src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 180: > 178: void G1CSetCandidateGroupList::prepare_for_scan() { > 179: for (G1CSetCandidateGroup* gr : _groups) { > 180: gr->card_set()->reset_table_scanner(); This is a group card set, so why not calling `reset_table_scanner_for_groups`? src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 345: > 343: G1CSetCandidateGroupList other_marking_groups; > 344: G1CSetCandidateGroupList other_retained_groups; > 345: Extra blank line. src/hotspot/share/gc/g1/g1HeapRegion.cpp line 144: > 142: if (is_young() || is_free()) { > 143: return -1.0; > 144: } I don't get why young-regions are treated specially. Also, it's weird that "free" region needs to have a gc-efficiency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874099261 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874028649 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873380805 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873276256 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873280032 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1874100872 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873300822 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1873254478 From sjohanss at openjdk.org Mon Dec 9 09:18:37 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Dec 2024 09:18:37 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 12:04:20 GMT, Albert Mingkun Yang wrote: > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). 
The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). > > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. Looks good, thanks for doing the fix. Just a small comment on the comment. src/hotspot/share/gc/shared/genArguments.cpp line 41: > 39: > 40: // If InitialHeapSize or MinHeapSize is not set on cmdline, this variable, > 41: // together with NewSize, are used to derive them. Suggestion: // together with NewSize, is used to derive them. ------------- Marked as reviewed by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22575#pullrequestreview-2488189045 PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1875622629 From ayang at openjdk.org Mon Dec 9 10:27:53 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Dec 2024 10:27:53 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References: Message-ID: > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). 
> > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/shared/genArguments.cpp Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22575/files - new: https://git.openjdk.org/jdk/pull/22575/files/3a28b9c5..c3600d5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22575/head:pull/22575 PR: https://git.openjdk.org/jdk/pull/22575 From sjohanss at openjdk.org Mon Dec 9 11:52:38 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Dec 2024 11:52:38 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <1MuJ-CNF894oRU90Aadcm5PUC1dbKP2hJp38oDycp4M=.2656a056-f497-4048-9905-5e4653780361@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simply removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 Marked as reviewed by sjohanss (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2488566791 From xpeng at openjdk.org Mon Dec 9 20:42:21 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 9 Dec 2024 20:42:21 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: Message-ID: > Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) 
Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > ### Additional test > - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use parallel_heap_region_iterate to walk the regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/368c6aae..4507656e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=06-07 Stats: 13 lines in 1 file changed: 1 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538 From wkemper at openjdk.org Mon Dec 9 22:13:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 9 Dec 2024 22:13:38 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: Message-ID: <7dozpUw8xDi1lZPEjbaKvAwsaQJrM6piABij7_hwXzI=.33525ec5-f57d-40cd-b312-c0eed413034b@github.com> On Mon, 9 Dec 2024 20:42:21 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap 
lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 
0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> ### Additional test >> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use parallel_heap_region_iterate to walk the regions Thanks for the updates. It looks good to me! ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2490222353 From zgu at openjdk.org Tue Dec 10 00:19:46 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 10 Dec 2024 00:19:46 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <9qPBcD4Tupk4Q_w5KqbryVVukyoCxaU7GgcVUfDH60M=.ac5fbc93-a783-4003-b24b-78863efc5c13@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22441#pullrequestreview-2490452909 From xpeng at openjdk.org Tue Dec 10 01:18:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 01:18:52 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 20:42:21 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. 
>> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational 
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use parallel_heap_region_iterate to walk the regions @kdnilsen @ysramakrishna @shipilev I'm gonna need more reviews for the change, thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2529963162 From ayang at openjdk.org Tue Dec 10 08:31:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Dec 2024 08:31:45 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <7xCEvXMFQ_QlFFRNmzfpM8Zpo84v4JdoqA6HRtER5NM=.8c6972d5-47e8-4dc3-ba9f-f663727589b1@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22441#issuecomment-2530787590 From ayang at openjdk.org Tue Dec 10 08:31:46 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Dec 2024 08:31:46 GMT Subject: Integrated: 8345217: Parallel: Refactor PSParallelCompact::next_src_region In-Reply-To: References: Message-ID: <0kgbjdw_0-tnv06LPwXnaJ4QtCs9ALu7gig0-0eZt1w=.efe14207-8e9f-40b9-90b0-d1f9905aed9c@github.com> On Thu, 28 Nov 2024 15:22:33 GMT, Albert Mingkun Yang wrote: > Simple removing some unnecessary calculations in locating the next source-region during full-gc. > > Test: tier1-5 This pull request has now been integrated. 
Changeset: 7e73c436 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/7e73c436ef5cc035304347bf64ae8e2b4ce45ab1 Stats: 10 lines in 1 file changed: 0 ins; 7 del; 3 mod 8345217: Parallel: Refactor PSParallelCompact::next_src_region Reviewed-by: tschatzl, sjohanss, zgu ------------- PR: https://git.openjdk.org/jdk/pull/22441 From tschatzl at openjdk.org Tue Dec 10 11:06:38 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Dec 2024 11:06:38 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 10:27:53 GMT, Albert Mingkun Yang wrote: >> This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). >> >> Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/shared/genArguments.cpp > > Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> src/hotspot/share/gc/shared/genArguments.cpp line 43: > 41: // together with NewSize, is used to derive them. > 42: // Using the same value when it was a configurable flag to avoid breakage. 
> 43: // See more in JDK-8345323 I do not like referrals to the bug tracker in the code, and/or referring to some code the past ("when it was configurable"). Better explicitly state the problem with heap sizing and large pages and file a follow-up RFE (not mentioning it here). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1877886973 From tschatzl at openjdk.org Tue Dec 10 11:10:38 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Dec 2024 11:10:38 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v2] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 11:04:09 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/gc/shared/genArguments.cpp >> >> Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> > > src/hotspot/share/gc/shared/genArguments.cpp line 43: > >> 41: // together with NewSize, is used to derive them. >> 42: // Using the same value when it was a configurable flag to avoid breakage. >> 43: // See more in JDK-8345323 > > I do not like referrals to the bug tracker in the code, and/or referring to some code the past ("when it was configurable"). > Better explicitly state the problem with heap sizing and large pages and file a follow-up RFE (not mentioning it here). I.e. something like "If the default value of OldSize is too small, then ..., leading to the generations not aligned and not being able to allocate large pages" or so. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22575#discussion_r1877892382 From kbarrett at openjdk.org Tue Dec 10 16:42:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 10 Dec 2024 16:42:49 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n Message-ID: Please review this change to zUtils.cpp to use a for-loop to fill a block of memory rather than using the std::fill_n algorithm. Use of <algorithm> is currently not permitted in HotSpot. Testing: mach5 tier1 ------------- Commit messages: - zUtils remove <algorithm> Changes: https://git.openjdk.org/jdk/pull/22667/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22667&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337995 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22667/head:pull/22667 PR: https://git.openjdk.org/jdk/pull/22667 From ysr at openjdk.org Tue Dec 10 18:13:46 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 10 Dec 2024 18:13:46 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: Message-ID: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> On Mon, 9 Dec 2024 20:42:21 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after a Shenandoah collection cycle is executed by a single thread (the Shenandoah control thread), since currently recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling of trashed regions lock free.
>> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational 
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use parallel_heap_region_iterate to walk the regions Good improvement. Just some minor comments. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1272: > 1270: void ShenandoahFreeSet::recycle_trash() { > 1271: // lock is not reentrable, check we don't have it > 1272: shenandoah_assert_not_heaplocked(); Not your change but may be a good time to fix: "not reentrable" -> "non-reentrant" (which is the more traditional term) src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 884: > 882: // During full gc, multiple GC worker threads may change region affiliations without a lock. No lock is enforced > 883: // on read and write of _affiliated_region_count. At the end of full gc, a single thread overwrites the count with > 884: // a coherent value. Is the comment in its entirety still valid now? The part about "No lock is enforced" seems a bit dubious given the atomic op. Similarly the comment in `decrement_...` below. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 91: > 89: SpaceMangler::mangle_region(MemRegion(_bottom, _end)); > 90: } > 91: _recycling.unset(); Was this necessary, given the c'tor of the struct ShenandoiahFlag is called for the `_recycling` field? To check, I'd assert: assert(!_recycling.is_set(), "C'tor should have been called by now."); src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 574: > 572: > 573: > 574: void ShenandoahHeapRegion::recycle_internal() { A paranoid assertion would be: ```assert(_recycling.is_set() && is_trash(), "Wrong state");``` But may be this is too paranoid since callers already check. 
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 192: > 190: void make_committed_bypass(); > 191: > 192: // Individual states: // Primitive state predicates src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 199: > 197: > 198: bool is_empty_state(RegionState state) const { return state == _empty_committed || state == _empty_uncommitted; } > 199: bool is_humongous_start_state(RegionState state) const { return state == _humongous_start || state == _pinned_humongous_start; } These should move below line 201 which states: // Participation in logical groups: src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 201: > 199: bool is_humongous_start_state(RegionState state) const { return state == _humongous_start || state == _pinned_humongous_start; } > 200: > 201: // Participation in logical groups: // Derived state predicates (boolean combinations of individual states) src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 210: > 208: bool is_cset() const { auto cur_state = state(); return cur_state == _cset || cur_state == _pinned_cset; } > 209: bool is_pinned() const { auto cur_state = state(); return cur_state == _pinned || cur_state == _pinned_cset || cur_state == _pinned_humongous_start; } > 210: bool is_regular_pinned() const { return state() == _pinned; } Should go up into the primitive list at the top. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 268: > 266: CENSUS_NOISE(uint _youth;) // tracks epochs of retrograde ageing (rejuvenation) > 267: > 268: ShenandoahSharedFlag _recycling; 1-line documentation of what it represents. // Used to indicate that the region is being recycled; see try_recycle*(). ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2490587106 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877048744 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877055108 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877164961 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877167040 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877076166 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877075569 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877092813 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877113173 PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1877134362 From xpeng at openjdk.org Tue Dec 10 19:48:40 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 19:48:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> Message-ID: On Tue, 10 Dec 2024 02:42:50 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Use parallel_heap_region_iterate to walk the regions > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 91: > >> 89: SpaceMangler::mangle_region(MemRegion(_bottom, _end)); >> 90: } >> 91: _recycling.unset(); > > Was this necessary, given the c'tor of the struct ShenandoiahFlag is called for the `_recycling` field? 
To check, I'd assert: > > assert(!_recycling.is_set(), "C'tor should have been called by now."); There could be a race condition where another caller immediately sets the flag, hence the assert may fail; I noticed a similar race condition in testing, which is why the double check for is_trash() was added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1878729609 From xpeng at openjdk.org Tue Dec 10 19:55:20 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 19:55:20 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: References: Message-ID: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> > Concurrent cleanup after a Shenandoah collection cycle is executed by a single thread (the Shenandoah control thread), since currently recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling of trashed regions lock free.
> > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB 
-Xlog:gc` > > For the same test, but with a large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related to less race and contention with mutator threads when the heap size i... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22538/files - new: https://git.openjdk.org/jdk/pull/22538/files/4507656e..1bce0d7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22538&range=07-08 Stats: 13 lines in 3 files changed: 3 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22538/head:pull/22538 PR: https://git.openjdk.org/jdk/pull/22538
The part about "No lock is enforced" seems a bit dubious given the atomic op. > > Similarly the comment in `decrement_...` below. Yes, it is atomic; the lock/safepoint seems not needed. I'll probably keep the comment as it is in this PR; since they are called from different places in full GC and concurrent GC, we can clean up these methods later, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1878849089 From xpeng at openjdk.org Tue Dec 10 21:05:41 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 21:05:41 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> Message-ID: <_lfJtyNS419Ur3V2hhwLdcP0EPntP-nmP8Rn8HdJxm4=.a654a75b-d72d-4436-bd80-317178bac8f6@github.com> On Tue, 10 Dec 2024 18:10:46 GMT, Y. Srinivas Ramakrishna wrote: > Good improvement. > > Just some minor comments. Thank you @ysramakrishna, I have addressed all the comments except the one about the comment on the decrement_/increment_ methods. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2532885946 From ysr at openjdk.org Tue Dec 10 23:10:40 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 10 Dec 2024 23:10:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> Message-ID: <79DiuHxGpHadOKlUi5pVkg28taCXNWLIl3G08y_9jBY=.c543ba6b-0af6-458e-95f9-8e51e5033efb@github.com> On Tue, 10 Dec 2024 19:46:08 GMT, Xiaolong Peng wrote: > There could be a race condition where another caller immediately sets the flag, hence the assert may fail, Which other caller? We are asserting here in the constructor of the SHR object.
Is this object visible to anyone other than the constructing thread at this point? I am not sure I understand the reason for a race here. It's possible I am missing something in the lifecycle of the SHR object here. If so, it would be good to add a brief comment on why this needs to occur here despite the constructor for the `_recycling` flag which should have been called by this point, so it should already be unset by now. > notice similar race condition in test, that is why the double check for is_trash() was added. Yes, I understood that race, which is between multiple threads potentially racing to recycle a trashed region, and resolves such a race in favor of the thread that manages to CAS true into `_recycling` with interlocking checks for its `trash`ness. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1879042788 From ysr at openjdk.org Tue Dec 10 23:15:41 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 10 Dec 2024 23:15:41 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> References: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> Message-ID: On Tue, 10 Dec 2024 19:55:20 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. 
>> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational 
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Reviewed and left a couple of comments. No need for a re-review, since neither of my comments is a correctness issue that necessarily needs any code changes. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2493890753 From ysr at openjdk.org Tue Dec 10 23:15:42 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 10 Dec 2024 23:15:42 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> Message-ID: On Tue, 10 Dec 2024 21:00:00 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 884: >> >>> 882: // During full gc, multiple GC worker threads may change region affiliations without a lock. No lock is enforced >>> 883: // on read and write of _affiliated_region_count. At the end of full gc, a single thread overwrites the count with >>> 884: // a coherent value. >> >> Is the comment in its entirety still valid now? The part about "No lock is enforced" seems a bit dubious given the atomic op. >> >> Similarly the comment in `decrement_...` below. > > Yes It is atomic, the lock/safepoint seems not needed. I'll probably keep the comment as it is in this PR, since the are called from different places of FullGC and concurrentGC, we can cleanup these methods later I think. ok for now. Will be worth examining the uses from full at some point but this was just a comment so ok for now. May be leave a `TODO` comment to track if you feel like. 
// TODO: Check and correct comment, if obsolete. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1879050043 From xpeng at openjdk.org Tue Dec 10 23:24:40 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 10 Dec 2024 23:24:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: <79DiuHxGpHadOKlUi5pVkg28taCXNWLIl3G08y_9jBY=.c543ba6b-0af6-458e-95f9-8e51e5033efb@github.com> References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> <79DiuHxGpHadOKlUi5pVkg28taCXNWLIl3G08y_9jBY=.c543ba6b-0af6-458e-95f9-8e51e5033efb@github.com> Message-ID: <3LbG7L0omjTTXfYRPN2GoKQfnyxHquumNGy_G63vKlI=.a6b6c244-af12-4ebe-8b91-224a674dbde5@github.com> On Tue, 10 Dec 2024 23:07:00 GMT, Y. Srinivas Ramakrishna wrote: > Which other caller? Sorry for the confusion, not the caller of this specific method. I meant to say the mutator thread: a mutator may call try_recycle_trashed here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp#L1005 under the heap lock, while we have removed the heap lock from GC concurrent cleanup; therefore it becomes a race condition between mutator and GC threads.
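The race described in this exchange — a mutator recycling a trashed region under the heap lock while GC workers now recycle without it — is resolved by letting exactly one thread win a CAS on the region's `_recycling` flag and then re-checking trash-ness. A minimal sketch of that interlock, using `std::atomic<bool>` as a stand-in for HotSpot's `ShenandoahSharedFlag` (the `Region` type and its members below are illustrative, not the actual HotSpot code):

```cpp
#include <atomic>
#include <cassert>

// Illustrative stand-in for the lock-free recycling interlock discussed in
// this thread. "recycling" plays the role of ShenandoahHeapRegion::_recycling
// (a ShenandoahSharedFlag); "trash" stands in for the region-state check.
struct Region {
    std::atomic<bool> trash{true};      // region holds reclaimable garbage
    std::atomic<bool> recycling{false}; // set by the thread doing the recycle

    // Returns true if this caller performed the recycle; false if it lost the
    // race or the region was already recycled by an earlier winner.
    bool try_recycle_trashed() {
        bool expected = false;
        // Exactly one of the racing threads (mutator or GC worker) wins the CAS.
        if (!recycling.compare_exchange_strong(expected, true)) {
            return false; // someone else is recycling this region right now
        }
        bool did_recycle = false;
        // Double-check trash-ness: an earlier winner may have already finished
        // recycling and released the flag before we acquired it.
        if (trash.load()) {
            trash.store(false); // stand-in for the actual recycle work
            did_recycle = true;
        }
        recycling.store(false);
        return did_recycle;
    }
};
```

The double check mirrors the `is_trash()` re-check mentioned earlier in the thread; without it, a late CAS winner could attempt to recycle a region that was already recycled.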
Will be worth examining the uses from full GC at some point, but this was just a comment so ok for now. Maybe leave a `TODO` comment to track, if you feel like. > > // TODO: Check and correct comment, if obsolete. Thanks, I'll add it if I make any further amendments to this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1879139672 From jwaters at openjdk.org Wed Dec 11 06:06:36 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 11 Dec 2024 06:06:36 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: <_MBiaiBVZYpYUja6xV07wGNE8f0UPpKCME-EdAwsCPE=.5ea574a0-5331-43ad-b03b-68854ed86e5f@github.com> On Tue, 10 Dec 2024 16:37:43 GMT, Kim Barrett wrote: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 I wonder how this even got in... Windows/ARM64 also uses forbidden C++ Standard Library utilities, namely in the atomic implementation. I was thinking about fixing that, but I'm unsure of whether its use is truly needed and justified or not, and additionally Windows/Zero uses the same atomic header as well Sorry I meant orderAccess, not atomic ------------- Marked as reviewed by jwaters (Committer).
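The ZUtils change under review swaps `std::fill_n` for an explicit loop so the file no longer needs `<algorithm>`. A sketch of the equivalent transformation (the function name and signature here are illustrative, not the actual zUtils.cpp code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill "count" words starting at "addr" with "value". Behaviorally identical
// to std::fill_n(addr, count, value), but written as a plain for-loop because
// HotSpot does not permit #include <algorithm>.
static void fill_words(uintptr_t* addr, size_t count, uintptr_t value) {
    for (size_t i = 0; i < count; i++) {
        addr[i] = value;
    }
}
```

The loop form trades a one-line standard-algorithm call for zero dependence on the C++ standard library headers, which is the stated constraint in the review.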
PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2494392701 PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2533714318 PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2533715955 From kdnilsen at openjdk.org Wed Dec 11 06:38:40 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 11 Dec 2024 06:38:40 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> References: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> Message-ID: On Tue, 10 Dec 2024 19:55:20 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after a Shenandoah collection cycle is executed by a single thread (the Shenandoah control thread), since currently recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free.
>> >> With the change, the time to execute Concurrent cleanup has been significantly improved by 10+ times; throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test, but with a large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks. This looks like a nice improvement. ------------- Marked as reviewed by kdnilsen (Author). PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2494439957 From mli at openjdk.org Wed Dec 11 09:29:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Dec 2024 09:29:36 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:37:43 GMT, Kim Barrett wrote: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 The code change itself looks good. Just got one question about the rule: I know a C++ compiler needs to support C++14, as `std::fill_n` is introduced in 17/20/26, so it seems we should not use it in hotspot code; is this the reason why we cannot use `std::fill_n` here? Or is there a place recording which libraries/files are allowed or not allowed in hotspot? Thanks!
------------- PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2494945298 From tschatzl at openjdk.org Wed Dec 11 09:54:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 11 Dec 2024 09:54:37 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:26:34 GMT, Hamlin Li wrote: > The code change itself looks good. > > Just got one question about the rule: I know a C++ compiler needs to support C++14, as `std::fill_n` is introduced in 17/20/26, so it seems we should not use it in hotspot code; is this the reason why we cannot use `std::fill_n` here? Or is there a place recording which libraries/files are allowed or not allowed in hotspot? Thanks! The hotspot style guide only allows a few libraries from the standard library to be used (https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md?plain=1#L531). A previous paragraph (https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md?plain=1#L377) states that unless explicitly allowed, use of other features is disallowed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2535349035 From mli at openjdk.org Wed Dec 11 10:13:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Dec 2024 10:13:38 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:37:43 GMT, Kim Barrett wrote: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 Looks good! ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2495010451 From stefank at openjdk.org Wed Dec 11 10:13:39 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 10:13:39 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:37:43 GMT, Kim Barrett wrote: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 I don't really see the need to forbid `std::fill_n`, so I would have preferred an update to the style guide. However, if we really need to remove it then I would prefer a style modification to the explicit loop. src/hotspot/share/gc/z/zUtils.cpp line 41: > 39: for (uintptr_t* end = addr + count; addr < end; ++addr) { > 40: *addr = value; > 41: } I tend to avoid changing values of the input arguments, so I would like to see that changed. Unless there's a problem with the below code I would like to see this changed to:

    for (uintptr_t* current = addr; current < addr + count; ++current) {
      *current = value;
    }

Or maybe even:

    for (size_t i = 0; i < count; ++i) {
      *(addr + i) = value;
    }

------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2495009553 PR Review Comment: https://git.openjdk.org/jdk/pull/22667#discussion_r1879762974 From mli at openjdk.org Wed Dec 11 10:13:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Dec 2024 10:13:40 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:51:39 GMT, Thomas Schatzl wrote: >> The code change itself looks good.
>> >> Just got one question about the rule: I know a C++ compiler needs to support C++14, as `std::fill_n` is introduced in 17/20/26, so it seems we should not use it in hotspot code; is this the reason why we cannot use `std::fill_n` here? Or is there a place recording which libraries/files are allowed or not allowed in hotspot? Thanks! > >> The code change itself looks good. >> >> Just got one question about the rule: I know a C++ compiler needs to support C++14, as `std::fill_n` is introduced in 17/20/26, so it seems we should not use it in hotspot code; is this the reason why we cannot use `std::fill_n` here? Or is there a place recording which libraries/files are allowed or not allowed in hotspot? Thanks! > The hotspot style guide only allows a few libraries from the standard library to be used (https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md?plain=1#L531). A previous paragraph (https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md?plain=1#L377) states that unless explicitly allowed, use of other features is disallowed. @tschatzl Thank you for the information! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2535389567 From ayang at openjdk.org Wed Dec 11 10:18:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Dec 2024 10:18:37 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 10:16:33 GMT, Stefan Karlsson wrote: > The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this.
> > Tested with tier1-3 I wonder if doing the same for `first_part` makes it more symmetric/readable. ------------- PR Review: https://git.openjdk.org/jdk/pull/22602#pullrequestreview-2495045065 From stefank at openjdk.org Wed Dec 11 10:55:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 10:55:37 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 10:16:04 GMT, Albert Mingkun Yang wrote: > I wonder if doing the same for first_part makes it more symmetric/readable. I think it is unclear whether that would help readability or not. The `first_part` reserved space has the same base as the `heap_rs`, so it is kind of natural that it inherits the alignment from `heap_rs`. To me it seems somewhat redundant to explicitly specify `HeapAlignment` again. Do any other reviewers prefer one way or the other? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535512786 From ayang at openjdk.org Wed Dec 11 11:06:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Dec 2024 11:06:37 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 10:16:33 GMT, Stefan Karlsson wrote: > The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. > > Tested with tier1-3 I meant sth like `heap_rs.first_part(MaxNewSize, GenAlignment);`; then, both generations can check they are compliant wrt `GenAlignment`.
Checking `HeapAlignment` compliance should be done before this, at another abstraction level. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535537193 From stefank at openjdk.org Wed Dec 11 11:10:38 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 11:10:38 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 10:16:33 GMT, Stefan Karlsson wrote: > The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. > > Tested with tier1-3 You are right, that seems like a good thing to do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535546180 From stefank at openjdk.org Wed Dec 11 11:41:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 11:41:13 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: > The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this.
> > Tested with tier1-3 Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Initialize with GenAlignment for both generations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22602/files - new: https://git.openjdk.org/jdk/pull/22602/files/e4eba9a8..6c5cf2e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22602&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22602&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22602/head:pull/22602 PR: https://git.openjdk.org/jdk/pull/22602 From ayang at openjdk.org Wed Dec 11 11:45:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Dec 2024 11:45:42 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:41:13 GMT, Stefan Karlsson wrote: >> The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. >> >> Tested with tier1-3 > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Initialize with GenAlignment for both generations Marked as reviewed by ayang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22602#pullrequestreview-2495360280 From aboldtch at openjdk.org Wed Dec 11 11:59:41 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Dec 2024 11:59:41 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:41:13 GMT, Stefan Karlsson wrote: >> The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. >> >> Tested with tier1-3 > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Initialize with GenAlignment for both generations lgtm. The alignment always requires extra thinking as it has multiple meanings: it is both the required alignment of the base and the granularity / alignment of the size. But we have the same meaning when it comes to HeapAlignment, SpaceAlignment and GenAlignment, so the partition size does not break this invariant. It is unfortunate that we do not assert that these invariants are maintained when we partition a reserved space.
Something like:

    ReservedSpace::ReservedSpace(char* base, size_t size, size_t alignment, size_t page_size, bool special, bool executable) : _fd_for_heap(-1) {
      assert((size % os::vm_allocation_granularity()) == 0, "size not allocation aligned");
    + assert(alignment != 0, "must be set");
    + assert(size % alignment == 0, "must be");
    + assert((uintptr_t)base % alignment == 0, "must be");
      initialize_members(base, size, alignment, page_size, special, executable);
    }

But I know you are working on refactoring and improving the ReservedSpace. Let us hope we can make this more robust in the future. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22602#pullrequestreview-2495453184 From stefank at openjdk.org Wed Dec 11 12:15:40 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 12:15:40 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:56:49 GMT, Axel Boldt-Christmas wrote: > but it is also the granularity / alignment of the size I don't think that this is a strict requirement throughout the JVM's usage of ReservedSpace. AFAIU, the alignment only strictly applies to the base pointer, but some users also have an 'alignment' requirement (as opposed to a 'page_size' requirement) on the size. Let's take an extra round thinking about that for the ReservedSpace rewrites. > But I know you are working on refactoring and improving the ReservedSpace. Let us hope we can make this more robust in the future. Yes, I have extra verification in my other patch.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535825178 From stefank at openjdk.org Wed Dec 11 12:15:41 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 12:15:41 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:41:13 GMT, Stefan Karlsson wrote: >> The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. >> >> Tested with tier1-3 > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Initialize with GenAlignment for both generations Thanks both for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535825532 From aboldtch at openjdk.org Wed Dec 11 12:53:38 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Dec 2024 12:53:38 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 12:12:55 GMT, Stefan Karlsson wrote: > I don't think that this is a strict requirement throughout the JVM's usage of ReservedSpace. AFAIU, the alignment only strictly applies to the base pointer, but some users also have an 'alignment' requirement (as opposed to a 'page_size' requirement) on the size. Yes, the users might not have these requirements, but the current implementation of ReservedSpace enforces it. It might be nice to separate these two properties.
AFAICT all paths go through `ReservedSpace::reserve`, which does `assert(is_aligned(size, alignment), "Size must be aligned to the requested alignment");`, and all three cases ensure that base is aligned; if they succeed they call `initialize_members`. Of course `_alignment` can be 0 if we have no reservation. But calling `partition` / `last_part` / `first_part` is then not allowed (the same is true for most ReservedSpace member functions). We have uses from the outside that do not care about the alignment, and they will get some page_size (or `os::vm_allocation_granularity()`) as their alignment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2535906376 From stefank at openjdk.org Wed Dec 11 14:35:39 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 14:35:39 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:41:13 GMT, Stefan Karlsson wrote: >> The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. >> >> Tested with tier1-3 > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Initialize with GenAlignment for both generations All good points, Axel.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2536168413 From stefank at openjdk.org Wed Dec 11 15:00:31 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 15:00:31 GMT Subject: RFR: 8345659: Fix broken alignment after ReservedSpace splitting in GC code [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:41:13 GMT, Stefan Karlsson wrote: >> The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment. However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. >> >> Tested with tier1-3 > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Initialize with GenAlignment for both generations So, for completeness of the discussion. AFAICT, we have `partition`, `last_part`, `first_part`, and `space_for_range` that all completely skip verifying against `alignment`. The intention is to try to enforce the alignment in an upcoming RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22602#issuecomment-2536223385 From ayang at openjdk.org Wed Dec 11 15:05:50 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Dec 2024 15:05:50 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v3] In-Reply-To: References: Message-ID: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962).
The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). > > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: refer to the new ticket ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22575/files - new: https://git.openjdk.org/jdk/pull/22575/files/c3600d5d..a0f1af3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22575&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22575/head:pull/22575 PR: https://git.openjdk.org/jdk/pull/22575 From stefank at openjdk.org Wed Dec 11 15:13:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Dec 2024 15:13:53 GMT Subject: Integrated: 8345659: Fix broken alignment after ReservedSpace splitting in GC code In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 10:16:33 GMT, Stefan Karlsson wrote: > The Serial and Parallel GCs create a ReservedSpace for the total heap and then split it into a young generation ReservedSpace and an old generation ReservedSpace. The latter operation creates a ReservedSpace with an alignment that doesn't match the base address. This bug is benign because the ReservedSpaces are short-lived and we don't look at the alignment.
However, if we are to add stricter checks when creating ReservedSpaces we need to fix this. > > Tested with tier1-3 This pull request has now been integrated. Changeset: c34b87c5 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/c34b87c52bbaf37d01cb2a73846631a037b312a5 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod 8345659: Fix broken alignment after ReservedSpace splitting in GC code Reviewed-by: ayang, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/22602 From szaldana at openjdk.org Wed Dec 11 16:14:21 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 11 Dec 2024 16:14:21 GMT Subject: RFR: 8346008: Fix recent NULL usage backsliding in Shenandoah Message-ID: Hi all, This PR addresses [8346008](https://bugs.openjdk.org/browse/JDK-8346008). It's a follow-up from [8345647](https://bugs.openjdk.org/browse/JDK-8345647) with some cases I missed. Thanks, Sonia ------------- Commit messages: - 8346008: Fix recent NULL usage backsliding in Shenandoah Changes: https://git.openjdk.org/jdk/pull/22684/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22684&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346008 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22684/head:pull/22684 PR: https://git.openjdk.org/jdk/pull/22684 From kbarrett at openjdk.org Wed Dec 11 16:56:11 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 11 Dec 2024 16:56:11 GMT Subject: RFR: 8346008: Fix recent NULL usage backsliding in Shenandoah In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 16:10:06 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8346008](https://bugs.openjdk.org/browse/JDK-8346008). It's a follow-up from [8345647](https://bugs.openjdk.org/browse/JDK-8345647) with some cases I missed. > > Thanks, > Sonia Looks good, and trivial. 
------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22684#pullrequestreview-2496271535 From ysr at openjdk.org Wed Dec 11 17:07:16 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 11 Dec 2024 17:07:16 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> References: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> Message-ID: <_9vmQOMCXU2ENdRzJn9U1ajdPOA9VQqCuESleQiLHWA=.f3bf455c-c38c-47c0-9268-11c60bf75ce1@github.com> On Tue, 10 Dec 2024 19:55:20 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after a Shenandoah collection cycle is executed by a single thread (the Shenandoah control thread), since currently recycling trashed regions requires the heap lock even though it can be done w/o the heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free.
>> >> With the change, the time to execute Concurrent cleanup has been significantly improved by 10+ times; throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational
-XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> For the same test, but with a large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > Address review comments Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2496299420 From ysr at openjdk.org Wed Dec 11 17:07:18 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 11 Dec 2024 17:07:18 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v8] In-Reply-To: <3LbG7L0omjTTXfYRPN2GoKQfnyxHquumNGy_G63vKlI=.a6b6c244-af12-4ebe-8b91-224a674dbde5@github.com> References: <9_mdklCBz6MYBwvRw2GWIBL5zN9pz2UZ4ZnhsjgsryU=.0f8a2340-7a97-4659-8b65-b82e4cefd3dd@github.com> <79DiuHxGpHadOKlUi5pVkg28taCXNWLIl3G08y_9jBY=.c543ba6b-0af6-458e-95f9-8e51e5033efb@github.com> <3LbG7L0omjTTXfYRPN2GoKQfnyxHquumNGy_G63vKlI=.a6b6c244-af12-4ebe-8b91-224a674dbde5@github.com> Message-ID: <2nhU3rIBr8AYy4B9ZEdj8kkBmu2McIF8amVZWoXbEo8=.f7708324-3000-45f9-93d7-5f90fdf972b1@github.com> On Tue, 10 Dec 2024 23:22:02 GMT, Xiaolong Peng wrote: >>> There could be a race condition where another caller inadvertently sets the flag, hence the assert may fail, >> >> Which other caller? We are asserting here in the constructor of the SHR object. Is this object visible to anyone other than the constructing thread at this point? I am not sure I understand the reason for a race here. >> >> It's possible I am missing something in the lifecycle of the SHR object here. >> >> If so, it would be good to add a brief comment on why this needs to occur here despite the constructor for the `_recycling` flag which should have been called by this point, so it should already be unset by now. >> >>> I noticed a similar race condition in a test; that is why the double check for is_trash() was added.
>> >> Yes, I understood that race, which is between multiple threads potentially racing to recycle a trashed region, and resolves such a race in favor of the thread that manages to CAS true into `_recycling` with interlocking checks for its `trash`ness. > >> Which other caller? > > Sorry for the confusion, not the caller of this specific method. > > I meant to say the mutator thread; the mutator may call try_recycle_trashed here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp#L1005 under heap-lock. We have removed the heap-lock from GC concurrent cleanup, therefore it becomes a race condition between mutator and GC threads. OK, that's what I meant. I didn't think the region would be visible to any other thread in the time that the constructor is being executed (which, my guess was, would be when the ShenandoahHeap is first initialized -- before any mutators exist that can access the heap), but I may be missing something here in the lifecycle of a region. Thanks for pointing out the possibility of a race (but that makes me wonder about other things that may go wrong if there were such a race.) I'll think more about this later. In any case, what I was pointing out (based on my mental model) was not a correctness issue, so I'll go away and try and understand the race you mention. No change is needed here. It's good as is, and my review approval stands. Thanks Xiaolong! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1880574386 From iwalulya at openjdk.org Wed Dec 11 17:33:24 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 11 Dec 2024 17:33:24 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v5] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet.
Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
> > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - use reset_table_scanner_for_groups - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Print Group details in G1PrintRegionLivenessInfoClosure - Albert Review 2 - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Albert Review - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1_globals.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/af2d52d6...554b7f52 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/e687b0cc..554b7f52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=03-04 Stats: 322872 lines in 6836 files changed: 158835 ins; 139718 del; 24319 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Wed Dec 11 17:33:24 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 11 Dec 2024 17:33:24 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 19:56:23 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. 
One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Albert Review I have made changes to accommodate printing of liveness information for groups during G1PrintRegionLivenessInfoClosure.
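As a rough illustration of the grouping policy described above -- candidates sorted by reclaimable bytes, then batched into fixed-size groups that share one card set -- here is a standalone sketch. All names (`Region`, `build_collection_groups`, the group-size parameter) are invented for illustration and are not the HotSpot G1 code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: each "group" index stands in for a shared G1CardSet instance.
struct Region {
  size_t reclaimable_bytes;
  int group = -1;  // -1 = not yet assigned to a collection group
};

// Sort candidates by decreasing reclaimable bytes (the sort order mentioned
// in the description) and batch them into groups of at most group_size
// regions. Returns the number of groups formed. Assumes group_size >= 1.
inline int build_collection_groups(std::vector<Region>& candidates,
                                   size_t group_size) {
  std::sort(candidates.begin(), candidates.end(),
            [](const Region& a, const Region& b) {
              return a.reclaimable_bytes > b.reclaimable_bytes;
            });
  int group = 0;
  for (size_t i = 0; i < candidates.size(); ++i) {
    candidates[i].group = group;
    if ((i + 1) % group_size == 0) {
      ++group;  // current group is full; later regions start a new one
    }
  }
  return candidates.empty() ? 0 : candidates.back().group + 1;
}
```

Regions with the same `group` index would then be evacuated together, which is why no cross-region remembered-set entries are needed within a group.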
------------- PR Comment: https://git.openjdk.org/jdk/pull/22015#issuecomment-2536637581 From kbarrett at openjdk.org Wed Dec 11 18:25:28 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 11 Dec 2024 18:25:28 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: stefank review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22667/files - new: https://git.openjdk.org/jdk/pull/22667/files/cbfc9708..26cc6203 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22667&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22667&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22667/head:pull/22667 PR: https://git.openjdk.org/jdk/pull/22667 From kbarrett at openjdk.org Wed Dec 11 18:25:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 11 Dec 2024 18:25:29 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 10:11:15 GMT, Stefan Karlsson wrote: > I don't really see the need to forbid `std::fill_n`, so I would have preferred an update to the style guide. The "approved" HotSpot way to do that operation would be to use something from the Copy class. But that class has many shortcomings, and really needs some TLC and to be "modernized" to use templates and the like. But not something I'm interested in doing today for this specific bit of code. I considered adding `template <typename T> void Copy::fill_n(T*, size_t, T)` and using that, but decided futzing with Copy really ought to be its own thing.
It's less about using `std::fill_n` than about `#include <algorithm>`. Once you permit the latter, it becomes much harder to enforce restrictions. And various changes we might want to make may render such an include problematic. I found this bit of code because I was looking for includes of C++ Standard Library headers in the context of working on improvements to the function poisoning mechanism. Not all Standard Libraries are as careful about protecting themselves against outside influence as gcc's. clang's definitely gets tripped up. I don't remember whether <algorithm> trips similarly, or if this change was just a preemptive strike. (<algorithm> is also a pretty large hammer for this little nail.) There are approaches to dealing with those sorts of things (mostly "wrapper" headers), but I'm not interested in going there for this case at this time. (This issue and the wrapper header technique are mentioned in the Style Guide, as something that might happen in the future.) Also, if you think something in the Style Guide is onerous, confusing, or wrong, feel free to propose a change. > src/hotspot/share/gc/z/zUtils.cpp line 41: > >> 39: for (uintptr_t* end = addr + count; addr < end; ++addr) { >> 40: *addr = value; >> 41: } > > I tend to avoid changing values of the input arguments, so I would like to see that changed. Unless there's a problem with the below code I would like to see this changed to this: > > for (uintptr_t* current = addr; current < addr + count; ++current) { > *current = value; > } > > > Or maybe even: > > for (size_t i = 0; i < count; ++i) { > *(addr + i) = value; > } Okay. I went with something like the 2nd suggestion, though with array syntax rather than explicit pointer arithmetic.
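For reference, the final shape described above ("the 2nd suggestion ... with array syntax") would look roughly like the following standalone sketch; this approximates the idea and is not the exact code in zUtils.cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index-based loop with array syntax: no <algorithm>, and the input
// arguments are never modified.
inline void fill(uintptr_t* addr, size_t count, uintptr_t value) {
  for (size_t i = 0; i < count; ++i) {
    addr[i] = value;
  }
}
```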
------------- PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2536801139 PR Review Comment: https://git.openjdk.org/jdk/pull/22667#discussion_r1880699772 From xpeng at openjdk.org Wed Dec 11 18:37:20 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 11 Dec 2024 18:37:20 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> References: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> Message-ID: On Tue, 10 Dec 2024 19:55:20 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] 
GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks all for the reviews! 
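The lock-free recycling that makes this parallelization safe -- competing threads CAS a `_recycling` flag and re-check the region's trash state under it -- can be sketched in isolation like this. This is an illustrative model using `std::atomic`, not the actual `ShenandoahHeapRegion` code; `RegionSketch` and its members are invented names:

```cpp
#include <atomic>
#include <cassert>

class RegionSketch {
  std::atomic<bool> _recycling{false};
  std::atomic<bool> _trash{true};

 public:
  // Many threads (GC workers and mutators) may race here; only the one that
  // successfully CASes _recycling from false to true performs the recycling.
  bool try_recycle() {
    bool expected = false;
    if (_trash.load(std::memory_order_acquire) &&
        _recycling.compare_exchange_strong(expected, true)) {
      bool recycled = false;
      // Double-check: another thread may have finished recycling between our
      // first trash check and the successful CAS.
      if (_trash.load(std::memory_order_acquire)) {
        _trash.store(false, std::memory_order_release);  // do the recycling
        recycled = true;
      }
      _recycling.store(false, std::memory_order_release);
      return recycled;
    }
    return false;  // lost the race, or nothing to do
  }

  bool is_trash() const { return _trash.load(std::memory_order_acquire); }
};
```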
------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2536824758 From duke at openjdk.org Wed Dec 11 18:37:20 2024 From: duke at openjdk.org (duke) Date: Wed, 11 Dec 2024 18:37:20 GMT Subject: RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v9] In-Reply-To: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> References: <0lQjdkCZMaEzHYRRr544yx3kxkG6GLniucGZWFDW2-E=.2326ebc9-c64d-4d1f-a1d9-8c7f4a76cd6a@github.com> Message-ID: On Tue, 10 Dec 2024 19:55:20 GMT, Xiaolong Peng wrote: >> Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. >> >> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: >> >> TIP: >> >> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms >> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms >> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms >> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms >> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms >> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms >> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms >> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms >> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms >> >> Parallelized: >> >> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms >> [30.510s][info][gc] GC(1560) 
Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms >> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms >> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms >> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms >> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms >> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms >> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms >> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms >> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms >> >> >> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` >> >> >> For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related t... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments @pengxiaolong Your change (at version 1bce0d7e212bb3b1468c3455043226c2d37ddd7f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22538#issuecomment-2536826985 From wkemper at openjdk.org Wed Dec 11 18:57:12 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 11 Dec 2024 18:57:12 GMT Subject: RFR: 8346008: Fix recent NULL usage backsliding in Shenandoah In-Reply-To: References: Message-ID: <5OBJ0O7NFOKqlh0W1fjfLX_XKXfe3Bdy8ytUncM6iKo=.4877c6aa-4b1a-4448-9954-e6cc5c69ed1d@github.com> On Wed, 11 Dec 2024 16:10:06 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8346008](https://bugs.openjdk.org/browse/JDK-8346008). 
It's a follow-up from [8345647](https://bugs.openjdk.org/browse/JDK-8345647) with some cases I missed. > > Thanks, > Sonia Thank you - looks good to me! ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22684#pullrequestreview-2496572142 From wkemper at openjdk.org Wed Dec 11 19:50:28 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 11 Dec 2024 19:50:28 GMT Subject: RFR: 8344049: Shenandoah: Eliminate init-update-refs safepoint Message-ID: <6ZVLoWPco9LC3XZOturDKG9F42n20Ie4h61f5Ap5iIY=.bbeb52d3-3de0-4778-b504-a69dc6ef7d3b@github.com> Shenandoah typically takes 4 safepoints per GC cycle. Although Shenandoah itself does not spend much time on these safepoints, it may still take quite some time for all of the mutator threads to reach the safepoint. The occasionally long time-to-safepoint increases latency in the higher percentiles. The `init-update-refs` safepoint is responsible for retiring GCLABs (and PLABs) used during evacuation. Once evacuation is complete, no threads will access these LABs. This need not be done on a safepoint. `init-update-refs` is also where the global and thread local copies of the `gc_state` are updated. However, here we are turning off the `WEAK_ROOTS` flag _after_ all of the unmarked weak referents have been `nulled` out, so this does not need to happen atomically with respect to the mutators. Neither is it necessary to change the other state flags (EVACUATION, UPDATE_REFS) atomically across all mutators. Note that the `init-update-refs` safepoint is still taken if either verification or `ShenandoahPacing` is enabled.
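The non-atomic state propagation being described can be modeled roughly as follows: the collector publishes a global `gc_state` word and each mutator refreshes its thread-local copy at its next poll, rather than all copies being rewritten at a global safepoint. The flag values and every name here are invented for illustration; the real mechanism lives in `ShenandoahHeap` and the per-thread data:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint8_t EVACUATION  = 1 << 0;
constexpr uint8_t WEAK_ROOTS  = 1 << 1;
constexpr uint8_t UPDATE_REFS = 1 << 2;

// Global state published by the collector.
std::atomic<uint8_t> g_gc_state{EVACUATION | WEAK_ROOTS};

struct MutatorSketch {
  uint8_t cached_gc_state = 0;  // thread-local copy
  // Refresh at the thread's next poll point instead of at a safepoint.
  void poll() { cached_gc_state = g_gc_state.load(std::memory_order_acquire); }
};

// Collector side: once evacuation is done and the unmarked weak referents
// have been nulled, the transition can simply be published; a mutator briefly
// running on the stale state remains correct, per the reasoning above.
inline void publish_update_refs() {
  g_gc_state.store(UPDATE_REFS, std::memory_order_release);
}
```

The point of the sketch is the window between `publish_update_refs()` and each mutator's next `poll()`: the design argument above is that acting on the stale flags in that window is harmless.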
------------- Commit messages: - Fix comments - Fix comment, revert unnecessary change - Merge remote-tracking branch 'jdk/master' into remove-init-update-refs-safepoint - Fix phase encoding to handle weak roots - WIP: Use Threads::threads_do for propagating gc state (consolidated) - WIP: Use Threads::threads_do for propagating gc state - Remove unnecessary gc state propagations - Encapsulate gc state - Revert unnecessary changes - Merge tag 'jdk-25+1' into two-steps-backward - ... and 20 more: https://git.openjdk.org/jdk/compare/c6317191...9aaef708 Changes: https://git.openjdk.org/jdk/pull/22688/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22688&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344049 Stats: 232 lines in 11 files changed: 125 ins; 70 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/22688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22688/head:pull/22688 PR: https://git.openjdk.org/jdk/pull/22688 From rcastanedalo at openjdk.org Wed Dec 11 20:48:35 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 11 Dec 2024 20:48:35 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). ------------- Commit messages: - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - Mark zLoadP in x64 as exploitable by implicit null check optimization - Fix comment - Do not mark g1LoadP/g1LoadN as initial_implicit_null_check_candidate, they cannot be exploited anyway due to indirect memory operand - Exploit zLoadP only if the memory operand is indOffL8 (indirect does not work anyway due to limitations in C2's analysis) - Complete test with stores and atomics - ... 
and 10 more: https://git.openjdk.org/jdk/compare/bedb68ab...01dd8618 Changes: https://git.openjdk.org/jdk/pull/22678/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22678&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 381 lines in 15 files changed: 336 ins; 37 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22678/head:pull/22678 PR: https://git.openjdk.org/jdk/pull/22678 From wkemper at openjdk.org Wed Dec 11 22:36:14 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 11 Dec 2024 22:36:14 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests Message-ID: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. ------------- Commit messages: - Do not get cpu time for threads that have terminated Changes: https://git.openjdk.org/jdk/pull/22693/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345970 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22693/head:pull/22693 PR: https://git.openjdk.org/jdk/pull/22693 From xpeng at openjdk.org Thu Dec 12 01:11:53 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 12 Dec 2024 01:11:53 GMT Subject: Integrated: 8345423: Shenandoah: Parallelize concurrent cleanup In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 08:25:40 GMT, Xiaolong Peng wrote: > Concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. 
This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free. > > With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly: > > TIP: > > [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms > [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms > [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms > [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms > [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms > [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms > [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms > [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms > [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms > > Parallelized: > > [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms > [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms > [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms > [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms > [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms > [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms > [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms > [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms > [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms > [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms > > > JVM args for the tests: `-Xms4G -Xmx4G 
-XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc` > > > For the same test test, but with large heap with 32G memory, the improvement on concurrent cleanup is much smaller, which might be related to less race and contention with mutator threads when the heap size i... This pull request has now been integrated. Changeset: 4da6fd42 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/4da6fd4283a13be1711e7ad948f1d05a0a9148a5 Stats: 228 lines in 13 files changed: 79 ins; 56 del; 93 mod 8345423: Shenandoah: Parallelize concurrent cleanup Reviewed-by: ysr, kdnilsen, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/22538 From ysr at openjdk.org Thu Dec 12 02:29:35 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 12 Dec 2024 02:29:35 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> On Wed, 11 Dec 2024 22:32:00 GMT, William Kemper wrote: > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 51: > 49: ThreadTimeAccumulator() : total_time(0) {} > 50: void do_thread(Thread* thread) override { > 51: if (!thread->has_terminated()) { There's an inherent race here at destruction time because the target thread may be terminated between the check and the cpu time call -- thus you've narrowed the race window but not closed it. 
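The check-then-act window being described -- the thread can terminate between the `has_terminated()` check and the cpu-time call -- can be demonstrated with a small model. `FakeThread` and `race_fires` are invented names; the `terminate_between` parameter simulates the unlucky scheduling:

```cpp
#include <atomic>
#include <cassert>

struct FakeThread {
  std::atomic<bool> terminated{false};
  bool has_terminated() const { return terminated.load(); }
};

// Returns true if the (simulated) cpu-time query would run against a
// terminated thread, i.e. the race fired despite the guard.
inline bool race_fires(FakeThread& t, bool terminate_between) {
  if (!t.has_terminated()) {      // the guard added by the patch
    if (terminate_between) {
      t.terminated.store(true);   // thread exits right here...
    }
    return t.has_terminated();    // ...so the subsequent "act" sees a dead thread
  }
  return false;
}
```

The guard shrinks the window, but only an ordering or mutual-exclusion fix closes it.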
Note that this is today called only on GC-worker-like threads (including controller & regulator & worker threads). I agree that the crashes are likely occurring during shutdown, just as you surmised. I'd suggest looking at the constructor and destructor (enroll and disenroll) of the MMU Tracker Task, and disenroll it before the GC-workers et al. are shut down. That would be the most surgical and cleanest fix, and closes the race. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1881283986 From ysr at openjdk.org Thu Dec 12 02:32:34 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 12 Dec 2024 02:32:34 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests In-Reply-To: <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> Message-ID: On Thu, 12 Dec 2024 02:26:17 GMT, Y. Srinivas Ramakrishna wrote: >> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 51: > >> 49: ThreadTimeAccumulator() : total_time(0) {} >> 50: void do_thread(Thread* thread) override { >> 51: if (!thread->has_terminated()) { > > There's an inherent race here at destruction time because the target thread may be terminated between the check and the cpu time call -- thus you've narrowed the race window but not closed it. > > Note that this is today called only on GC-worker-like threads (including controller & regulator & worker threads). > > I agree that the crashes are likely occurring during shutdown, just as you surmised.
I'd suggest looking at the constructor and destructor (enroll and disenroll) of the MMU Tracker Task, and disenroll it before the GC-workers et al. are shut down. That would be the most surgical and cleanest fix, and closes the race. Right now the disenroll is done a tad late, since the task is disenrolled in the task's destructor which doesn't happen until the heap is destructed. I think at least the disenroll should be done before we start shutting down GC worker threads etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1881289328 From stefank at openjdk.org Thu Dec 12 09:20:41 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 12 Dec 2024 09:20:41 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 18:25:28 GMT, Kim Barrett wrote: >> Please review this change to zUtils.cpp to use a for-loop to fill a block of >> memory rather than using the std::fill_n algorithm. Use of <algorithm> is >> currently not permitted in HotSpot. >> >> Testing: mach5 tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > stefank review Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2498536282 From mli at openjdk.org Thu Dec 12 10:22:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 12 Dec 2024 10:22:40 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 18:25:28 GMT, Kim Barrett wrote: >> Please review this change to zUtils.cpp to use a for-loop to fill a block of >> memory rather than using the std::fill_n algorithm. Use of <algorithm> is >> currently not permitted in HotSpot. >> >> Testing: mach5 tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > stefank review Still good. 
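[Editorial note: the shape of the zUtils.cpp change quoted above is easy to picture outside of HotSpot. A minimal stand-alone sketch, assuming a hypothetical `fill_words` helper rather than the actual `ZUtils::fill` code, of replacing `std::fill_n` with an explicit loop:]

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the quoted change (not the actual ZUtils::fill
// code): fill `count` words starting at `addr` with `value` using a plain
// for-loop, so the <algorithm> header is no longer needed.
inline void fill_words(uintptr_t* addr, size_t count, uintptr_t value) {
  for (size_t i = 0; i < count; i++) {
    addr[i] = value;
  }
}
```

[Behavior is identical to `std::fill_n(addr, count, value)`; only the header dependency changes.]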
------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22667#pullrequestreview-2498800062 From sjohanss at openjdk.org Thu Dec 12 13:52:37 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 12 Dec 2024 13:52:37 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v3] In-Reply-To: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> References: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> Message-ID: On Wed, 11 Dec 2024 15:05:50 GMT, Albert Mingkun Yang wrote: >> This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). >> >> Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > refer to the new ticket Marked as reviewed by sjohanss (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22575#pullrequestreview-2499698962 From kbarrett at openjdk.org Thu Dec 12 14:42:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 12 Dec 2024 14:42:40 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:51:39 GMT, Thomas Schatzl wrote: > Just got one question about the rule, I know a c++ compiler needs to support c++14, as `std::fill_n` is introduced in 17/20/26 Just to be clear, std::fill_n goes way back. It's in C++98/03, and probably earlier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2539135922 From kbarrett at openjdk.org Thu Dec 12 14:42:41 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 12 Dec 2024 14:42:41 GMT Subject: RFR: 8337995: ZUtils::fill uses std::fill_n [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 18:25:28 GMT, Kim Barrett wrote: >> Please review this change to zUtils.cpp to use a for-loop to fill a block of >> memory rather than using the std::fill_n algorithm. Use of <algorithm> is >> currently not permitted in HotSpot. >> >> Testing: mach5 tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > stefank review Thanks y'all for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22667#issuecomment-2539138822 From kbarrett at openjdk.org Thu Dec 12 14:42:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 12 Dec 2024 14:42:42 GMT Subject: Integrated: 8337995: ZUtils::fill uses std::fill_n In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:37:43 GMT, Kim Barrett wrote: > Please review this change to zUtils.cpp to use a for-loop to fill a block of > memory rather than using the std::fill_n algorithm. Use of <algorithm> is > currently not permitted in HotSpot. > > Testing: mach5 tier1 This pull request has now been integrated. 
Changeset: 22845a77 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/22845a77a2175202876d0029f75fa32271e07b91 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod 8337995: ZUtils::fill uses std::fill_n Reviewed-by: mli, stefank, jwaters, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22667 From wkemper at openjdk.org Thu Dec 12 17:30:40 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 17:30:40 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> Message-ID: On Thu, 12 Dec 2024 02:30:21 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 51: >> >>> 49: ThreadTimeAccumulator() : total_time(0) {} >>> 50: void do_thread(Thread* thread) override { >>> 51: if (!thread->has_terminated()) { >> >> There's an inherent race here at destruction time because the target thread may be terminated between the check and the cpu time call -- thus you've narrowed the race window but not closed it. >> >> Note that this is today called only on GC-worker-like threads (include controller & regulator & worker threads). >> >> I agree that the crashes are likely occurring during shutdown, just as you surmised. I'd suggest looking at the constructor and destructor (enroll and disenroll) of the MMU Tracker Task, and disenroll it before the GC-workers et al. are shutdown. That would be the most surgical and cleanest fix, and closes the race. > > Right now the disenroll is done a tad late, since the task is disenrolled in the task's destructor which doesn't happen until the heap is destructed. I think at least the disenroll should be done before we start shutting down GC worker threads etc. Good catch! 
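[Editorial note: the race being discussed above is a plain check-then-act (TOCTOU) pattern. A simplified sketch, using hypothetical types rather than the Shenandoah code, of why a guard like `!thread->has_terminated()` narrows the window but cannot close it:]

```cpp
#include <atomic>

// Hypothetical sketch (not HotSpot code). The cpu_time() query fails once
// the thread is gone, mimicking pthread_getcpuclockid() on a dead thread.
struct FakeThread {
  std::atomic<bool> terminated{false};
  long cpu_time() const { return terminated.load() ? -1 : 42; }
};

// Racy sampler: the target can terminate between the check and the call.
// The store below stands in for the racing termination.
inline long racy_sample(FakeThread& t) {
  if (t.terminated.load()) return 0;  // check passes...
  t.terminated.store(true);           // ...thread terminates right here...
  return t.cpu_time();                // ...and the act still hits a dead thread
}
```

[The fix adopted in the thread sidesteps the race by ordering shutdown instead: disenroll the sampling task, and stop the sampling threads, before the threads they sample are stopped, so no sample can run concurrently with termination.]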
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1882595612 From wkemper at openjdk.org Thu Dec 12 17:41:51 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 17:41:51 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v2] In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Stop periodic mmu task before stopping GC threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22693/files - new: https://git.openjdk.org/jdk/pull/22693/files/c3b93aec..e99aaa5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=00-01 Stats: 16 lines in 3 files changed: 13 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22693/head:pull/22693 PR: https://git.openjdk.org/jdk/pull/22693 From szaldana at openjdk.org Thu Dec 12 18:17:40 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 12 Dec 2024 18:17:40 GMT Subject: Integrated: 8346008: Fix recent NULL usage backsliding in Shenandoah In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 16:10:06 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8346008](https://bugs.openjdk.org/browse/JDK-8346008). It's a follow-up from [8345647](https://bugs.openjdk.org/browse/JDK-8345647) with some cases I missed. > > Thanks, > Sonia This pull request has now been integrated. 
Changeset: ff85865b Author: Sonia Zaldana Calles URL: https://git.openjdk.org/jdk/commit/ff85865b752b7a2e765e2035d372a4dbb9279fea Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8346008: Fix recent NULL usage backsliding in Shenandoah Reviewed-by: kbarrett, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/22684 From wkemper at openjdk.org Thu Dec 12 23:19:13 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 23:19:13 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v3] In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: <_0Lmlae5oSCw1kiPVaXIUF090ldn6P4GohQr9XWlF9s=.81d00eb2-6296-41c2-a77f-21b18f8ba3c1@github.com> > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. 
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Stop regulator thread after control thread - Revert unnecessary change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22693/files - new: https://git.openjdk.org/jdk/pull/22693/files/e99aaa5f..6b2f74e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=01-02 Stats: 11 lines in 2 files changed: 8 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22693/head:pull/22693 PR: https://git.openjdk.org/jdk/pull/22693 From wkemper at openjdk.org Thu Dec 12 23:42:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 23:42:10 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v4] In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove debug logging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22693/files - new: https://git.openjdk.org/jdk/pull/22693/files/6b2f74e5..88bcf9ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22693/head:pull/22693 PR: https://git.openjdk.org/jdk/pull/22693 From wkemper at openjdk.org Thu Dec 12 23:42:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 23:42:10 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v4] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: <0k4EwyBsiOqaL8HK3NDFz7HjOQijRL011-_2yvvhnKs=.d707b67d-094e-4f5b-a6ea-365afd4a9c6d@github.com> On Thu, 12 Dec 2024 23:39:00 GMT, William Kemper wrote: >> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove debug logging src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 184: > 182: void ShenandoahGenerationalHeap::stop() { > 183: ShenandoahHeap::stop(); > 184: regulator_thread()->stop(); This is the fix for the crash reported in https://bugs.openjdk.org/browse/JDK-8345970. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1883034809 From wkemper at openjdk.org Thu Dec 12 23:42:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 12 Dec 2024 23:42:10 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v4] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> Message-ID: On Thu, 12 Dec 2024 17:28:23 GMT, William Kemper wrote: >> Right now the disenroll is done a tad late, since the task is disenrolled in the task's destructor which doesn't happen until the heap is destructed. I think at least the disenroll should be done before we start shutting down GC worker threads etc. > > Good catch! It wasn't actually the periodic task causing the crash (though that issue has been fixed here as well). The crash was caused by the control thread trying to `ShenandoahHeap::do_gc_threads` which included the regulator thread that had already been stopped by `ShenandoahHeap::stop`. The fix here was to stop the control thread before stopping the regulator thread, thereby preventing the control thread from trying to access stopped threads. This was fairly easy to reproduce (and verify) once I had an Alpine Linux environment set up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1883032058 From ysr at openjdk.org Fri Dec 13 00:00:35 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Fri, 13 Dec 2024 00:00:35 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v4] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: On Thu, 12 Dec 2024 23:42:10 GMT, William Kemper wrote: >> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove debug logging Looks good; a few more comments for your consideration in tightening the downstream code perhaps? (I haven't examined it, but thought it might be worthwhile, if not in this ticket then in a follow-up separately?) ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22693#pullrequestreview-2500989938 From ysr at openjdk.org Fri Dec 13 00:00:36 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 13 Dec 2024 00:00:36 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v4] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> Message-ID: On Thu, 12 Dec 2024 23:35:20 GMT, William Kemper wrote: >> Good catch! > > It wasn't actually the periodic task causing the crash (though that issue has been fixed here as well). The crash was caused by the control thread trying to `ShenandoahHeap::do_gc_threads` which included the regulator thread that had already been stopped by `ShenandoahHeap::stop`. The fix here was to stop the control thread before stopping the regulator thread, thereby preventing the control thread from trying to access stopped threads. 
> > This was fairly easy to reproduce (and verify) once I had an Alpine Linux environment set up. In light of the new findings, should the `if` test be converted now into an `assert` of some sort about the threads not having been terminated during any test (I know the assert is still "racy" -- it doesn't cover the entire window -- but sound to call here. Also wondering if the original when run with a fastdebug build may have asserted down in the `os::` method because of finding a null `osthread`? Should the `os::` methods assert on non-nullness of associated `osthread`? Worth checking now that you have an AlpineLinux box to test on?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1883044632 From wkemper at openjdk.org Fri Dec 13 00:26:55 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Dec 2024 00:26:55 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v5] In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Turn test into assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22693/files - new: https://git.openjdk.org/jdk/pull/22693/files/88bcf9ab..0e3d0a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22693&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22693/head:pull/22693 PR: https://git.openjdk.org/jdk/pull/22693 From kbarrett at openjdk.org Fri Dec 13 00:26:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 13 Dec 2024 00:26:59 GMT Subject: RFR: 8346139: test_memset_with_concurrent_readers.cpp should not use Message-ID: Please review this change to test_memset_with_concurrent_readers.cpp to use HotSpot's stringStream instead of std::stringstream to build the error message when a failure is detected. Also removed the include of , which is one of the standard headers we expect to be included by globalDefinitions. Testing: mach5 tier1 Locally patched the test to fail and ran it, checking the output. 
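[Editorial note: for readers without the HotSpot sources handy, the shape of the 8346139 change can be sketched with a stand-in: the error message is built by printf-style appends into a caller-owned buffer rather than streamed into a `std::stringstream`. HotSpot's `stringStream` offers `print`/`as_string` along these lines; the `ErrorMessage` class below is a hypothetical stand-in, not the real API:]

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for HotSpot's stringStream (not the real
// implementation): printf-style appends into a fixed internal buffer,
// so no <sstream> machinery is needed to build an error message.
class ErrorMessage {
  char _buf[256];
public:
  ErrorMessage() { _buf[0] = '\0'; }
  void print(const char* fmt, ...) {
    size_t len = strlen(_buf);
    va_list ap;
    va_start(ap, fmt);
    // Append at the current end; vsnprintf truncates safely if full.
    vsnprintf(_buf + len, sizeof(_buf) - len, fmt, ap);
    va_end(ap);
  }
  const char* as_string() const { return _buf; }
};
```

[A failure report then looks like `msg.print("expected %d, found %d", expected, actual);` followed by handing `msg.as_string()` to the test framework.]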
------------- Commit messages: - no iostream in test Changes: https://git.openjdk.org/jdk/pull/22725/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22725&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346139 Stats: 23 lines in 1 file changed: 2 ins; 6 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/22725.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22725/head:pull/22725 PR: https://git.openjdk.org/jdk/pull/22725 From wkemper at openjdk.org Fri Dec 13 00:30:40 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Dec 2024 00:30:40 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v5] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> <4R-kS9FPgXdBjB3YIz22BJjnq5WacC5bZvl1xiZDNao=.a3795c17-887f-4566-9642-ec3e709fa6db@github.com> Message-ID: On Thu, 12 Dec 2024 23:51:52 GMT, Y. Srinivas Ramakrishna wrote: >> It wasn't actually the periodic task causing the crash (though that issue has been fixed here as well). The crash was caused by the control thread trying to `ShenandoahHeap::do_gc_threads` which included the regulator thread that had already been stopped by `ShenandoahHeap::stop`. The fix here was to stop the control thread before stopping the regulator thread, thereby preventing the control thread from trying to access stopped threads. >> >> This was fairly easy to reproduce (and verify) once I had an Alpine Linux environment set up. > > In light of the new findings, should the `if` test be converted now into an `assert` of some sort about the threads not having been terminated during any test (I know the assert is still "racy" -- it doesn't cover the entire window -- but sound to call here. Also wondering if the original when run with a fastdebug build may have asserted down in the `os::` method because of finding a null `osthread`? Should the `os::` methods assert on non-nullness of associated `osthread`? 
Worth checking now that you have an AlpineLinux box to test on?) I don't think we can readily test the validity of the `osthread's` native thread handle. I'm sure it _could_ be done, but it's platform specific. In this case, for example, the [glibc version](https://github.com/lattera/glibc/blob/master/nptl/pthread_getcpuclockid.c) of `pthread_getcpuclockid` returns an error code if the handle is `INVALID_TD_P`. The [musl version](https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_getcpuclockid.c) (used for Alpine Linux), on the other hand, has no such check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22693#discussion_r1883069731 From kbarrett at openjdk.org Fri Dec 13 01:29:13 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 13 Dec 2024 01:29:13 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v2] In-Reply-To: References: Message-ID: > Please review this change that introduces two new helper classes to simplify > the usage of PartialArrayStates to manage splitting the processing of large > object arrays into parallelizable chunks. G1 and Parallel young GCs are > changed to use this new mechanism. > > PartialArrayTaskStats is used to collect and report statistics related to > array splitting. It replaces the direct implementation in PSPromotionManager, > and is now also used by G1 young GCs. > > PartialArraySplitter packages up most of the work involved in splitting and > processing tasks. It provides task allocation and release, enqueuing, chunk > claiming, and statistics tracking. It does this by encapsulating existing > objects and functionality. Using array splitting is mostly reduced to calling > the splitter's start function and then calling its step function to process > partial states. This substantially reduces the amount of code for each client > to perform this work. 
> > Testing: mach5 tier1-5 > > Manually ran some test programs with each of G1 and Parallel, with taskqueue > stats logging enabled, and checked that the logged statistics looked okay. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into pa-splitter - parallel uses PartialArraySplitter - g1 uses PartialArraySplitter - add PartialArraySplitter - add PartialArrayTaskStats ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22622/files - new: https://git.openjdk.org/jdk/pull/22622/files/7da8b4b0..b0ea3f51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=00-01 Stats: 15644 lines in 2681 files changed: 8811 ins; 1996 del; 4837 mod Patch: https://git.openjdk.org/jdk/pull/22622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22622/head:pull/22622 PR: https://git.openjdk.org/jdk/pull/22622 From ysr at openjdk.org Fri Dec 13 01:41:40 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 13 Dec 2024 01:41:40 GMT Subject: RFR: 8345970: pthread_getcpuclockid related crashes in shenandoah tests [v5] In-Reply-To: References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: On Fri, 13 Dec 2024 00:26:55 GMT, William Kemper wrote: >> I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Turn test into assert ? ? ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22693#pullrequestreview-2501082206 From stefank at openjdk.org Fri Dec 13 07:59:35 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Dec 2024 07:59:35 GMT Subject: RFR: 8346139: test_memset_with_concurrent_readers.cpp should not use In-Reply-To: References: Message-ID: <74cghjPWtL2xfedEOINfJP948Z46eM1lSlSLjNO_qiI=.8d6508aa-244a-4f15-9fc3-e3ede6e674e1@github.com> On Fri, 13 Dec 2024 00:22:28 GMT, Kim Barrett wrote: > Please review this change to test_memset_with_concurrent_readers.cpp to use > HotSpot's stringStream instead of std::string_stream to build the error > message when a failure is detected. > > Also removed the include of , which is one of the standard headers > we expect to be included by globalDefinitions. > > Testing: mach5 tier1 > Locally patched the test to fail and ran it, checking the output. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22725#pullrequestreview-2501652198 From tschatzl at openjdk.org Fri Dec 13 08:18:35 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 08:18:35 GMT Subject: RFR: 8346139: test_memset_with_concurrent_readers.cpp should not use In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 00:22:28 GMT, Kim Barrett wrote: > Please review this change to test_memset_with_concurrent_readers.cpp to use > HotSpot's stringStream instead of std::string_stream to build the error > message when a failure is detected. > > Also removed the include of , which is one of the standard headers > we expect to be included by globalDefinitions. > > Testing: mach5 tier1 > Locally patched the test to fail and ran it, checking the output. Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22725#pullrequestreview-2501681289 From tschatzl at openjdk.org Fri Dec 13 08:53:38 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 08:53:38 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 01:29:13 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - parallel uses PartialArraySplitter > - g1 uses PartialArraySplitter > - add PartialArraySplitter > - add PartialArrayTaskStats Apart from these comments not being in the right place, seems good. src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 45: > 43: PartialArrayTaskStepper::Step step = _stepper.start(length); > 44: // Push any needed partial scan tasks. Pushed before processing the initial > 45: // chunk to allow other workers to steal while we're processing. This comment (last two lines) now imo better belongs to where this method is called. Same with similar comment in `step()`. ------------- PR Review: https://git.openjdk.org/jdk/pull/22622#pullrequestreview-2501736071 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1883567632 From tschatzl at openjdk.org Fri Dec 13 08:54:36 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 08:54:36 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v3] In-Reply-To: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> References: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> Message-ID: <-_oyLasHYhcWbOK5z5nNnZ8G0Kidxg5PxPJteIX-7MI=.3a8c07f3-06c2-465d-96f2-bc57f5e4698f@github.com> On Wed, 11 Dec 2024 15:05:50 GMT, Albert Mingkun Yang wrote: >> This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). 
>> >> Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > refer to the new ticket lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22575#pullrequestreview-2501755780 From tschatzl at openjdk.org Fri Dec 13 09:27:42 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 09:27:42 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v5] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 17:33:24 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. 
One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: > > - use reset_table_scanner_for_groups > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - Print Group details in G1PrintRegionLivenessInfoClosure > - Albert Review 2 > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - Albert Review > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1_globals.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - ... and 9 more: https://git.openjdk.org/jdk/compare/cd28e7cc...554b7f52 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 293: > 291: > 292: uint num_added_to_group = 0; > 293: // ids 0 and 1 are reserved for region default group and young regions group respectively. I think this comment should not be here but at the `gid` member. Also, what is a "region default group"? src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 76: > 74: size_t _reclaimable_bytes; > 75: double _gc_efficiency; > 76: const uint _gid; Please comment what this is as `gid` is not as obvious as the other members. Also not sure if it isn't better to just write out `_group_id`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 138: > 136: > 137: // Delete all groups from the list. The cardset cleanup for regions within the > 138: // groups could have been done elsewhere (e.g. when adding groups to the Suggestion: // groups could have been done elsewhere (e.g. 
when adding groups to the src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 201: > 199: G1CSetCandidateGroupList _from_marking_groups; // Set of regions selected by concurrent marking. > 200: // Set of regions retained due to evacuation failure. Groups added to this list > 201: // should contain only one region, making it easier to evacuate retained regions Suggestion: // should contain only one region each, making it easier to evacuate retained regions ------------- PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2501799826 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883615763 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883612275 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883613041 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883614098 From tschatzl at openjdk.org Fri Dec 13 09:27:44 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 09:27:44 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: <1NPhKMIwfLaQe7sm34Mi6aDIfXavGGCGniWkUgOlgbs=.51f1c252-9616-4985-993b-415e1f5d90e8@github.com> References: <1NPhKMIwfLaQe7sm34Mi6aDIfXavGGCGniWkUgOlgbs=.51f1c252-9616-4985-993b-415e1f5d90e8@github.com> Message-ID: <2WEjoq-aU8y8ZOjbDTw8TbIJpfRebzLs38Mla1EKf2I=.1bb5669e-0fac-4a40-9913-892e420f3872@github.com> On Fri, 6 Dec 2024 21:46:08 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Albert Review > > src/hotspot/share/gc/g1/g1CollectionSet.cpp line 655: > >> 653: G1HeapRegion* r = ci._r; >> 654: r->uninstall_group_cardset(); >> 655: r->rem_set()->set_state_complete(); > > Why changing the remset state here? I'd expect it's already complete; otherwise, how can it be added to cset? Maybe change to assert? 
> src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 38: > >> 36: { } >> 37: >> 38: void G1CSetCandidateGroup::add(G1HeapRegion* hr) { > > I believe this method is only for retained regions; if so, one can make that explicit by naming it sth like `add_region_region`. (Probably `add_retained_region` was meant here?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883608612 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883620757 From iwalulya at openjdk.org Fri Dec 13 09:35:38 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 13 Dec 2024 09:35:38 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 01:29:13 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling its step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay.
> > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - parallel uses PartialArraySplitter > - g1 uses PartialArraySplitter > - add PartialArraySplitter > - add PartialArrayTaskStats LGTM! Minor nit: src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 62: > 60: template <typename Queue> > 61: PartialArraySplitter::Step > 62: PartialArraySplitter::step(PartialArrayState* state, Queue* queue, bool stolen) { Probably easier to read if we rename to `claim`, step is used as a noun in many other places ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22622#pullrequestreview-2501832505 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1883630926 From iwalulya at openjdk.org Fri Dec 13 11:34:05 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 13 Dec 2024 11:34:05 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v6] In-Reply-To: References: Message-ID: <6N4Q4sbW-7gWcj4yC2-q3D2wLlZleqYGtGtowpM6L8Y=.d1dda02d-2414-404d-b702-6d4734fda89f@github.com> > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances.
This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 21 additional commits since the last revision: - Thomas Review - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - use reset_table_scanner_for_groups - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Print Group details in G1PrintRegionLivenessInfoClosure - Albert Review 2 - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Albert Review - Update src/hotspot/share/gc/g1/g1CollectionSet.cpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update src/hotspot/share/gc/g1/g1_globals.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - ... and 11 more: https://git.openjdk.org/jdk/compare/abea652d...e573b82d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/554b7f52..e573b82d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=04-05 Stats: 8943 lines in 1380 files changed: 4377 ins; 1550 del; 3016 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From ayang at openjdk.org Fri Dec 13 11:48:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 13 Dec 2024 11:48:43 GMT Subject: Integrated: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 12:04:20 GMT, Albert Mingkun Yang wrote: > This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. 
This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). > > Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. This pull request has now been integrated. Changeset: a9a5f7cb Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a9a5f7cb0a75b82d613ecd9018e13e5337e90363 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22575 From ayang at openjdk.org Fri Dec 13 11:48:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 13 Dec 2024 11:48:43 GMT Subject: RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully [v3] In-Reply-To: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> References: <5aDaq0UXEwi2Cc231RS7leEJ-CI6YQ5eEeLMTBsMLVA=.9e699229-7bec-4928-937f-351e45aa2391@github.com> Message-ID: On Wed, 11 Dec 2024 15:05:50 GMT, Albert Mingkun Yang wrote: >> This patch reverts the default value of `OldSize` to its previous setting prior to being obsoleted in [JDK-8333962](https://bugs.openjdk.org/browse/JDK-8333962). The change addresses an issue where `OldSize` being set to zero results in a default `MinHeapSize` that is too small to handle LargePages correctly. This problem is exemplified by `ParallelArguments::initialize_heap_flags_and_sizes`, as identified in [JDK-8345323](https://bugs.openjdk.org/browse/JDK-8345323). 
>> >> Changing the default value of `OldSize` may have broader implications due to the complexity of the logic that determines default values for various flags. Altering one default can lead to cascading effects and potential breakages elsewhere. For these reasons, this patch restores the previous default value of `OldSize` to mitigate such risks. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > refer to the new ticket Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22575#issuecomment-2541263375 From iwalulya at openjdk.org Fri Dec 13 11:57:21 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 13 Dec 2024 11:57:21 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v7] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. 
Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: - cleanup - assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/e573b82d..f0dce79e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From ayang 
at openjdk.org Fri Dec 13 12:09:14 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 13 Dec 2024 12:09:14 GMT Subject: [jdk24] RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully Message-ID: This is a clean backport to JDK 24. ------------- Commit messages: - Backport a9a5f7cb0a75b82d613ecd9018e13e5337e90363 Changes: https://git.openjdk.org/jdk/pull/22733/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22733&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345323 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22733.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22733/head:pull/22733 PR: https://git.openjdk.org/jdk/pull/22733 From iwalulya at openjdk.org Fri Dec 13 12:12:37 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 13 Dec 2024 12:12:37 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v4] In-Reply-To: <2WEjoq-aU8y8ZOjbDTw8TbIJpfRebzLs38Mla1EKf2I=.1bb5669e-0fac-4a40-9913-892e420f3872@github.com> References: <1NPhKMIwfLaQe7sm34Mi6aDIfXavGGCGniWkUgOlgbs=.51f1c252-9616-4985-993b-415e1f5d90e8@github.com> <2WEjoq-aU8y8ZOjbDTw8TbIJpfRebzLs38Mla1EKf2I=.1bb5669e-0fac-4a40-9913-892e420f3872@github.com> Message-ID: On Fri, 13 Dec 2024 09:24:07 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 38: >> >>> 36: { } >>> 37: >>> 38: void G1CSetCandidateGroup::add(G1HeapRegion* hr) { >> >> I believe this method is only for retained regions; if so, one can make that explicit by naming it sth like `add_region_region`. > > (Probably `add_retained_region` was meant here?) 
Currently, we are using the method for adding young regions too ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1883841995 From iwalulya at openjdk.org Fri Dec 13 12:47:54 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 13 Dec 2024 12:47:54 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v8] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. 
Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: fix space issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/f0dce79e..ff5e9e04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=06-07 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From tschatzl at openjdk.org Fri Dec 13 12:53:40 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Dec 2024 12:53:40 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v8] In-Reply-To: References: Message-ID: <1gi3UivZkQDf5-YzL3yaGCaT2pf9J0PwIoyLLypGC1w=.1365959b-671d-4802-a1cc-9ef38f60f47f@github.com> On Fri, 13 Dec 2024 12:47:54 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign 
multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. 
Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > fix space issues Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2502230027 From wkemper at openjdk.org Fri Dec 13 17:44:59 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Dec 2024 17:44:59 GMT Subject: Integrated: 8345970: pthread_getcpuclockid related crashes in shenandoah tests In-Reply-To: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> References: <1y0l9ZPDQ_8jXY7DBgOfTuxd5tPIIZZZ-ZghysBEqGM=.04c92181-0fd9-4ca8-9d55-ab0e519932bf@github.com> Message-ID: On Wed, 11 Dec 2024 22:32:00 GMT, William Kemper wrote: > I haven't seen this failure mode in our Alpine Linux test pipelines, but the suggestion to avoid getting cpu time for terminated threads sounds sensible. This pull request has now been integrated. 
Changeset: 2ce53e88 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/2ce53e88481659734bc5424c643c5e31c116bc5d Stats: 18 lines in 4 files changed: 15 ins; 3 del; 0 mod 8345970: pthread_getcpuclockid related crashes in shenandoah tests Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/22693 From kbarrett at openjdk.org Fri Dec 13 19:32:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 13 Dec 2024 19:32:37 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 08:44:13 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 45: > >> 43: PartialArrayTaskStepper::Step step = _stepper.start(length); >> 44: // Push any needed partial scan tasks. Pushed before processing the initial >> 45: // chunk to allow other workers to steal while we're processing. > > This comment (last two lines) now imo better belongs to where this method is called. Same with similar comment in `step()`. I was going to suggest the comment does belong here, but could perhaps be written more clearly. But on further consideration, I don't think this comment is needed at all. That behavior is the whole point of the splitter class, as somewhat discussed in the comments in the header. I've expanded the comments there to be more explicit. Also, I really don't want to need to be adding comments about this to each current and future caller. 
Part of the point of this class is to minimize the amount of duplication among clients, and needing (near) duplicated comments would count against that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1884397894 From kbarrett at openjdk.org Fri Dec 13 19:37:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 13 Dec 2024 19:37:39 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 09:31:10 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 62: > >> 60: template <typename Queue> >> 61: PartialArraySplitter::Step >> 62: PartialArraySplitter::step(PartialArrayState* state, Queue* queue, bool stolen) { > > Probably easier to read if we rename to `claim`, step is used as a noun in many other places I like the suggested name change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1884408963 From kbarrett at openjdk.org Fri Dec 13 22:24:07 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 13 Dec 2024 22:24:07 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: References: Message-ID: > Please review this change that introduces two new helper classes to simplify > the usage of PartialArrayStates to manage splitting the processing of large > object arrays into parallelizable chunks. G1 and Parallel young GCs are > changed to use this new mechanism.
> > PartialArrayTaskStats is used to collect and report statistics related to > array splitting. It replaces the direct implementation in PSPromotionManager, > and is now also used by G1 young GCs. > > PartialArraySplitter packages up most of the work involved in splitting and > processing tasks. It provides task allocation and release, enqueuing, chunk > claiming, and statistics tracking. It does this by encapsulating existing > objects and functionality. Using array splitting is mostly reduced to calling > the splitter's start function and then calling its step function to process > partial states. This substantially reduces the amount of code for each client > to perform this work. > > Testing: mach5 tier1-5 > > Manually ran some test programs with each of G1 and Parallel, with taskqueue > stats logging enabled, and checked that the logged statistics looked okay. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into pa-splitter - rename splitter.step() => claim() - simplify comments - Merge branch 'master' into pa-splitter - parallel uses PartialArraySplitter - g1 uses PartialArraySplitter - add PartialArraySplitter - add PartialArrayTaskStats ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22622/files - new: https://git.openjdk.org/jdk/pull/22622/files/b0ea3f51..cb70d7b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=01-02 Stats: 265 lines in 36 files changed: 116 ins; 64 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/22622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22622/head:pull/22622 PR: https://git.openjdk.org/jdk/pull/22622 From kbarrett at openjdk.org Sat Dec 14 01:51:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 14 Dec 2024 01:51:42 GMT Subject: RFR: 8346139: test_memset_with_concurrent_readers.cpp should not use In-Reply-To: <74cghjPWtL2xfedEOINfJP948Z46eM1lSlSLjNO_qiI=.8d6508aa-244a-4f15-9fc3-e3ede6e674e1@github.com> References: <74cghjPWtL2xfedEOINfJP948Z46eM1lSlSLjNO_qiI=.8d6508aa-244a-4f15-9fc3-e3ede6e674e1@github.com> Message-ID: On Fri, 13 Dec 2024 07:56:48 GMT, Stefan Karlsson wrote: >> Please review this change to test_memset_with_concurrent_readers.cpp to use >> HotSpot's stringStream instead of std::string_stream to build the error >> message when a failure is detected. >> >> Also removed the include of , which is one of the standard headers >> we expect to be included by globalDefinitions. >> >> Testing: mach5 tier1 >> Locally patched the test to fail and ran it, checking the output. > > Marked as reviewed by stefank (Reviewer). 
Thanks for reviews @stefank and @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/22725#issuecomment-2542653506 From kbarrett at openjdk.org Sat Dec 14 01:51:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 14 Dec 2024 01:51:43 GMT Subject: Integrated: 8346139: test_memset_with_concurrent_readers.cpp should not use In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 00:22:28 GMT, Kim Barrett wrote: > Please review this change to test_memset_with_concurrent_readers.cpp to use > HotSpot's stringStream instead of std::string_stream to build the error > message when a failure is detected. > > Also removed the include of , which is one of the standard headers > we expect to be included by globalDefinitions. > > Testing: mach5 tier1 > Locally patched the test to fail and ran it, checking the output. This pull request has now been integrated. Changeset: ebb27c2e Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/ebb27c2e8f47d35d4f030cca4126c39e24d456bd Stats: 23 lines in 1 file changed: 2 ins; 6 del; 15 mod 8346139: test_memset_with_concurrent_readers.cpp should not use Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22725 From zgu at openjdk.org Sun Dec 15 18:14:35 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Sun, 15 Dec 2024 18:14:35 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 22:24:07 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. 
>> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - parallel uses PartialArraySplitter > - g1 uses PartialArraySplitter > - add PartialArraySplitter > - add PartialArrayTaskStats src/hotspot/share/utilities/macros.hpp line 375: > 373: #define TASKQUEUE_STATS_ONLY(code) > 374: #endif // TASKQUEUE_STATS > 375: Duplicated definition in `TaskQueue.hpp` https://github.com/openjdk/jdk/blob/ab1dbd4089a1a15bdf1b6b39994d5b1faacc40ab/src/hotspot/share/gc/shared/taskqueue.hpp#L39-51 should be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1885792871 From kbarrett at openjdk.org Sun Dec 15 22:32:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 15 Dec 2024 22:32:36 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: References: Message-ID: On Sun, 15 Dec 2024 18:12:20 GMT, Zhengyu Gu wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - rename splitter.step() => claim() >> - simplify comments >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/utilities/macros.hpp line 375: > >> 373: #define TASKQUEUE_STATS_ONLY(code) >> 374: #endif // TASKQUEUE_STATS >> 375: > > Duplicated definition in `TaskQueue.hpp` > https://github.com/openjdk/jdk/blob/ab1dbd4089a1a15bdf1b6b39994d5b1faacc40ab/src/hotspot/share/gc/shared/taskqueue.hpp#L39-51 should be removed. Oops. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1885908376 From sjohanss at openjdk.org Mon Dec 16 10:15:35 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 16 Dec 2024 10:15:35 GMT Subject: [jdk24] RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 12:03:14 GMT, Albert Mingkun Yang wrote: > This is a clean backport to JDK 24. Marked as reviewed by sjohanss (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22733#pullrequestreview-2505734487 From rcastanedalo at openjdk.org Mon Dec 16 12:52:07 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Dec 2024 12:52:07 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks Message-ID: This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: - the main loop is never unrolled regardless of the selected GC algorithm, - no spilling occurs within the main loop for the final C2 compilation, and - the majority of the execution time is spent in the write operation and its associated barrier. The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). 
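The fast path these micro-benchmarks measure is a card-marking-style post-write barrier. As a rough, self-contained illustration only — this is not G1's actual barrier, which additionally filters same-region and null writes and funnels dirty cards through queues — the core "mark the card covering the written field" step looks like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy heap and card table: one card-table byte covers a 512-byte heap range.
// All names and sizes here are illustrative, not HotSpot's.
constexpr std::size_t kCardShift = 9;           // 512-byte cards
constexpr std::size_t kHeapSize = 1 << 16;      // 64 KiB "heap"

alignas(8) uint8_t g_heap[kHeapSize];
uint8_t g_card_table[kHeapSize >> kCardShift];  // 1 byte per card

// Post-write barrier fast path: dirty the card covering the written field.
inline void post_write_barrier(void* field_addr) {
  std::size_t offset = static_cast<uint8_t*>(field_addr) - g_heap;
  g_card_table[offset >> kCardShift] = 1;
}

// A heap store followed by its barrier, the pair the benchmarks time.
inline void heap_store(std::size_t offset, uint64_t value) {
  std::memcpy(&g_heap[offset], &value, sizeof(value));
  post_write_barrier(&g_heap[offset]);
}
```

Because the barrier is only a few instructions, loop unrolling, spilling, and inlining decisions around it can easily dominate a measurement — which is what the changeset works to control.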
------------- Commit messages: - Allow inlining, get rid of reads - Add tentative testArrayWriteBarrierFastPathRealLarge version with a single, fixed new value - Avoid loads and range checks in null-writing micro-benchmarks - Do not inline array micro-benchmarks to avoid spilling in the innermost loop - Disable loop unrolling - Update copyright Changes: https://git.openjdk.org/jdk/pull/22763/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344951 Stats: 21 lines in 1 file changed: 5 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22763.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22763/head:pull/22763 PR: https://git.openjdk.org/jdk/pull/22763 From ayang at openjdk.org Mon Dec 16 15:00:53 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 16 Dec 2024 15:00:53 GMT Subject: [jdk24] RFR: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 12:03:14 GMT, Albert Mingkun Yang wrote: > This is a clean backport to JDK 24. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22733#issuecomment-2545846375 From ayang at openjdk.org Mon Dec 16 15:00:54 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 16 Dec 2024 15:00:54 GMT Subject: [jdk24] Integrated: 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 12:03:14 GMT, Albert Mingkun Yang wrote: > This is a clean backport to JDK 24. This pull request has now been integrated. 
Changeset: 297b21fb Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/297b21fb60100ff132468cc8f110f353def95a44 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8345323: Parallel GC does not handle UseLargePages and UseNUMA gracefully Reviewed-by: sjohanss Backport-of: a9a5f7cb0a75b82d613ecd9018e13e5337e90363 ------------- PR: https://git.openjdk.org/jdk/pull/22733 From kvn at openjdk.org Mon Dec 16 18:58:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Dec 2024 18:58:43 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 12:35:33 GMT, Roberto Castañeda Lozano wrote: > This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: > > - the main loop is never unrolled regardless of the selected GC algorithm, > - no spilling occurs within the main loop for the final C2 compilation, and > - the majority of the execution time is spent in the write operation and its associated barrier. > > The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. > > Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. > > **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). Is it possible to have 2 runs: one with the default `LoopUnrollLimit` and another with the value you set?
------------- PR Review: https://git.openjdk.org/jdk/pull/22763#pullrequestreview-2507019577 From rcastanedalo at openjdk.org Tue Dec 17 09:01:14 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Dec 2024 09:01:14 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v2] In-Reply-To: References: Message-ID: > This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: > > - the main loop is never unrolled regardless of the selected GC algorithm, > - no spilling occurs within the main loop for the final C2 compilation, and > - the majority of the execution time is spent in the write operation and its associated barrier. > > The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. > > Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. > > **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add a default run without JVM arguments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22763/files - new: https://git.openjdk.org/jdk/pull/22763/files/61875050..25a24bcc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=00-01 Stats: 13 lines in 1 file changed: 11 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22763.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22763/head:pull/22763 PR: https://git.openjdk.org/jdk/pull/22763 From rcastanedalo at openjdk.org Tue Dec 17 09:01:14 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Dec 2024 09:01:14 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 18:55:56 GMT, Vladimir Kozlov wrote: > Is it possible to have 2 runs: one with default `LoopUnrollLimit` and an other as you set. Done (commit 25a24bcc). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22763#issuecomment-2547855713 From iwalulya at openjdk.org Tue Dec 17 10:03:03 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 17 Dec 2024 10:03:03 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v9] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. 
> > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
> > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - fix type - fix space issues - cleanup - assert - Thomas Review - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - use reset_table_scanner_for_groups - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Print Group details in G1PrintRegionLivenessInfoClosure - ... 
and 16 more: https://git.openjdk.org/jdk/compare/c071b504...6194442d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/ff5e9e04..6194442d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=07-08 Stats: 7602 lines in 316 files changed: 5796 ins; 754 del; 1052 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From rcastanedalo at openjdk.org Tue Dec 17 12:43:13 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Dec 2024 12:43:13 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: References: Message-ID: > This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: > > - the main loop is never unrolled regardless of the selected GC algorithm, > - no spilling occurs within the main loop for the final C2 compilation, and > - the majority of the execution time is spent in the write operation and its associated barrier. > > The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. > > Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. > > **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Disable inlining again for better stability w.r.t. spilling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22763/files - new: https://git.openjdk.org/jdk/pull/22763/files/25a24bcc..20817324 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=01-02 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22763.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22763/head:pull/22763 PR: https://git.openjdk.org/jdk/pull/22763 From rcastanedalo at openjdk.org Tue Dec 17 12:49:37 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Dec 2024 12:49:37 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 12:43:13 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: >> >> - the main loop is never unrolled regardless of the selected GC algorithm, >> - no spilling occurs within the main loop for the final C2 compilation, and >> - the majority of the execution time is spent in the write operation and its associated barrier. >> >> The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. >> >> Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. 
>> >> **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable inlining again for better stability w.r.t. spilling Turns out that spilling did occur in the main loop of the `testArrayWriteBarrierFastPath*` micros for G1 on x64, but only when LinuxPerfAsmProfiler is disabled (!). The latest commit (20817324) ensures no spilling happens within the main loop (for G1, Serial, Parallel, and Z on x64), regardless of whether profiling is enabled, by disabling inlining the benchmarks into their caller JMH-generated harness. Thanks to Thomas Schatzl for pointing out the issue and helping reproduce it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22763#issuecomment-2548370964 From ayang at openjdk.org Tue Dec 17 15:23:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Dec 2024 15:23:43 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: References: Message-ID: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> On Fri, 13 Dec 2024 22:24:07 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. 
It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - parallel uses PartialArraySplitter > - g1 uses PartialArraySplitter > - add PartialArraySplitter > - add PartialArrayTaskStats src/hotspot/share/gc/shared/partialArraySplitter.hpp line 46: > 44: > 45: public: > 46: explicit PartialArraySplitter(PartialArrayStateManager* manager, Why `explicit` for a method that has two args. src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 77: > 75: void inc_pushed(size_t n = 1) { _pushed += n; } > 76: void inc_stolen(size_t n = 1) { _stolen += n; } > 77: void inc_processed(size_t n = 1) { _processed += n; } I skimmed through callers of these, but can't find a strong reason to use default-arg-value here. Will there be more call-sites that justify this usage? src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 96: > 94: > 95: template > 96: void PartialArrayTaskStats::log_set(uint num_stats, Can this be merged with its declaration? Seems kind of odd that these duplicates (method signature) are next to each other. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888693312 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888684891 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888690919 From kbarrett at openjdk.org Tue Dec 17 17:15:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Dec 2024 17:15:39 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> References: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> Message-ID: <3hbk0a1HRqX-AE8_CrNfndkuJXzVMZGOMb8S1qDuP7M=.442b3022-4b98-4acc-a27c-9c8210779b04@github.com> On Tue, 17 Dec 2024 15:18:28 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - rename splitter.step() => claim() >> - simplify comments >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/gc/shared/partialArraySplitter.hpp line 46: > >> 44: >> 45: public: >> 46: explicit PartialArraySplitter(PartialArrayStateManager* manager, > > Why `explicit` for a method that has two args. Forgot to remove it when the 2nd argument was added. Originally that number came from the manager, but a potentially long-lived and reused manager with dynamic selection of worker threads made that wrong.
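For context on the `explicit` question above: since C++11, `explicit` does affect multi-argument constructors — it rejects copy-list-initialization — so it is not automatically redundant there, even though in this case it was simply a leftover. A standalone sketch with hypothetical stand-ins for the manager and splitter types:

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real HotSpot declarations.
struct Manager { int id; };

struct Splitter {
  Manager* _manager;
  unsigned _num_workers;
  // With `explicit`, `Splitter s = {&mgr, 4};` (copy-list-initialization)
  // is rejected; direct initialization below remains legal.
  explicit Splitter(Manager* manager, unsigned num_workers)
      : _manager(manager), _num_workers(num_workers) {}
};

unsigned workers_of(const Splitter& s) { return s._num_workers; }

Manager g_mgr{42};
Splitter g_split{&g_mgr, 4};     // direct-list-initialization: OK with explicit
// Splitter bad = {&g_mgr, 4};   // would not compile while the ctor is explicit
```

Dropping `explicit`, as agreed in the thread, mainly re-enables the braced `= {...}` form at call sites.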
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888914756 From kbarrett at openjdk.org Tue Dec 17 17:19:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Dec 2024 17:19:38 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> References: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> Message-ID: On Tue, 17 Dec 2024 15:13:33 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - rename splitter.step() => claim() >> - simplify comments >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 77: > >> 75: void inc_pushed(size_t n = 1) { _pushed += n; } >> 76: void inc_stolen(size_t n = 1) { _stolen += n; } >> 77: void inc_processed(size_t n = 1) { _processed += n; } > > I skimmed through callers of these, but can't find a strong reason to use default-arg-value here. Will there be more call-sites that justify this usage? Currently, inc_pushed needs an argument while others don't. Given this stats object is likely mostly encapsulated in and modified by the splitter object, that might always be the case for these functions. Though consistency has some benefit, maybe not here? I'll wire in the usage, and we can adjust later if needed. 
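The trade-off discussed above — a count-taking `inc_pushed` next to counters that only ever advance by one — can be sketched with a simplified stand-in for the stats type (method names modeled on the quoted header; everything else is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Simplified analog of the partial-array task statistics. inc_pushed() takes
// a count because a splitter may push several partial tasks at once; steal
// and process events arrive one at a time, so a no-arg form suffices there.
class TaskStats {
  std::size_t _pushed = 0;
  std::size_t _stolen = 0;
  std::size_t _processed = 0;

public:
  void inc_pushed(std::size_t n) { _pushed += n; }  // callers supply the count
  void inc_stolen() { _stolen += 1; }               // always one event
  void inc_processed() { _processed += 1; }         // always one event

  std::size_t pushed() const { return _pushed; }
  std::size_t stolen() const { return _stolen; }
  std::size_t processed() const { return _processed; }
};
```

Whether the one-at-a-time counters keep a `size_t n = 1` default argument is then purely a consistency-of-interface choice, as the exchange above concludes.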
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888919697 From kbarrett at openjdk.org Tue Dec 17 17:35:48 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Dec 2024 17:35:48 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v3] In-Reply-To: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> References: <_jFMtUcubatsffT1v8RF6R26bBfv-l7fBjzmmQsfOFI=.7a9d462c-b767-44a7-ae6e-ab0df24eb022@github.com> Message-ID: On Tue, 17 Dec 2024 15:17:04 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - rename splitter.step() => claim() >> - simplify comments >> - Merge branch 'master' into pa-splitter >> - parallel uses PartialArraySplitter >> - g1 uses PartialArraySplitter >> - add PartialArraySplitter >> - add PartialArrayTaskStats > > src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 96: > >> 94: >> 95: template >> 96: void PartialArrayTaskStats::log_set(uint num_stats, > > Can this be merged with its declaration? Seems kind of odd that these duplicates (method signature) are next to each other. That would implicitly declare it inline, which doesn't seem particularly desirable here. And it doesn't seem worth the overhead of splitting out into a .inline.hpp file. (That would let the logging includes be moved there, rather than here in the .hpp file. But that seems like a small benefit, since I don't think there are going to be *that* many includes of this file.) But the implicit inlining probably doesn't really matter after all, since the access function is probably different in every use, so we'll have 1-1 uses to instantiations anyway. So sure, merging.
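The inline consideration behind "So sure, merging" is standard C++: a member function defined inside the class body is implicitly `inline`, and for a template each instantiation is deduplicated across translation units anyway. A simplified, hypothetical analog of merging a member template's definition into its declaration:

```cpp
#include <cassert>

// Hypothetical, much-reduced analog of a stats type with a templated
// reporting function. The definition is merged into the in-class
// declaration, which makes it implicitly inline.
struct Stats {
  unsigned processed;

  // `access` maps an index to a Stats&; each distinct Access type yields
  // its own instantiation, so implicit inline costs little in practice.
  template <typename Access>
  static unsigned total(unsigned num_stats, Access access) {
    unsigned sum = 0;
    for (unsigned i = 0; i < num_stats; ++i) {
      sum += access(i).processed;
    }
    return sum;
  }
};

Stats g_stats[3] = {{1}, {2}, {3}};
```

As the reply notes, when the accessor differs at every call site the instantiations are unique regardless, so the in-class form mostly changes source layout, not code generation.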
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1888942745 From kvn at openjdk.org Tue Dec 17 17:40:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Dec 2024 17:40:40 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: References: Message-ID: <8xh5AFbV6Rf_SicNtgWCj0iYgWq0np1h5OPHw2_a9ps=.3b76fc25-8cad-4afc-9723-f351812e0a64@github.com> On Tue, 17 Dec 2024 12:43:13 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: >> >> - the main loop is never unrolled regardless of the selected GC algorithm, >> - no spilling occurs within the main loop for the final C2 compilation, and >> - the majority of the execution time is spent in the write operation and its associated barrier. >> >> The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. >> >> Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. >> >> **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable inlining again for better stability w.r.t. spilling Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22763#pullrequestreview-2509614994 From rcastanedalo at openjdk.org Tue Dec 17 18:01:40 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Dec 2024 18:01:40 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: <8xh5AFbV6Rf_SicNtgWCj0iYgWq0np1h5OPHw2_a9ps=.3b76fc25-8cad-4afc-9723-f351812e0a64@github.com> References: <8xh5AFbV6Rf_SicNtgWCj0iYgWq0np1h5OPHw2_a9ps=.3b76fc25-8cad-4afc-9723-f351812e0a64@github.com> Message-ID: On Tue, 17 Dec 2024 17:38:19 GMT, Vladimir Kozlov wrote: > Good. Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22763#issuecomment-2549205044 From kbarrett at openjdk.org Tue Dec 17 18:08:23 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Dec 2024 18:08:23 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: > Please review this change that introduces two new helper classes to simplify > the usage of PartialArrayStates to manage splitting the processing of large > object arrays into parallelizable chunks. G1 and Parallel young GCs are > changed to use this new mechanism. > > PartialArrayTaskStats is used to collect and report statistics related to > array splitting. It replaces the direct implementation in PSPromotionManager, > and is now also used by G1 young GCs. > > PartialArraySplitter packages up most of the work involved in splitting and > processing tasks. It provides task allocation and release, enqueuing, chunk > claiming, and statistics tracking. It does this by encapsulating existing > objects and functionality. Using array splitting is mostly reduced to calling > the splitter's start function and then calling its step function to process > partial states. This substantially reduces the amount of code for each client > to perform this work.
> > Testing: mach5 tier1-5 > > Manually ran some test programs with each of G1 and Parallel, with taskqueue > stats logging enabled, and checked that the logged statistics looked okay. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into pa-splitter - merge log_set decl/defn - remove default counts for stats incrementers - remove uneeded 'explicit' - cleanup unneeded includes - remove moved-from macro defines - Merge branch 'master' into pa-splitter - rename splitter.step() => claim() - simplify comments - Merge branch 'master' into pa-splitter - ... and 4 more: https://git.openjdk.org/jdk/compare/b7c9006e...54c37988 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22622/files - new: https://git.openjdk.org/jdk/pull/22622/files/cb70d7b3..54c37988 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22622&range=02-03 Stats: 7934 lines in 322 files changed: 5981 ins; 859 del; 1094 mod Patch: https://git.openjdk.org/jdk/pull/22622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22622/head:pull/22622 PR: https://git.openjdk.org/jdk/pull/22622 From tschatzl at openjdk.org Tue Dec 17 18:50:36 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 17 Dec 2024 18:50:36 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 12:43:13 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. 
More specifically, it ensures that: >> >> - the main loop is never unrolled regardless of the selected GC algorithm, >> - no spilling occurs within the main loop for the final C2 compilation, and >> - the majority of the execution time is spent in the write operation and its associated barrier. >> >> The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. >> >> Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. >> >> **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable inlining again for better stability w.r.t. spilling Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22763#pullrequestreview-2509795740 From ayang at openjdk.org Tue Dec 17 19:49:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Dec 2024 19:49:40 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v8] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 12:47:54 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. 
Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
>> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > fix space issues src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 42: > 40: struct G1CollectionSetCandidateInfo { > 41: G1HeapRegion* _r; > 42: double _gc_efficiency; Seems that this field has become unused. src/hotspot/share/gc/g1/g1HeapRegion.cpp line 155: > 153: // rely on the predition for this region. > 154: if (_rem_set->is_added_to_cset_group() && _rem_set->cset_group()->length() > 1) { > 155: return -1.0; I believe all special-case logic (returning `-1`) in this method belongs to the caller, `G1PrintRegionLivenessInfoClosure`, where we branch using `if(gc_eff < 0) {`. src/hotspot/share/gc/g1/g1HeapRegionRemSet.hpp line 56: > 54: // nullptr guards before every use of _cset_group. > 55: G1CSetCandidateGroup* _default_cset_group; > 56: G1CSetCandidateGroup* _cset_group; As I understand it, only one of these two fields contains the real group. I don't get why we need null-checks if only `_cset_group` is there. Whenever we work with `_cset_group`, we should know whether it's null or not already depending on the call-site.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1888225888 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1888210762 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1888190793 From ayang at openjdk.org Tue Dec 17 20:14:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Dec 2024 20:14:42 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 18:08:23 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - merge log_set decl/defn > - remove default counts for stats incrementers > - remove uneeded 'explicit' > - cleanup unneeded includes > - remove moved-from macro defines > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - ... and 4 more: https://git.openjdk.org/jdk/compare/999d9cc9...54c37988 Some minor suggestions. src/hotspot/share/gc/shared/partialArraySplitter.hpp line 81: > 79: // Result type for claim(), carrying multiple values. Provides the claimed > 80: // chunk's start and end array indices. > 81: struct Claim { I feel `Chunk` is a better name. src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 63: > 61: PartialArraySplitter::claim(PartialArrayState* state, Queue* queue, bool stolen) { > 62: #if TASKQUEUE_STATS > 63: if (stolen) _stats.inc_stolen(); Breaking it into multiple lines makes the control flow more explicit. src/hotspot/share/gc/shared/partialArrayTaskStats.cpp line 49: > 47: > 48: void PartialArrayTaskStats::reset() { > 49: *this = PartialArrayTaskStats(); Can we do sth like `static_assert(std::is_trivially_copyable::value)` here? src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 90: > 88: // title: A string title for the table. > 89: template > 90: static void log_set(uint num_stats, StatsAccess access, const char* title) { Going through all its call sites, I believe `print_stats` is more readable. ------------- Marked as reviewed by ayang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22622#pullrequestreview-2509966063 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1889140069 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1889142484 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1889152438 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1889140874 From zgu at openjdk.org Wed Dec 18 01:09:37 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 18 Dec 2024 01:09:37 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 18:08:23 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - merge log_set decl/defn > - remove default counts for stats incrementers > - remove uneeded 'explicit' > - cleanup unneeded includes > - remove moved-from macro defines > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - ... and 4 more: https://git.openjdk.org/jdk/compare/9fad115b...54c37988 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22622#pullrequestreview-2510521893 From rcastanedalo at openjdk.org Wed Dec 18 07:53:42 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Dec 2024 07:53:42 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks [v3] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 12:43:13 GMT, Roberto Casta?eda Lozano wrote: >> This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: >> >> - the main loop is never unrolled regardless of the selected GC algorithm, >> - no spilling occurs within the main loop for the final C2 compilation, and >> - the majority of the execution time is spent in the write operation and its associated barrier. >> >> The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. >> >> Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. 
>> >> **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable inlining again for better stability w.r.t. spilling Thanks for reviewing, Thomas! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22763#issuecomment-2550592085 From rcastanedalo at openjdk.org Wed Dec 18 07:53:43 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Dec 2024 07:53:43 GMT Subject: Integrated: 8344951: Stabilize write barrier micro-benchmarks In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 12:35:33 GMT, Roberto Casta?eda Lozano wrote: > This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: > > - the main loop is never unrolled regardless of the selected GC algorithm, > - no spilling occurs within the main loop for the final C2 compilation, and > - the majority of the execution time is spent in the write operation and its associated barrier. > > The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. > > Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. > > **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). This pull request has now been integrated. 
Changeset: edbd76c6 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/edbd76c62482df31cf539672c6950f00121bcbf3 Stats: 42 lines in 1 file changed: 26 ins; 10 del; 6 mod 8344951: Stabilize write barrier micro-benchmarks Reviewed-by: kvn, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22763 From zgu at openjdk.org Wed Dec 18 14:51:45 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 18 Dec 2024 14:51:45 GMT Subject: RFR: 8346569: Shenandoah: Worker initializes ShenandoahThreadLocalData twice results in memory leak Message-ID: Worker thread initializes ShenandoahThreadLocalData twice, from Thread's constructor and ShenandoahWorkerThreads::on_create_worker(), which results in leaking ShenandoahEvacuationStats. ------------- Commit messages: - 8346569: Shenandoah: Worker initializes ShenandoahThreadLocalData twice results in memory leak Changes: https://git.openjdk.org/jdk/pull/22812/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22812&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346569 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22812/head:pull/22812 PR: https://git.openjdk.org/jdk/pull/22812 From kbarrett at openjdk.org Wed Dec 18 16:59:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Dec 2024 16:59:43 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 20:00:14 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase.
The pull request contains 14 additional commits since the last revision: >> >> - Merge branch 'master' into pa-splitter >> - merge log_set decl/defn >> - remove default counts for stats incrementers >> - remove uneeded 'explicit' >> - cleanup unneeded includes >> - remove moved-from macro defines >> - Merge branch 'master' into pa-splitter >> - rename splitter.step() => claim() >> - simplify comments >> - Merge branch 'master' into pa-splitter >> - ... and 4 more: https://git.openjdk.org/jdk/compare/6b515303...54c37988 > > src/hotspot/share/gc/shared/partialArraySplitter.hpp line 81: > >> 79: // Result type for claim(), carrying multiple values. Provides the claimed >> 80: // chunk's start and end array indices. >> 81: struct Claim { > > I feel `Chunk` is a better name. I think Chunk is overly generic and used a lot elsewhere. It could just as easily be Region (e.g. the "claimed region" instead of the "claimed chunk"). I think the "claim-ness" is the important feature here. > src/hotspot/share/gc/shared/partialArraySplitter.inline.hpp line 63: > >> 61: PartialArraySplitter::claim(PartialArrayState* state, Queue* queue, bool stolen) { >> 62: #if TASKQUEUE_STATS >> 63: if (stolen) _stats.inc_stolen(); > > Breaking it into multiple lines make the control flow more explicit. This stylistic difference has been discussed at length in the past. > src/hotspot/share/gc/shared/partialArrayTaskStats.cpp line 49: > >> 47: >> 48: void PartialArrayTaskStats::reset() { >> 49: *this = PartialArrayTaskStats(); > > Can we do sth like `static_assert(std::is_trivially_copyable::value)` here? I think you mean is_trivially_assignable. I don't think it's a useful assertion here. Depending on details of the class, one might reasonably implement such an operation in the same way even if it isn't trivially assignable. > src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 90: > >> 88: // title: A string title for the table. 
>> 89: template >> 90: static void log_set(uint num_stats, StatsAccess access, const char* title) { > > Going through all its call sites, I believe `print_stats` is more readable. The name log_set was chosen to suggest that it does "UL logging", and to indicate that it is for dealing with a set of stats objects. I think print_stats loses both of those cues and is less clear because of that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890561127 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890561291 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890561350 PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890561199 From ayang at openjdk.org Wed Dec 18 18:25:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 18 Dec 2024 18:25:37 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Wed, 18 Dec 2024 16:56:45 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/partialArrayTaskStats.hpp line 90: >> >>> 88: // title: A string title for the table. >>> 89: template >>> 90: static void log_set(uint num_stats, StatsAccess access, const char* title) { >> >> Going through all its call sites, I believe `print_stats` is more readable. > > The name log_set was chosen to suggest that it does "UL logging", and to > indicate that it is for dealing with a set of stats objects. I think > print_stats loses both of those cues and is less clear because of that. Why is "set" more important than "stats" in "set of stats objects"? If "UL logging" is critical, "log_stats" would be better. When I first read this name, I thought it's related to "set" as in "getter/setter" of log...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890642530 From kbarrett at openjdk.org Wed Dec 18 19:01:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Dec 2024 19:01:39 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: <0Rt2156r75CNZ05GNDBP9dm2UbtJAm3wuViLWyQXIB8=.6bfd61dc-2b56-4bda-82c1-cc76fb2a91c5@github.com> On Wed, 18 Dec 2024 18:04:32 GMT, Albert Mingkun Yang wrote: >> The name log_set was chosen to suggest that it does "UL logging", and to >> indicate that it is for dealing with a set of stats objects. I think >> print_stats loses both of those cues and is less clear because of that. > > Why is "set" more important than "stats" in "set of stats objects"? If "UL logging" is critical, "log_stats" would be better. When I first read this name, I thought it's related to "set" as in "getter/setter" of log... "stats" is redundant here. Recall this is a static function. A client call is going to look like `PartialArrayTaskStats::log_set(...)`, so it's already obvious it's related to "stats" at the call site. A value-assigning function would have a "set_" prefix. Using a "_set" suffix for that would be really weird and non-idiomatic (and a reader would be quite right to complain about such). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890702759 From ayang at openjdk.org Wed Dec 18 19:32:38 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 18 Dec 2024 19:32:38 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: <0Rt2156r75CNZ05GNDBP9dm2UbtJAm3wuViLWyQXIB8=.6bfd61dc-2b56-4bda-82c1-cc76fb2a91c5@github.com> References: <0Rt2156r75CNZ05GNDBP9dm2UbtJAm3wuViLWyQXIB8=.6bfd61dc-2b56-4bda-82c1-cc76fb2a91c5@github.com> Message-ID: On Wed, 18 Dec 2024 18:58:54 GMT, Kim Barrett wrote: >> Why is "set" more important than "stats" in "set of stats objects"?
If "UL logging" is critical, "log_stats" would be better. When I first read this name, I thought it's related to "set" as in "getter/setter" of log... > > "stats" is redundent here. Recall this is a static function. A client call is > going to look like `PartialArrayTaskStats::log_set(...)`, so it's already > obvious it's related to "stats" at the call site. > > A value assigning function would have a "set_" prefix. Using a "_set" suffix > for that would be really weird and non-idiomatic (and a reader would be quite > right to complain about such). I don't feel that the redundancy here is bad, since the first two args are tied to "stats". OTOH, I find the trailing "set" super confusing. This function is to log/print multiple stats, and the most intuitive choice would have been "log/print" + "stats", because it directly communicates the action being performed (logging stats). Emphasizing the collective noun instead of the actual noun seems odd. YMMV. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22622#discussion_r1890738140 From tschatzl at openjdk.org Thu Dec 19 07:16:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Dec 2024 07:16:39 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 18:08:23 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. >> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. 
It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - merge log_set decl/defn > - remove default counts for stats incrementers > - remove uneeded 'explicit' > - cleanup unneeded includes > - remove moved-from macro defines > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - ... and 4 more: https://git.openjdk.org/jdk/compare/eb68ee60...54c37988 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22622#pullrequestreview-2513518190 From kbarrett at openjdk.org Thu Dec 19 16:05:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Dec 2024 16:05:49 GMT Subject: RFR: 8345732: Provide helpers for using PartialArrayState [v4] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 18:08:23 GMT, Kim Barrett wrote: >> Please review this change that introduces two new helper classes to simplify >> the usage of PartialArrayStates to manage splitting the processing of large >> object arrays into parallelizable chunks. G1 and Parallel young GCs are >> changed to use this new mechanism. 
>> >> PartialArrayTaskStats is used to collect and report statistics related to >> array splitting. It replaces the direct implementation in PSPromotionManager, >> and is now also used by G1 young GCs. >> >> PartialArraySplitter packages up most of the work involved in splitting and >> processing tasks. It provides task allocation and release, enqueuing, chunk >> claiming, and statistics tracking. It does this by encapsulating existing >> objects and functionality. Using array splitting is mostly reduced to calling >> the splitter's start function and then calling it's step function to process >> partial states. This substantially reduces the amount of code for each client >> to perform this work. >> >> Testing: mach5 tier1-5 >> >> Manually ran some test programs with each of G1 and Parallel, with taskqueue >> stats logging enabled, and checked that the logged statistics looked okay. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into pa-splitter > - merge log_set decl/defn > - remove default counts for stats incrementers > - remove uneeded 'explicit' > - cleanup unneeded includes > - remove moved-from macro defines > - Merge branch 'master' into pa-splitter > - rename splitter.step() => claim() > - simplify comments > - Merge branch 'master' into pa-splitter > - ... and 4 more: https://git.openjdk.org/jdk/compare/1a1ee563...54c37988 Thanks all for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22622#issuecomment-2554792809 From kbarrett at openjdk.org Thu Dec 19 16:05:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Dec 2024 16:05:52 GMT Subject: Integrated: 8345732: Provide helpers for using PartialArrayState In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 23:27:33 GMT, Kim Barrett wrote: > Please review this change that introduces two new helper classes to simplify > the usage of PartialArrayStates to manage splitting the processing of large > object arrays into parallelizable chunks. G1 and Parallel young GCs are > changed to use this new mechanism. > > PartialArrayTaskStats is used to collect and report statistics related to > array splitting. It replaces the direct implementation in PSPromotionManager, > and is now also used by G1 young GCs. > > PartialArraySplitter packages up most of the work involved in splitting and > processing tasks. It provides task allocation and release, enqueuing, chunk > claiming, and statistics tracking. It does this by encapsulating existing > objects and functionality. Using array splitting is mostly reduced to calling > the splitter's start function and then calling it's step function to process > partial states. This substantially reduces the amount of code for each client > to perform this work. > > Testing: mach5 tier1-5 > > Manually ran some test programs with each of G1 and Parallel, with taskqueue > stats logging enabled, and checked that the logged statistics looked okay. This pull request has now been integrated. 
Changeset: 2344a1a9 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/2344a1a917ec6f6380a8187af9f6c369ac3da6cb Stats: 674 lines in 14 files changed: 489 ins; 123 del; 62 mod 8345732: Provide helpers for using PartialArrayState Reviewed-by: tschatzl, ayang, zgu, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/22622 From iwalulya at openjdk.org Thu Dec 19 22:26:58 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 19 Dec 2024 22:26:58 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v10] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. 
This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Albert review - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - fix type - fix space issues - cleanup - assert - Thomas Review - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - ... 
and 19 more: https://git.openjdk.org/jdk/compare/f270c0d2...6a8039df ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/6194442d..6a8039df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=08-09 Stats: 6194 lines in 221 files changed: 3920 ins; 1574 del; 700 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Thu Dec 19 22:26:58 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 19 Dec 2024 22:26:58 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v8] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 09:56:48 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> fix space issues > > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 42: > >> 40: struct G1CollectionSetCandidateInfo { >> 41: G1HeapRegion* _r; >> 42: double _gc_efficiency; > > Seems that this field has become unused. Fixed > src/hotspot/share/gc/g1/g1HeapRegion.cpp line 155: > >> 153: // rely on the predition for this region. >> 154: if (_rem_set->is_added_to_cset_group() && _rem_set->cset_group()->length() > 1) { >> 155: return -1.0; > > I believe all special cases logic (returning `-1`) in this method belongs to the caller, `G1PrintRegionLivenessInfoClosure`, where we branch using `if(gc_eff < 0) {`. Fixed > src/hotspot/share/gc/g1/g1HeapRegionRemSet.hpp line 56: > >> 54: // nullptr guards before every use of _cset_group. >> 55: G1CSetCandidateGroup* _default_cset_group; >> 56: G1CSetCandidateGroup* _cset_group; > > As I understand it, only one of these two fields contains the real group.
I don't get why we need null-checks if only `_cset_group` is there. Whenever we work with `_cset_group`, we should know whether it's null or not already depending on the call-site. Refactored to guarantee that all call-sites are aware of this detail, then removed the _default_cset_group. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893213194 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893213277 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893214413 From wkemper at openjdk.org Thu Dec 19 23:12:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Dec 2024 23:12:50 GMT Subject: RFR: 8346688: GenShen: Missing metadata trigger log message Message-ID: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> The generational mode may trigger a global collection when metaspace is exhausted. When this happens, it should log this trigger as the reason for starting a cycle. 
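The fix under review only adds the missing log line, but the shape of the bug can be sketched with a toy model. All names and message strings below are invented for illustration; the actual heuristic and log text live in the Shenandoah sources.

```cpp
#include <cassert>
#include <string>

// Toy model of recording *why* a GC cycle starts so the trigger can be
// logged. All names are invented; they are not HotSpot identifiers.
enum class ToyTrigger { None, AllocationPressure, MetaspaceExhausted };

struct ToyControlThread {
  bool metaspace_oom_requested = false;
  bool allocation_pressure = false;
  ToyTrigger last_trigger = ToyTrigger::None;

  // Returns true when a cycle should start, recording the reason first.
  bool should_start_gc() {
    if (metaspace_oom_requested) {
      // The previously missing case: without recording it here, the
      // cycle would start with no trigger message at all.
      last_trigger = ToyTrigger::MetaspaceExhausted;
      return true;
    }
    if (allocation_pressure) {
      last_trigger = ToyTrigger::AllocationPressure;
      return true;
    }
    return false;
  }

  std::string trigger_message() const {
    switch (last_trigger) {
      case ToyTrigger::MetaspaceExhausted: return "Trigger: Metaspace exhausted";
      case ToyTrigger::AllocationPressure: return "Trigger: Allocation pressure";
      default:                             return "";
    }
  }
};
```

The point of the pattern is that every path that returns true records a reason, so there is always something to log.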
------------- Commit messages: - Missing metadata trigger log message in generational mode Changes: https://git.openjdk.org/jdk/pull/22838/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22838&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346688 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22838.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22838/head:pull/22838 PR: https://git.openjdk.org/jdk/pull/22838 From wkemper at openjdk.org Thu Dec 19 23:35:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Dec 2024 23:35:34 GMT Subject: RFR: 8346569: Shenandoah: Worker initializes ShenandoahThreadLocalData twice results in memory leak In-Reply-To: References: Message-ID: On Wed, 18 Dec 2024 14:46:57 GMT, Zhengyu Gu wrote: > Worker thread initializes ShenandoahThreadLocalData twice, from Thread's constructor and ShenandoahWorkerThreads::on_create_worker(), which results in leaking ShenandoahEvacuationStats. Good catch! How'd you find this? ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22812#pullrequestreview-2516415866 From wkemper at openjdk.org Thu Dec 19 23:52:56 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Dec 2024 23:52:56 GMT Subject: RFR: 8346690: Shenandoah: Fix log message for end of GC usage report Message-ID: At the end of a cycle, the non-generational mode usage report has an errant reference to 'generation': GC(1) At end of GC: generation used: 835M ... After this change, the message is: GC(0) At end of GC: used: 1793K ... The message is unchanged for the generational mode: GC(0) At end of Concurrent Global GC: Young generation used: 1544K ... GC(0) At end of Concurrent Global GC: Old generation used: 0B ...
------------- Commit messages: - Fix usage report log message for non-generational modes Changes: https://git.openjdk.org/jdk/pull/22839/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22839&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346690 Stats: 18 lines in 1 file changed: 3 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/22839.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22839/head:pull/22839 PR: https://git.openjdk.org/jdk/pull/22839 From ysr at openjdk.org Fri Dec 20 01:39:33 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 20 Dec 2024 01:39:33 GMT Subject: RFR: 8346688: GenShen: Missing metadata trigger log message In-Reply-To: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> References: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> Message-ID: On Thu, 19 Dec 2024 23:08:19 GMT, William Kemper wrote: > The generational mode may trigger a global collection when metaspace is exhausted. When this happens, it should log this trigger as the reason for starting a cycle. Maybe also add the affected (previously failing) test names to the ticket as a future archeological aid. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22838#pullrequestreview-2516543634 From ysr at openjdk.org Fri Dec 20 01:43:34 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 20 Dec 2024 01:43:34 GMT Subject: RFR: 8346690: Shenandoah: Fix log message for end of GC usage report In-Reply-To: References: Message-ID: On Thu, 19 Dec 2024 23:48:26 GMT, William Kemper wrote: > At the end of a cycle, the non-generational mode usage report has an errant reference to 'generation': > > GC(1) At end of GC: generation used: 835M ... > > After this change, the message is: > > GC(0) At end of GC: used: 1793K ...
> > > The message is unchanged for the generational mode: > > GC(0) At end of Concurrent Global GC: Young generation used: 1544K ... > GC(0) At end of Concurrent Global GC: Old generation used: 0B ... Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22839#pullrequestreview-2516551704 From xpeng at openjdk.org Fri Dec 20 07:58:10 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Dec 2024 07:58:10 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle Message-ID: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Reset marking bitmaps after the collection cycle; for GenShen only do this for the young generation, and choose not to do this for Degen and Full GC since both are running at a safepoint, and we should leave the safepoint ASAP. I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for young gen should have been reset after the previous concurrent cycle finishes if there is no need to preserve bitmap states.
GenShen: Before: [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) After: [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) Shenandoah: Before: [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) After: [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) Additional changes: * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. * Use the API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions; two benefits from this: - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorates the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in the closure. * When `_do_old_gc_bootstrap` is true, instead of resetting the mark bitmap for the old gen separately, simply reset the global generations, so we don't need to walk all the regions twice. * Clean up FullGC code, remove duplicate code.
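The idea behind moving the reset work can be sketched with a toy model: clear bitmaps eagerly after the cycle and remember, per region, whether a reset is still pending, so the next cycle's concurrent-reset phase has less to do. The names below are invented for this sketch and are not the actual HotSpot fields or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the per-region bookkeeping described above.
struct ToyRegion {
  bool need_bitmap_reset = true;  // new/dirty regions are conservatively flagged
  bool bitmap_clear = false;

  void clear_bitmap() {
    bitmap_clear = true;
    need_bitmap_reset = false;  // remember the work is already done
  }
};

// End-of-cycle pass over the regions; returns how many bitmaps it cleared.
inline std::size_t reset_after_collect(std::vector<ToyRegion>& regions) {
  std::size_t cleared = 0;
  for (ToyRegion& r : regions) {
    if (r.need_bitmap_reset) {
      r.clear_bitmap();
      ++cleared;
    }
  }
  return cleared;
}
```

With this bookkeeping, a later reset phase only touches regions dirtied since their last reset, which is the intent behind the Concurrent Reset timing improvement reported above.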
Additional tests: - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah ------------- Commit messages: - Merge branch 'openjdk:master' into reset-bitmap - Remove ShenandoahResetUpdateRegionStateClosure - Always set_mark_incomplete when reset mark bitmap - Fix - Add comments - fix - Not reset_mark_bitmap after cycle when is_concurrent_old_mark_in_progress or is_prepare_for_old_mark_in_progress - Not invoke set_mark_incomplete when reset bitmap after cycle - Renaming, comments, cleanup - Merge Concurrent reset (Old) into Concurrent reset - ... and 5 more: https://git.openjdk.org/jdk/compare/b2811a0c...2b9f28a1 Changes: https://git.openjdk.org/jdk/pull/22778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338737 Stats: 176 lines in 9 files changed: 93 ins; 62 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/22778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22778/head:pull/22778 PR: https://git.openjdk.org/jdk/pull/22778 From jkratochvil at openjdk.org Fri Dec 20 11:38:49 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 20 Dec 2024 11:38:49 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java Message-ID: JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java java.lang.RuntimeException: Unexpected to get exit value of [0] test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 # Error: ShouldNotReachHere() test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 # Error: ShouldNotReachHere() ------------- Commit messages: 
- 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java Changes: https://git.openjdk.org/jdk/pull/22847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22847&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346713 Stats: 10 lines in 3 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22847/head:pull/22847 PR: https://git.openjdk.org/jdk/pull/22847 From stefank at openjdk.org Fri Dec 20 13:41:35 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 20 Dec 2024 13:41:35 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 11:33:48 GMT, Jan Kratochvil wrote: > JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine > > test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java > java.lang.RuntimeException: Unexpected to get exit value of [0] > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() Changes requested by stefank (Reviewer). test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java line 72: > 70: > 71: public static void main(String[] args) throws Exception { > 72: for (String gc : Arrays.asList("-XX:+UseG1GC", "-XX:+UseParallelGC")) { I think this could be problematic if you compile out G1 or Parallel. I would suggest that you create two separate run blocks. 
One for G1 and one for Parallel, and then pass in the GC to test through the `args`. ------------- PR Review: https://git.openjdk.org/jdk/pull/22847#pullrequestreview-2517573704 PR Review Comment: https://git.openjdk.org/jdk/pull/22847#discussion_r1893951737 From jkratochvil at openjdk.org Fri Dec 20 13:59:09 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 20 Dec 2024 13:59:09 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java [v2] In-Reply-To: References: Message-ID: > JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine > > test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java > java.lang.RuntimeException: Unexpected to get exit value of [0] > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix compiled out G1 or Parallel - a review by stefank ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22847/files - new: https://git.openjdk.org/jdk/pull/22847/files/4b5a8c24..d570aba7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22847&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22847&range=00-01 Stats: 20 lines in 1 file changed: 11 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22847/head:pull/22847 PR: https://git.openjdk.org/jdk/pull/22847 From stefank at openjdk.org Fri Dec 20 14:24:37 2024 From: stefank at openjdk.org 
(Stefan Karlsson) Date: Fri, 20 Dec 2024 14:24:37 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java [v2] In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 13:59:09 GMT, Jan Kratochvil wrote: >> JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine >> >> test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java >> java.lang.RuntimeException: Unexpected to get exit value of [0] >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix compiled out G1 or Parallel - a review by stefank Looks good! ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22847#pullrequestreview-2517662244 From tschatzl at openjdk.org Fri Dec 20 14:54:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 20 Dec 2024 14:54:37 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java [v2] In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 13:59:09 GMT, Jan Kratochvil wrote: >> JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine >> >> test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java >> java.lang.RuntimeException: Unexpected to get exit value of [0] >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix compiled out G1 or Parallel - a review by stefank Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22847#pullrequestreview-2517721521 From duke at openjdk.org Fri Dec 20 14:59:37 2024 From: duke at openjdk.org (duke) Date: Fri, 20 Dec 2024 14:59:37 GMT Subject: RFR: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java [v2] In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 13:59:09 GMT, Jan Kratochvil wrote: >> JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine >> >> test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java >> java.lang.RuntimeException: Unexpected to get exit value of [0] >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() >> >> test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java >> # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 >> # Error: ShouldNotReachHere() > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix compiled out G1 or Parallel - a review by stefank @jankratochvil Your change (at version d570aba714c5fc6e0286c5a188976d0bb0eb2c44) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22847#issuecomment-2557168405 From kdnilsen at openjdk.org Fri Dec 20 16:57:36 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 20 Dec 2024 16:57:36 GMT Subject: RFR: 8346688: GenShen: Missing metadata trigger log message In-Reply-To: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> References: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> Message-ID: On Thu, 19 Dec 2024 23:08:19 GMT, William Kemper wrote: > The generational mode may trigger a global collection when metaspace is exhausted. When this happens, it should log this trigger as the reason for starting a cycle. Marked as reviewed by kdnilsen (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/22838#pullrequestreview-2517956956 From kdnilsen at openjdk.org Fri Dec 20 16:59:35 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 20 Dec 2024 16:59:35 GMT Subject: RFR: 8346690: Shenandoah: Fix log message for end of GC usage report In-Reply-To: References: Message-ID: On Thu, 19 Dec 2024 23:48:26 GMT, William Kemper wrote: > At the end of a cycle, the non-generational mode usage report has an errant reference to 'generation': > > GC(1) At end of GC: generation used: 835M ... > > After this change, the message is: > > GC(0) At end of GC: used: 1793K ... > > > The message is unchanged for the generational mode: > > GC(0) At end of Concurrent Global GC: Young generation used: 1544K ... > GC(0) At end of Concurrent Global GC: Old generation used: 0B ... Marked as reviewed by kdnilsen (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22839#pullrequestreview-2517959678 From wkemper at openjdk.org Fri Dec 20 17:32:42 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 17:32:42 GMT Subject: Integrated: 8346690: Shenandoah: Fix log message for end of GC usage report In-Reply-To: References: Message-ID: On Thu, 19 Dec 2024 23:48:26 GMT, William Kemper wrote: > At the end of a cycle, the non-generational mode usage report has an errant reference to 'generation': > > GC(1) At end of GC: generation used: 835M ... > > After this change, the message is: > > GC(0) At end of GC: used: 1793K ... > > > The message is unchanged for the generational mode: > > GC(0) At end of Concurrent Global GC: Young generation used: 1544K ... > GC(0) At end of Concurrent Global GC: Old generation used: 0B ... This pull request has now been integrated. Changeset: d2a48634 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/d2a48634b872b65668b57d3975f805277ae96f83 Stats: 18 lines in 1 file changed: 3 ins; 0 del; 15 mod 8346690: Shenandoah: Fix log message for end of GC usage report Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/22839 From wkemper at openjdk.org Fri Dec 20 17:34:43 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 17:34:43 GMT Subject: RFR: 8346688: GenShen: Missing metadata trigger log message In-Reply-To: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> References: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> Message-ID: On Thu, 19 Dec 2024 23:08:19 GMT, William Kemper wrote: > The generational mode may trigger a global collection when metaspace is exhausted. When this happens, it should log this trigger as the reason for starting a cycle. 
The failing tests are mentioned in the comment, but for posterity and specificity, they were: * vmTestbase/metaspace/gc/watermark_0_1/TestDescription.java * vmTestbase/metaspace/gc/watermark_10_20/TestDescription.java * vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/22838#issuecomment-2557434836 From wkemper at openjdk.org Fri Dec 20 17:34:44 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 17:34:44 GMT Subject: Integrated: 8346688: GenShen: Missing metadata trigger log message In-Reply-To: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> References: <2Z7snRh4sOgooDgZsjShb7M59GrTay0BRjVfq9tLLb4=.ea98e839-5173-4ebb-ac4d-4c5088c39d4a@github.com> Message-ID: On Thu, 19 Dec 2024 23:08:19 GMT, William Kemper wrote: > The generational mode may trigger a global collection when metaspace is exhausted. When this happens, it should log this trigger as the reason for starting a cycle. This pull request has now been integrated. Changeset: b8e40b9c Author: William Kemper URL: https://git.openjdk.org/jdk/commit/b8e40b9c2dfecdad9096015c1aa208ea077db7f5 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8346688: GenShen: Missing metadata trigger log message Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/22838 From wkemper at openjdk.org Fri Dec 20 18:13:48 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 18:13:48 GMT Subject: RFR: 8346737: GenShen: Generational memory pools should not report zero for maximum capacity Message-ID: The following tests fail in generational mode because the old generation reports a _maximum_ capacity of zero when it is empty. It makes sense to treat maximum capacity as a _theoretical_ maximum and let it equal the total size of the heap. 
- vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded002/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded003/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded004/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold001/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold002/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold003/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold004/TestDescription.java - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold005/TestDescription.java ------------- Commit messages: - Fix usage report log message for non-generational modes Changes: https://git.openjdk.org/jdk/pull/22851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346737 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22851/head:pull/22851 PR: https://git.openjdk.org/jdk/pull/22851 From wkemper at openjdk.org Fri Dec 20 18:59:54 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 18:59:54 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> References: 
<6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> On Tue, 17 Dec 2024 00:09:25 GMT, Xiaolong Peng wrote: > Reset marking bitmaps after the collection cycle; for GenShen only do this for the young generation, and choose not to do this for Degen and Full GC since both are running at a safepoint, and we should leave the safepoint ASAP. > > I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for young gen should have been reset after the previous concurrent cycle finishes if there is no need to preserve bitmap states. > > GenShen: > Before: > > [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) > > > After: > > [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) > [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) > > > Shenandoah: > Before: > > [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) > > After: > > [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) > [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) > > > Additional changes: > * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions.
> * Use the API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions; two benefits from this: > - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 > - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorates the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in the closure. > * When `_do_old_gc_bootstrap` is true, instead of resetting the mark bitmap for the old gen separately, simply reset the global generations, so we don't need to walk all the regions twice. > * Clean up FullGC code, remove duplicate code. > > Additional tests: > - [x] CONF=macosx-aarch64-server-fastdebug make test T... Looks good. Left a few nits and a few questions in the review. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1211: > 1209: // Only reset for young generation, bitmap for old generation must be retained, > 1210: // except there is collection(global/old/degen/full) trigged to collect regions in old gen. > 1211: heap->young_generation()->reset_mark_bitmap(); Shouldn't it be safe to reset young region bitmaps even when old marking is in progress? src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 58: > 56: if (PREPARE_FOR_CURRENT_CYCLE) { > 57: if (region->need_bitmap_reset() && _heap->is_bitmap_slice_committed(region)) { > 58: _ctx->clear_bitmap(region); Should this also `region->unset_need_bitmap_reset()`? src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 66: > 64: // Reset live data and set TAMS optimistically. We would recheck these under the pause > 65: // anyway to capture any updates that happened since now. > 66: _ctx->capture_top_at_mark_start(region); Full GC used to do this unconditionally for all affiliated regions. Do we not still need that to happen?
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 92: > 90: } > 91: _recycling.unset(); > 92: _need_bitmap_reset = true; Move to initializers? Why does it start with `true`? A new region would have a clear bitmap, right? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 269: > 267: ShenandoahSharedFlag _recycling; // Used to indicate that the region is being recycled; see try_recycle*(). > 268: > 269: bool _need_bitmap_reset; Nitpick, but I think this would read better as `_needs_bitmap_reset`. src/hotspot/share/gc/shenandoah/shenandoahOldGC.cpp line 161: > 159: } > 160: > 161: entry_reset_after_collect(); Not sure we want to reset old region bitmaps after old marking is complete. Shenandoah opportunistically uses the bitmap for old regions during remembered set scan (it's faster than walking the heap). ------------- Changes requested by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22778#pullrequestreview-2518128727 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894270072 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894273795 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894277047 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894278663 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894283547 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894282722 From kdnilsen at openjdk.org Fri Dec 20 19:50:35 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 20 Dec 2024 19:50:35 GMT Subject: RFR: 8346737: GenShen: Generational memory pools should not report zero for maximum capacity In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 18:09:49 GMT, William Kemper wrote: > The following tests fail in generational mode because the old generation reports a _maximum_ capacity of zero when it is empty.
It makes sense to treat maximum capacity as a _theoretical_ maximum and let it equal the total size of the heap. > > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold005/TestDescription.java Marked as reviewed by kdnilsen (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/22851#pullrequestreview-2518250428 From ysr at openjdk.org Fri Dec 20 23:23:43 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 20 Dec 2024 23:23:43 GMT Subject: RFR: 8346737: GenShen: Generational memory pools should not report zero for maximum capacity In-Reply-To: References: Message-ID: On Fri, 20 Dec 2024 18:09:49 GMT, William Kemper wrote: > The following tests fail in generational mode because the old generation reports a _maximum_ capacity of zero when it is empty. It makes sense to treat maximum capacity as a _theoretical_ maximum and let it equal the total size of the heap. 
> > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold005/TestDescription.java Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22851#pullrequestreview-2518497177 From wkemper at openjdk.org Fri Dec 20 23:54:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 20 Dec 2024 23:54:39 GMT Subject: Integrated: 8346737: GenShen: Generational memory pools should not report zero for maximum capacity In-Reply-To: References: Message-ID: <-hw9gqEe4rSQ5n5Ga7Uo5CN-gbW2WsrYQDDTgQez8Js=.515c62d2-0bfd-4d01-8d61-a61ff63b7ea2@github.com> On Fri, 20 Dec 2024 18:09:49 GMT, William Kemper wrote: > The following tests fail in generational mode because the old generation reports a _maximum_ capacity of zero when it is empty. It makes sense to treat maximum capacity as a _theoretical_ maximum and let it equal the total size of the heap. 
> > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/isCollectionUsageThresholdExceeded/isexceeded005/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold001/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold002/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold003/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold004/TestDescription.java > - vmTestbase/nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold/setthreshold005/TestDescription.java This pull request has now been integrated. 
Changeset: 249f1412 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/249f141211c94afcce70d9d536d84e108e07b4e5 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod 8346737: GenShen: Generational memory pools should not report zero for maximum capacity Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/22851 From xpeng at openjdk.org Fri Dec 20 23:55:36 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Dec 2024 23:55:36 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> Message-ID: On Fri, 20 Dec 2024 18:40:18 GMT, William Kemper wrote: >> Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and Full GC, since both run at a safepoint and we should leave the safepoint as soon as possible. >> >> I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for the young gen should have been reset after the previous concurrent cycle finished if there is no need to preserve bitmap state.
>> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. >> * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. >> * Clean up FullGC code, remove duplicate code. >> >> ... > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1211: > >> 1209: // Only reset for young generation, bitmap for old generation must be retained, >> 1210: // except there is collection(global/old/degen/full) trigged to collect regions in old gen. 
>> 1211: heap->young_generation()->reset_mark_bitmap(); > > Shouldn't it be safe to reset young region bitmaps even when old marking is in progress? Not really, if old marking is in progress, but current cycle is not bootstrap cycle, it means previous old collection has been cancelled to deal with allocation failure, control thread will try to resume old collection agin which will resume old marking again. > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 66: > >> 64: // Reset live data and set TAMS optimistically. We would recheck these under the pause >> 65: // anyway to capture any updates that happened since now. >> 66: _ctx->capture_top_at_mark_start(region); > > Full GC used to do this unconditionally for all affiliated regions. Do we not still need that to happen? I will double check this, I'm not 100% sure ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894506921 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894507300 From xpeng at openjdk.org Sat Dec 21 00:08:35 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 21 Dec 2024 00:08:35 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> Message-ID: On Fri, 20 Dec 2024 18:44:45 GMT, William Kemper wrote: >> Reset marking bitmaps after collection cycle; for GenShen only do this for young generation, also choose not do this for Degen and full GC since both are running at safepoint, we should leave safepoint as ASAP. 
>> >> I have run same workload for 30s with Shenandoah in generational mode and classic mode, average average time of concurrent reset dropped significantly since in most case bitmap for young gen should have been reset after pervious concurrent cycle finishes if there is no need to preserve bitmap states. >> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. >> * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. 
>> * Clean up FullGC code, remove duplicate code. >> >> ... > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 58: > >> 56: if (PREPARE_FOR_CURRENT_CYCLE) { >> 57: if (region->need_bitmap_reset() && _heap->is_bitmap_slice_committed(region)) { >> 58: _ctx->clear_bitmap(region); > > Should this also `region->unset_need_bitmap_reset()`? It is for the current cycle; bitmaps will be dirty anyway, and a reset will be needed for the next cycle, so the flag can't be unset here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894511194 From wkemper at openjdk.org Sat Dec 21 00:30:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 21 Dec 2024 00:30:35 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> Message-ID: On Fri, 20 Dec 2024 23:51:58 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1211: >> >>> 1209: // Only reset for young generation, bitmap for old generation must be retained, >>> 1210: // except there is collection(global/old/degen/full) trigged to collect regions in old gen. >>> 1211: heap->young_generation()->reset_mark_bitmap(); >> >> Shouldn't it be safe to reset young region bitmaps even when old marking is in progress? > > Not really: if old marking is in progress but the current cycle is not a bootstrap cycle, it means the previous old collection was cancelled to deal with an allocation failure; the control thread will try to resume the old collection again, which will resume old marking. The old cycle may be preempted by young collections, but it is only really _cancelled_ by global cycles or full GCs. Control thread will resume old marking, but this operates independently from young bitmap regions.
I think we can reset young region bitmaps even when concurrent old marking is ongoing. >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 58: >> >>> 56: if (PREPARE_FOR_CURRENT_CYCLE) { >>> 57: if (region->need_bitmap_reset() && _heap->is_bitmap_slice_committed(region)) { >>> 58: _ctx->clear_bitmap(region); >> >> Should this also `region->unset_need_bitmap_reset()`? > It is for the current cycle; bitmaps will be dirty anyway, and a reset will be needed for the next cycle, so the flag can't be unset here. Got it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894516552 PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894516866 From xpeng at openjdk.org Sat Dec 21 01:29:42 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 21 Dec 2024 01:29:42 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> Message-ID: On Fri, 20 Dec 2024 18:55:44 GMT, William Kemper wrote: >> Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and Full GC, since both run at a safepoint and we should leave the safepoint as soon as possible.
>> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. >> * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. >> * Clean up FullGC code, remove duplicate code. >> >> ... > > src/hotspot/share/gc/shenandoah/shenandoahOldGC.cpp line 161: > >> 159: } >> 160: >> 161: entry_reset_after_collect(); > > Not sure we want to reset old region bitmaps after old marking is compete. 
Shenandoah opportunistically uses the bitmap for old regions during remembered set scan (it's faster than walking the heap). The 'reset-after' won't reset bitmaps for the old gen; it only resets those for the young gen, and it is skipped in the bootstrap cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894528758 From xpeng at openjdk.org Sat Dec 21 01:50:42 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 21 Dec 2024 01:50:42 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com> Message-ID: On Sat, 21 Dec 2024 00:26:40 GMT, William Kemper wrote: >> Not really: if old marking is in progress but the current cycle is not a bootstrap cycle, it means the previous old collection was cancelled to deal with an allocation failure; the control thread will try to resume the old collection again, which will resume old marking. > The old cycle may be preempted by young collections, but it is only really _cancelled_ by global cycles or full GCs. Control thread will resume old marking, but this operates independently from young bitmap regions. I think we can reset young region bitmaps even when concurrent old marking is ongoing. I think we are talking about the same thing: old gen collection could be preempted by young GCs and resumed after the cycle. I have seen a crash caused by this: an old GC was bootstrapped but preempted/cancelled multiple times right after it started, which eventually caused a crash from the verifier because it expected an object in the young gen to be marked. I will share the GC log on Slack later.
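To make the flag-guarded reset discussed in this thread concrete, here is a minimal, self-contained sketch of the idea. All names are hypothetical simplifications, not the actual HotSpot types: each region carries a needs-reset flag, and a reset pass clears only committed bitmap slices that are flagged, so regions already reset after the previous cycle are skipped.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the per-region reset flag discussed
// above; this is not the HotSpot implementation.
struct Region {
  std::vector<uint8_t> mark_bitmap;  // stand-in for the region's bitmap slice
  bool bitmap_committed = true;      // stand-in for is_bitmap_slice_committed()
  bool needs_bitmap_reset = true;    // set when marking dirtied the bitmap
};

// End-of-cycle reset pass: clear only committed, flagged bitmaps and unset
// the flag, so the next pass can skip regions that are already clean.
// (The in-cycle "prepare" pass discussed above would leave the flag set,
// since the upcoming marking dirties the bitmap again.)
inline std::size_t reset_bitmaps(std::vector<Region>& regions) {
  std::size_t cleared = 0;
  for (Region& r : regions) {
    if (r.needs_bitmap_reset && r.bitmap_committed) {
      std::fill(r.mark_bitmap.begin(), r.mark_bitmap.end(), 0);
      r.needs_bitmap_reset = false;
      ++cleared;
    }
  }
  return cleared;
}
```

A second pass over the same regions then does no work, which mirrors why the measured Concurrent Reset times drop once most young regions have already been reset after the previous cycle.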
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1894532058 From jkratochvil at openjdk.org Sat Dec 21 03:43:37 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Sat, 21 Dec 2024 03:43:37 GMT Subject: Integrated: 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java In-Reply-To: References: Message-ID: <_G6mXj3KmhQHtM8V8AAwaAy5zuKYvbBEsrgIoFjHZpc=.98541c00-c841-41ea-bf6c-d4c5ee0d0dd5@github.com> On Fri, 20 Dec 2024 11:33:48 GMT, Jan Kratochvil wrote: > JTREG=JAVA_OPTIONS=-XX:+NeverActAsServerClassMachine > > test/hotspot/jtreg/gc/TestPLABAdaptToMinTLABSize.java > java.lang.RuntimeException: Unexpected to get exit value of [0] > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedHumongousFragmentation.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() > > test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedObjectContents.java > # Internal Error (/home/azul/azul/openjdk-git/src/hotspot/share/prims/whitebox.cpp:2647), pid=1672170, tid=1672189 > # Error: ShouldNotReachHere() This pull request has now been integrated. 
Changeset: 43b7e9f5 Author: Jan Kratochvil Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/43b7e9f54776ec7ed98d2e2f717c3d9663268ef2 Stats: 22 lines in 3 files changed: 14 ins; 0 del; 8 mod 8346713: [testsuite] NeverActAsServerClassMachine breaks TestPLABAdaptToMinTLABSize.java TestPinnedHumongousFragmentation.java TestPinnedObjectContents.java Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22847 From ayang at openjdk.org Mon Dec 23 10:33:51 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 23 Dec 2024 10:33:51 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v10] In-Reply-To: References: Message-ID: <5IANDiv_ZPk3dAPem7OekMx6d1cUDiFGtOVWlcWt52Y=.f5e7ad67-3181-4757-8f61-1bbcc9e62280@github.com> On Thu, 19 Dec 2024 22:26:58 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. 
Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 29 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - Albert review > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - fix type > - fix space issues > - cleanup > - assert > - Thomas Review > - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 > - ... and 19 more: https://git.openjdk.org/jdk/compare/3927700c...6a8039df src/hotspot/share/gc/g1/g1CardSet.cpp line 783: > 781: split_card(card, card_region, card_within_region); > 782: > 783: Extra blank line. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 40: > 38: { } > 39: > 40: G1CSetCandidateGroup::G1CSetCandidateGroup(G1CardSetConfiguration* config, uint group_id) : AFAICT, all callers use the same config from g1heap. I wonder if we reduce arg-list to just `group_id`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 296: > 294: uint num_added_to_group = 0; > 295: > 296: uint group_id = 2; Should move this magical constant to where ` const uint _group_id;` is. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 3068: > 3066: if (r->rem_set()->cset_group()->length() == 1) { > 3067: gc_eff = r->rem_set()->cset_group()->gc_efficiency(); > 3068: } Why is `gc_eff` set only for length == 1? src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 3172: > 3170: size_t(0), young_only_cset_group->card_set()->mem_size()); > 3171: > 3172: for (G1CSetCandidateGroup* group : g1h->policy()->candidates()->from_marking_groups()) { This would skip retained groups, right? Is that intentional? src/hotspot/share/gc/g1/g1HeapRegionRemSet.hpp line 49: > 47: G1CodeRootSet _code_roots; > 48: > 49: // The collection set groups to which the region owning this RSet is assigned. Should be singular, "group", right? 
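As a rough illustration of the grouping idea under review here, the following sketch uses hypothetical names (it is not the actual G1CardSet/G1CSetCandidateGroup code): regions assigned to the same candidate group share one card set, so a reference between two regions of the same group needs no remembered-set entry at all, which is where the memory and merge-time savings come from.

```cpp
#include <memory>
#include <set>

// Toy model of "one card set per candidate group". All names are
// illustrative; this is not the actual G1 implementation.
struct CardSet {
  std::set<int> cards;  // indices of cards containing interesting references
};

struct Group {
  unsigned id;
  std::shared_ptr<CardSet> card_set{std::make_shared<CardSet>()};
};

struct Region {
  int index;
  Group* group = nullptr;  // candidate group this region is assigned to
};

// Record a reference from 'from' into 'to'. Regions in the same group are
// evacuated together, so an intra-group reference needs no entry.
inline bool record_reference(const Region& from, const Region& to, int card) {
  if (from.group != nullptr && from.group == to.group) {
    return false;  // elided: both regions are collected in the same pause
  }
  to.group->card_set->cards.insert(card);
  return true;
}
```

The flip side, noted in the PR description, is that all regions sharing a card set must be evacuated in the same pause, which is why pinned regions inside a group lose the skip-and-retry optimization.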
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893739059 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893774435 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1894518333 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1894520858 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1894521451 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1893739885 From iwalulya at openjdk.org Mon Dec 23 11:12:40 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 23 Dec 2024 11:12:40 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v10] In-Reply-To: <5IANDiv_ZPk3dAPem7OekMx6d1cUDiFGtOVWlcWt52Y=.f5e7ad67-3181-4757-8f61-1bbcc9e62280@github.com> References: <5IANDiv_ZPk3dAPem7OekMx6d1cUDiFGtOVWlcWt52Y=.f5e7ad67-3181-4757-8f61-1bbcc9e62280@github.com> Message-ID: On Sat, 21 Dec 2024 00:45:18 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 >> - Albert review >> - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 >> - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 >> - fix type >> - fix space issues >> - cleanup >> - assert >> - Thomas Review >> - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 >> - ... and 19 more: https://git.openjdk.org/jdk/compare/ed1a6147...6a8039df > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 3068: > >> 3066: if (r->rem_set()->cset_group()->length() == 1) { >> 3067: gc_eff = r->rem_set()->cset_group()->gc_efficiency(); >> 3068: } > > Why is `gc_eff` set only for length == 1? 
If the group has more than one region, then the gc_eff is associated with the entire group and not just a single region. However, if we have just one region in the group, then we can go ahead and print the `gc_eff` details.

> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 3172:
>
>> 3170:     size_t(0), young_only_cset_group->card_set()->mem_size());
>> 3171:
>> 3172:   for (G1CSetCandidateGroup* group : g1h->policy()->candidates()->from_marking_groups()) {
>
> This would skip retained groups, right? Is that intentional?

Yes, retained regions are in "single region" groups, so all details should be added to the log when we call `do_heap_region`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1895624069
PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1895621405

From ayang at openjdk.org  Mon Dec 23 21:05:43 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 23 Dec 2024 21:05:43 GMT
Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v10]
In-Reply-To:
References: <5IANDiv_ZPk3dAPem7OekMx6d1cUDiFGtOVWlcWt52Y=.f5e7ad67-3181-4757-8f61-1bbcc9e62280@github.com>
Message-ID:

On Mon, 23 Dec 2024 11:08:57 GMT, Ivan Walulya wrote:

>> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 3172:
>>
>>> 3170:     size_t(0), young_only_cset_group->card_set()->mem_size());
>>> 3171:
>>> 3172:   for (G1CSetCandidateGroup* group : g1h->policy()->candidates()->from_marking_groups()) {
>>
>> This would skip retained groups, right? Is that intentional?

> Yes, retained regions are in "single region" groups, so all details should be added to the log when we call `do_heap_region`.

I see; however, this would print the same gc_eff twice if the young gen contains a single region, right? Since this method is about cset groups, I think it's more natural to visit all groups (regardless of their size) here. With this PR, there is no gc_eff associated with an individual region, so `do_heap_region` can just skip gc_eff.
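The distinction being debated here — efficiency as a property of the whole candidate group versus a property of a single region — can be sketched with a small standalone model. The type and member names below (`CandidateGroup`, `Region`, `region_gc_eff_or_zero`) are illustrative stand-ins, not the actual HotSpot `G1CSetCandidateGroup` API, and the efficiency metric is a toy placeholder:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for a heap region; only carries a live-byte count.
struct Region { std::size_t live_bytes; };

// Illustrative stand-in for a collection-set candidate group.
struct CandidateGroup {
  std::vector<Region> regions;

  std::size_t length() const { return regions.size(); }

  // Efficiency is computed over the whole group, not per region.
  double gc_efficiency() const {
    std::size_t live = 0;
    for (const Region& r : regions) live += r.live_bytes;
    return live == 0 ? 0.0 : 1.0 / static_cast<double>(live);  // toy metric
  }
};

// Per-region printing only has a meaningful group efficiency to report
// when the group holds exactly one region; otherwise report nothing (0.0).
double region_gc_eff_or_zero(const CandidateGroup& g) {
  return g.length() == 1 ? g.gc_efficiency() : 0.0;
}
```

In this toy shape, a per-group log pass would print `gc_efficiency()` for every group, while a per-region pass would use `region_gc_eff_or_zero` — which is the duplicate-printing hazard the review points out for single-region groups.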
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1896120738

From xpeng at openjdk.org  Wed Dec 25 08:11:20 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Wed, 25 Dec 2024 08:11:20 GMT
Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v2]
In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
Message-ID:

> Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and full GC, since both run at a safepoint and we should leave the safepoint ASAP.
>
> I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for the young gen should already have been reset after the previous concurrent cycle finished if there is no need to preserve bitmap states.
>
> GenShen:
> Before:
>
> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878)
>
> After:
>
> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670)
> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872)
>
> Shenandoah:
> Before:
>
> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118)
>
> After:
>
> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542)
> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661)
>
> Additional changes:
> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions.
> * Use the API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions; two benefits from this:
>   - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154
>   - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorates the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in the closure.
> * When `_do_old_gc_bootstrap` is true, instead of resetting the mark bitmap for the old gen separately, simply reset the global generations, so we don't need to walk all the regions twice.
> * Clean up full GC code, remove duplicate code.
>
> Additional tests:
> - [x] CONF=macosx-aarch64-server-fastdebug make test T...
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22778/files
  - new: https://git.openjdk.org/jdk/pull/22778/files/2b9f28a1..36f14832

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=00-01

Stats: 22 lines in 5 files changed: 2 ins; 1 del; 19 mod
Patch: https://git.openjdk.org/jdk/pull/22778.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22778/head:pull/22778

PR: https://git.openjdk.org/jdk/pull/22778

From xpeng at openjdk.org  Thu Dec 26 17:07:39 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 26 Dec 2024 17:07:39 GMT
Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v2]
In-Reply-To:
References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
 <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com>
Message-ID:

On Fri, 20 Dec 2024 23:53:09 GMT, Xiaolong Peng wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 66:
>>
>>> 64: // Reset live data and set TAMS optimistically. We would recheck these under the pause
>>> 65: // anyway to capture any updates that happened since now.
>>> 66: _ctx->capture_top_at_mark_start(region);
>>
>> Full GC used to do this unconditionally for all affiliated regions. Do we not still need that to happen?

> I will double check this; I'm not 100% sure I have updated the code to handle old GC differently, now the behavior should be the same as before.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1898022673

From xpeng at openjdk.org  Thu Dec 26 17:07:40 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 26 Dec 2024 17:07:40 GMT
Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v2]
In-Reply-To:
References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
 <__kORuPC0guQED9-jn2Xg9CFIJ15wVRojwZoy_VqcPs=.0e5c812f-9e4e-4396-8acd-1e84a5e598c5@github.com>
Message-ID: <9QFHoPnlL-dJ8seRqDpmPghTPu5WfE6h0T_8KXZeSaE=.60ec6946-dc55-4ea1-ab0a-c76178e8c24b@github.com>

On Fri, 20 Dec 2024 18:56:38 GMT, William Kemper wrote:

>> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Address review comments

> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 269:
>
>> 267:   ShenandoahSharedFlag _recycling; // Used to indicate that the region is being recycled; see try_recycle*().
>> 268:
>> 269:   bool _need_bitmap_reset;
>
> Nit pick, but I think this would read better as `_needs_bitmap_reset`.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22778#discussion_r1898022147

From kbarrett at openjdk.org  Mon Dec 30 01:13:06 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 30 Dec 2024 01:13:06 GMT
Subject: RFR: 8345374: Ubsan: runtime error: division by zeroavoid divide by zero
Message-ID:

Please review this change to G1HeapSizingPolicy to avoid a float division by zero when calculating the maximum desired capacity with a MaxHeapFreeRatio value of 100%.

Testing: mach5 tier1 with G1 and MaxHeapFreeRatio=100.
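The failure mode and the shape of such a guard can be sketched standalone. This is a hypothetical simplification, not the actual `target_heap_capacity` in g1HeapSizingPolicy.cpp (which uses HotSpot's `uintx` and its own capacity clamping): with a free ratio of 100%, the desired used fraction becomes zero and the float division must be special-cased.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical sketch of the guard: compute the heap capacity at which
// `used_bytes` of live data leaves `free_ratio` percent of the heap free.
static std::size_t target_heap_capacity(std::size_t used_bytes,
                                        unsigned free_ratio) {
  assert(free_ratio <= 100 && "precondition");
  // At free_ratio == 100 the used fraction below would be 0.0 and the
  // division would be a float divide-by-zero; treat it as "unlimited".
  if (free_ratio == 100) {
    return std::numeric_limits<std::size_t>::max();
  }
  const double used_fraction = 1.0 - (free_ratio / 100.0);
  return static_cast<std::size_t>(used_bytes / used_fraction);
}
```

For example, 100 bytes used with a 50% free-ratio target yields a 200-byte target capacity, while a 100% free-ratio target is capped rather than divided through.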
-------------

Commit messages:
 - avoid divide by zero

Changes: https://git.openjdk.org/jdk/pull/22893/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22893&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8345374
Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/22893.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22893/head:pull/22893

PR: https://git.openjdk.org/jdk/pull/22893

From jwaters at openjdk.org  Mon Dec 30 05:23:34 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 30 Dec 2024 05:23:34 GMT
Subject: RFR: 8345374: Ubsan: runtime error: division by zeroavoid divide by zero
In-Reply-To:
References:
Message-ID:

On Mon, 30 Dec 2024 01:07:44 GMT, Kim Barrett wrote:

> Please review this change to G1HeapSizingPolicy to avoid a float division by
> zero when calculating the maximum desired capacity with a MaxHeapFreeRatio
> value of 100%.
>
> Testing: mach5 tier1 with G1 and MaxHeapFreeRatio=100.

src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 201:

> 199:
> 200: static size_t target_heap_capacity(size_t used_bytes, uintx free_ratio) {
> 201:   assert(free_ratio <= 100, "precondition");

Doesn't debug.hpp have precond for this?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22893#discussion_r1899279856

From kbarrett at openjdk.org  Mon Dec 30 08:20:39 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 30 Dec 2024 08:20:39 GMT
Subject: RFR: 8345374: Ubsan: runtime error: division by zeroavoid divide by zero
In-Reply-To:
References:
Message-ID:

On Mon, 30 Dec 2024 05:20:39 GMT, Julian Waters wrote:

>> Please review this change to G1HeapSizingPolicy to avoid a float division by
>> zero when calculating the maximum desired capacity with a MaxHeapFreeRatio
>> value of 100%.
>>
>> Testing: mach5 tier1 with G1 and MaxHeapFreeRatio=100.
>
> src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 201:
>
>> 199:
>> 200: static size_t target_heap_capacity(size_t used_bytes, uintx free_ratio) {
>> 201:   assert(free_ratio <= 100, "precondition");
>
> Doesn't debug.hpp have precond for this?

It does, and hardly anyone uses it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22893#discussion_r1899359181

From xpeng at openjdk.org  Mon Dec 30 22:54:27 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Mon, 30 Dec 2024 22:54:27 GMT
Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v3]
In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com>
Message-ID:

> Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and full GC, since both run at a safepoint and we should leave the safepoint ASAP.
>
> I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for the young gen should already have been reset after the previous concurrent cycle finished if there is no need to preserve bitmap states.
>
> GenShen:
> Before:
>
> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878)
>
> After:
>
> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670)
> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872)
>
> Shenandoah:
> Before:
>
> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118)
>
> After:
>
> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542)
> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661)
>
> Additional changes:
> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions.
> * Use the API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions; two benefits from this:
>   - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154
>   - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorates the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in the closure.
> * When `_do_old_gc_bootstrap` is true, instead of resetting the mark bitmap for the old gen separately, simply reset the global generations, so we don't need to walk all the regions twice.
> * Clean up full GC code, remove duplicate code.
>
> Additional tests:
> - [x] CONF=macosx-aarch64-server-fastdebug make test T...

Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision:

 - Merge branch 'openjdk:master' into reset-bitmap
 - Address review comments
 - Merge branch 'openjdk:master' into reset-bitmap
 - Remove ShenandoahResetUpdateRegionStateClosure
 - Always set_mark_incomplete when reset mark bitmap
 - Fix
 - Add comments
 - fix
 - Not reset_mark_bitmap after cycle when is_concurrent_old_mark_in_progress or is_prepare_for_old_mark_in_progress
 - Not invoke set_mark_incomplete when reset bitmap after cycle
 - ... and 7 more: https://git.openjdk.org/jdk/compare/673b06be...f82fdfaa

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22778/files
  - new: https://git.openjdk.org/jdk/pull/22778/files/36f14832..f82fdfaa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=01-02

Stats: 4760 lines in 68 files changed: 4271 ins; 269 del; 220 mod
Patch: https://git.openjdk.org/jdk/pull/22778.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22778/head:pull/22778

PR: https://git.openjdk.org/jdk/pull/22778

From jwaters at openjdk.org  Tue Dec 31 06:58:34 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 31 Dec 2024 06:58:34 GMT
Subject: RFR: 8345374: Ubsan: runtime error: division by zeroavoid divide by zero
In-Reply-To:
References:
Message-ID: <5J21YcGO5FJukhpN1W3G1dYu1KQudSVANgR2jUTF6JI=.4a46b4cf-dea2-473a-a036-a29004b722e9@github.com>

On Mon, 30 Dec 2024 01:07:44 GMT, Kim Barrett wrote:

> Please review this change to G1HeapSizingPolicy to avoid a float division by
> zero when calculating the maximum desired capacity with a MaxHeapFreeRatio
> value of 100%.
>
> Testing: mach5 tier1 with G1 and MaxHeapFreeRatio=100.

Looks alright, but I think the title needs to be changed to match the one on the tracker

-------------

Marked as reviewed by jwaters (Committer).

PR Review: https://git.openjdk.org/jdk/pull/22893#pullrequestreview-2526212615