From qamai at openjdk.org Mon Dec 1 06:28:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 06:28:04 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph Message-ID: Hi, Currently, `Node::adr_type` is ambiguous. For some nodes, it refers to the memory the node consumes; for others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: - Sometimes, the memory is wired incorrectly: for example, in `LibraryCall::extend_setCurrentThread`, the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. - For nodes such as `StrInflatedCopyNode` that consume more memory than they produce, we need to compute anti-dependencies during scheduling. This is currently not done, so I fixed it by making such nodes kill all the memory they consume. - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, which is really suspicious. I didn't fix it, as it seems to not result in any symptom at the moment. In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity. Please take a look and leave your reviews, thanks a lot. ------------- Commit messages: - Disambiguate Node::adr_type Changes: https://git.openjdk.org/jdk/pull/28570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372779 Stats: 629 lines in 36 files changed: 403 ins; 72 del; 154 mod Patch: https://git.openjdk.org/jdk/pull/28570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570 PR: https://git.openjdk.org/jdk/pull/28570 From qamai at openjdk.org Mon Dec 1 10:53:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 10:53:48 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v2] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Node::adr_type` is ambiguous. For some nodes, it refers to the memory the node consumes; for others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: > > - Sometimes, the memory is wired incorrectly: for example, in `LibraryCall::extend_setCurrentThread`, the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. > - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. > - For nodes such as `StrInflatedCopyNode` that consume more memory than they produce, we need to compute anti-dependencies during scheduling. This is currently not done, so I fixed it by making such nodes kill all the memory they consume. > - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, which is really suspicious.
I didn't fix it, as it seems to not result in any symptom at the moment. > > In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: store_to_memory does not emit MemBars ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28570/files - new: https://git.openjdk.org/jdk/pull/28570/files/10c0303f..b39029a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=00-01 Stats: 9 lines in 1 file changed: 4 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570 PR: https://git.openjdk.org/jdk/pull/28570 From wkemper at openjdk.org Mon Dec 1 15:40:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 15:40:34 GMT Subject: Integrated: 8372444: Genshen: Optimize evacuation function In-Reply-To: References: Message-ID: <2aym_7c9MkBuruXbGMpz4DjKubklrkrf_U7w_Yc81Ck=.3fe75278-3246-48bb-8c28-7de179adadb0@github.com> On Tue, 25 Nov 2025 17:33:01 GMT, William Kemper wrote: > This is a hot code path. Many of the branches can be eliminated at compile time by introducing template parameters. This change shows a 5% reduction in concurrent evacuation times at the trimmed-10 average on the extremem benchmark: > > > gen/control/extremem > Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum > concurrent_evacuation_young_data | 65 | 9625198.000 | 118747.249 | 148079.969 | 145182.189 | 76534.845 | 7216.000 | 317261.000 > > gen/template/extremem > Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum > concurrent_evacuation_young_data | 65 | 9095084.000 | 113036.539 | 139924.369 | 137661.226 | 71091.273 | 7523.000 | 294442.000 This pull request has now been integrated. Changeset: a1cc8f4e Author: William Kemper URL: https://git.openjdk.org/jdk/commit/a1cc8f4e4107e361f64cf51ff73985e471cdde03 Stats: 54 lines in 5 files changed: 15 ins; 13 del; 26 mod 8372444: Genshen: Optimize evacuation function Reviewed-by: ysr, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/28496 From wkemper at openjdk.org Mon Dec 1 15:40:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 15:40:32 GMT Subject: RFR: 8372444: Genshen: Optimize evacuation function In-Reply-To: <1H7ReDFaqSzRUxHiPQgoHwqr9nGnieCTkypgz2m5Z4I=.0bfbe08e-c85f-4ad0-805e-d94c709057d1@github.com> References: <1H7ReDFaqSzRUxHiPQgoHwqr9nGnieCTkypgz2m5Z4I=.0bfbe08e-c85f-4ad0-805e-d94c709057d1@github.com> Message-ID: On Wed, 26 Nov 2025 02:13:39 GMT, Y. Srinivas Ramakrishna wrote: >> This is a hot code path. Many of the branches can be eliminated at compile time by introducing template parameters. 
This change shows a 5% reduction in concurrent evacuation times at the trimmed-10 average on the extremem benchmark: >> >> >> gen/control/extremem >> Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum >> concurrent_evacuation_young_data | 65 | 9625198.000 | 118747.249 | 148079.969 | 145182.189 | 76534.845 | 7216.000 | 317261.000 >> >> gen/template/extremem >> Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum >> concurrent_evacuation_young_data | 65 | 9095084.000 | 113036.539 | 139924.369 | 137661.226 | 71091.273 | 7523.000 | 294442.000 > > LGTM. Impressed that templatization led to such a substantial improvement. > Should the 2-case switch in `ShenandoahGenerationalHeap::evacuate_object()` be converted to an `if-else` ? @ysramakrishna , we have had reviewers in the past ask us to use a switch with this enumeration. I agree an `if/else` would be more compact here, but it would probably belong on a different pull request. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28496#issuecomment-3597245231 From ysr at openjdk.org Mon Dec 1 16:34:40 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 1 Dec 2025 16:34:40 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: <4h1gCnsTk4_Gbl77bm_oe2S5deR7LGajZmvsPcDV5dw=.b29abfd8-1649-433f-b392-f959e63a817c@github.com> On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3525935185 From wkemper at openjdk.org Mon Dec 1 16:37:40 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 16:37:40 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. 
> > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Given that https://github.com/openjdk/jdk/pull/28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3525951105 From xpeng at openjdk.org Mon Dec 1 16:44:27 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:44:27 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah > Given that #28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? I did briefly check all the places where `type()` is called, it should be good now, there are two more places we could improve but won't causes bug, I initially added them in the PR and decided to revert them since they are not related to this bug fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597622168 From xpeng at openjdk.org Mon Dec 1 16:59:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:59:16 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. 
> > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use member function is_lab_alloc() instead of test the value of type() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28521/files - new: https://git.openjdk.org/jdk/pull/28521/files/24dae41f..ad82a691 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28521&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28521&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28521/head:pull/28521 PR: https://git.openjdk.org/jdk/pull/28521 From xpeng at openjdk.org Mon Dec 1 16:59:18 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:59:18 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Mon, 1 Dec 2025 16:42:16 GMT, Xiaolong Peng wrote: > Given that #28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? I have updated the PR to use the member function to test for a shared/lab allocation, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597721121 From wkemper at openjdk.org Mon Dec 1 17:26:26 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 17:26:26 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 16:59:16 GMT, Xiaolong Peng wrote: >> For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. >> >> The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. >> >> Tests: >> - [x] specjbb, no crash >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use member function is_lab_alloc() instead of test the value of type() Thank you. I think we should try to remove other uses of `type()` in a separate PR. ------------- Marked as reviewed by wkemper (Reviewer). 
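The shape of the fix can be sketched as follows. This is a simplified standalone model with invented types and names, not the actual Shenandoah allocation-request or card-table code; it only illustrates why a single `is_lab_alloc()` predicate is preferable to comparing the raw type value at each call site.

#include <cstdio>

enum class AllocType { SharedGC, SharedMutator, TLAB, GCLAB, PLAB };

struct AllocRequest {
  AllocType type;
  bool is_old_gen;

  // One predicate instead of comparing the type against every LAB value at the
  // call sites; adding a new LAB kind then only touches this function.
  bool is_lab_alloc() const {
    return type == AllocType::TLAB || type == AllocType::GCLAB ||
           type == AllocType::PLAB;
  }
};

void register_object_in_card_table(const AllocRequest&) {
  std::puts("registered object start for remembered-set scanning");
}

void after_allocation(const AllocRequest& req) {
  // In this simplified model, objects allocated directly (shared, non-LAB) in
  // the old generation are registered here; LAB-backed allocations are assumed
  // to be handled on a different path.
  if (req.is_old_gen && !req.is_lab_alloc()) {
    register_object_in_card_table(req);
  }
}

int main() {
  after_allocation(AllocRequest{AllocType::SharedGC, /*is_old_gen=*/true});
  return 0;
}

Keeping the LAB test behind one predicate means call sites such as the card-table registration cannot silently go stale when the encoding of the request type changes.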
PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3526162449 From xpeng at openjdk.org Mon Dec 1 17:31:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 17:31:28 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 17:23:34 GMT, William Kemper wrote: > Thank you. I think we should try to remove other uses of `type()` in a separate PR. Thanks, I'll see if we can remove it, I believe we should be able to remove it from all the places except logging. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597890180 From xpeng at openjdk.org Mon Dec 1 18:30:48 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 18:30:48 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 16:59:16 GMT, Xiaolong Peng wrote: >> For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. >> >> The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. >> >> Tests: >> - [x] specjbb, no crash >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use member function is_lab_alloc() instead of test the value of type() Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3598205309 From xpeng at openjdk.org Mon Dec 1 18:34:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 18:34:02 GMT Subject: Integrated: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah This pull request has now been integrated. 
Changeset: 79e99bb0 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/79e99bb0778608733a677821a0bb35041e9fb939 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 Reviewed-by: wkemper, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28521 From dlong at openjdk.org Tue Dec 2 05:35:50 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 05:35:50 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages Yes, bring it over, as it's an improvement. However, I was wondering if there was a way we can get rid of the remaining `#if INCLUDE_SHENANDOAHGC` in shared c2 code. The first idea that I came up with is for the GC init to reference a callback function for C2, but I'm not sure if the complexity is worth it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3600260250 From roland at openjdk.org Tue Dec 2 09:20:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:20:41 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. 
> > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into JDK-8354282 - whitespace - review - review - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java Co-authored-by: Christian Hagedorn - review - review - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07 Stats: 365 lines in 13 files changed: 264 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From chagedorn at openjdk.org Tue Dec 2 13:54:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:26 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. 
Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 Thanks for the update, it looks good to me! If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3530251375 From chagedorn at openjdk.org Tue Dec 2 13:54:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Thu, 27 Nov 2025 12:29:10 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 101: >> >>> 99: } >>> 100: return NonFloatingNonNarrowing; >>> 101: } >> >> Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. > > The patch as it is now adds some extra uses of "pinned" and "floating". What could make sense, I suppose, would be to try to use "floating"/"non floating" instead but there are so many uses of "pinned" in the code base already, and I don't see us getting rid of them, that I wonder if it would make a difference. So, I'm not too sure what to do. Yes, that's true. I was also unsure about whether we should stick with one or just allow both interchangeably. I guess since there are so many uses, we can just move forward with what you have now and still come back to clean it up if necessary - we can always do that. 
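For reference, the two independent properties behind those names can be modelled as a pair of flags. The sketch below is a standalone illustration of the 2x2 combinations discussed in this thread, with invented names; it is not the actual `DependencyType` declarations from castnode.hpp.

#include <cstdio>

// Two independent properties, so four combinations in total.
struct DependencyKind {
  bool floating;   // may the cast be re-attached to a dominating, equivalent test?
  bool narrowing;  // may the cast be removed once it no longer narrows its input type?
};

// The combinations and the use case each one is meant for in this thread.
constexpr DependencyKind FloatingNarrowing       {true,  true };  // range check cast (as categorized in this PR)
constexpr DependencyKind FloatingNonNarrowing    {true,  false};  // cast whose type is widened after loop opts
constexpr DependencyKind NonFloatingNarrowing    {false, true };  // access depending on several checks (range check smearing)
constexpr DependencyKind NonFloatingNonNarrowing {false, false};  // cast keeping a node sunk out of a loop

const char* describe(DependencyKind k) {
  if (k.floating) {
    return k.narrowing ? "floating, removable when no longer narrowing"
                       : "floating, never removable";
  }
  return k.narrowing ? "pinned, removable when no longer narrowing"
                     : "pinned, never removable";
}

int main() {
  std::printf("%s\n", describe(FloatingNonNarrowing));
  return 0;
}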
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581285955 From chagedorn at openjdk.org Tue Dec 2 13:54:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:34 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Wed, 26 Nov 2025 13:24:05 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - review >> - review >> - Merge branch 'master' into JDK-8354282 >> - review >> - infinite loop in gvn fix >> - renaming >> - merge >> - Merge branch 'master' into JDK-8354282 >> - fix & test > > src/hotspot/share/opto/castnode.hpp line 120: > >> 118: // be removed in any case otherwise the sunk node floats back into the loop. >> 119: static const DependencyType NonFloatingNonNarrowing; >> 120: > > I needed a moment to completely understand all these combinations. I rewrote the definitions in this process a little bit. Feel free to take some of it over: > > > // All the possible combinations of floating/narrowing with example use cases: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. > static const DependencyType FloatingNarrowing; > > // Use case example: Widening Cast nodes' types after loop opts: We want to common Casts with slightly different types. > // Floating: These Casts only depend on the single control. > // NonNarrowing: Even when the input type is narrower, we are not removing the Cast. Otherwise, the dependency > // to the single control is lost, and an array access could float above its range check because we > // just removed the dependency to the range check by removing the Cast. This could lead to an > // out-of-bounds access. > static const DependencyType FloatingNonNarrowing; > > // Use case example: An array accesses that is no longer dependent on a single range check (e.g. range check smearing). > // NonFloating: The array access must be pinned below all the checks it depends on. If the check it directly depends > // on with a control input is hoisted, we do hoist the Cast as well. If we allowed the Cast to float, > // we risk that the array access ends up above another check it depends on (we cannot model two control > // dependencies for a node in the IR). This could lead to an out-of-bounds access. > // Narrowing: If the Cast does not narrow the input type, then it's safe to remove the cast because the array access > // will be safe. > static const DependencyType NonFloatingNarrowing; > > // Use case example: Sinking nodes out of a loop > // Non-Floating & Non-Narrowing: We don't want the Cast that forces the node to be out of loop to be removed in any > // case. Otherwise, the sunk node could float back into the l... 
Thanks for taking it over :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581287358 From epeter at openjdk.org Tue Dec 2 15:32:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 108: > 106: // Floating: The Cast is only dependent on the single range check. > 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > 108: // remove the cast because the array access will be safe. The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? So is it not therefore already pinned? Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581630546 From epeter at openjdk.org Tue Dec 2 15:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:57 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> Message-ID: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> On Tue, 2 Dec 2025 15:17:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.hpp line 108: >> >>> 106: // Floating: The Cast is only dependent on the single range check. >>> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >>> 108: // remove the cast because the array access will be safe. >> >> The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? >> So is it not therefore already pinned? >> >> Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? > > Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581649395 From epeter at openjdk.org Tue Dec 2 15:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> On Tue, 2 Dec 2025 15:14:28 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 108: > >> 106: // Floating: The Cast is only dependent on the single range check. >> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> 108: // remove the cast because the array access will be safe. > > The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? > So is it not therefore already pinned? > > Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581642290 From epeter at openjdk.org Tue Dec 2 15:32:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:58 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> Message-ID: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> On Tue, 2 Dec 2025 15:19:26 GMT, Emanuel Peter wrote: >> Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... > > At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. Suggestion: // Use case example: Range Check CastII // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted // is would be safe to let the the Cast float to where the range check is hoisted up to. // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely // remove the cast because the array access will be safe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581692285 From epeter at openjdk.org Tue Dec 2 16:52:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:30 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <9zey9SqquL1zLlFLuyKV_18OiZs2UQSokhREx9ln0l0=.edc15ede-e798-4d88-b61a-d2ed086d99da@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. 
>> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 @rwestrel Nice work! We not just only fixed the bug but made the concepts much clearer. This makes me very happy ? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3531172652 From epeter at openjdk.org Tue Dec 2 16:52:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:32 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 15:29:42 GMT, Emanuel Peter wrote: >> At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. 
> > Suggestion: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted > // is would be safe to let the the Cast float to where the range check is hoisted up to. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582034834 From qamai at openjdk.org Tue Dec 2 17:48:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:43 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 105: > 103: // All the possible combinations of floating/narrowing with example use cases: > 104: > 105: // Use case example: Range Check CastII I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582188782 From qamai at openjdk.org Tue Dec 2 17:48:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:44 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 16:48:55 GMT, Emanuel Peter wrote: >> Suggestion: >> >> // Use case example: Range Check CastII >> // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted >> // it would be safe to let the the Cast float to where the range check is hoisted up to. >> // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> // remove the cast because the array access will be safe. > > Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` and `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! 
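To make that distinction concrete, here is a toy model with invented names and plain structs instead of the real IR classes, so it is not the HotSpot sources. It only models the floating case; a real pinned cast additionally moves to the control predecessor when its control is folded away, which is omitted here.

#include <cstdio>

// A projection of a test in the toy control graph.
struct Ctrl {
  const char* name;
  Ctrl* idom;        // immediate dominator, nullptr for the root
  int condition_id;  // two projections test the "same condition" if the ids match
};

bool dominates(const Ctrl* a, const Ctrl* b) {
  for (const Ctrl* c = b; c != nullptr; c = c->idom) {
    if (c == a) return true;
  }
  return false;
}

struct Cast {
  Ctrl* ctrl;                 // control input, in(0) in the real IR
  bool depends_only_on_test;  // true for a floating cast
};

// Control the cast may legally use after the rewrite described above.
Ctrl* try_move_up(const Cast& cast, Ctrl* candidate) {
  if (cast.depends_only_on_test &&
      candidate->condition_id == cast.ctrl->condition_id &&
      dominates(candidate, cast.ctrl)) {
    return candidate;  // floating: hoist to the equivalent dominating test
  }
  return cast.ctrl;    // pinned or not equivalent: stay put
}

int main() {
  Ctrl root{"root", nullptr, -1};
  Ctrl outer{"outer check", &root, 42};
  Ctrl inner{"inner, identical check", &outer, 42};
  Cast floating_cast{&inner, true};
  Cast pinned_cast{&inner, false};
  std::printf("floating cast ends up on: %s\n", try_move_up(floating_cast, &outer)->name);
  std::printf("pinned cast ends up on:   %s\n", try_move_up(pinned_cast, &outer)->name);
  return 0;
}

In this model the floating cast is allowed to end up on the dominating, equivalent check, while the pinned cast stays where it is.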
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582215477 From wkemper at openjdk.org Tue Dec 2 18:29:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 18:29:21 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v10] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > - Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > - Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > - Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > - Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > - Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > - Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > - Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > - Worst regression: xalan (+89.4%) William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 73 commits: - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - Remove bad asserts - Don't include tenurable bytes for current cycle in the next cycle Also remove vestigial promotion potential calculation - Idle fix ups - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Disable assertion (will revisit later) - Print global evac tracking after other gc stats This makes it easier for parsers to distinguish from per cycle reports - Instrumentation and assertions - Idle cleanup as I read - ...
and 63 more: https://git.openjdk.org/jdk/compare/5627ff2d...0c682e1c ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=09 Stats: 406 lines in 11 files changed: 166 ins; 182 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From xpeng at openjdk.org Tue Dec 2 18:40:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 18:40:16 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class the allocator, most of the allocation code is in the class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2608 u... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 255 commits: - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition - Port the fix of JDK-8372566 - Merge branch 'master' into cas-alloc-1 - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Remove junk code - Remove unnecessary change and tidy up - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=13 Stats: 1637 lines in 25 files changed: 1283 ins; 242 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Tue Dec 2 18:40:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 18:40:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 20:31:18 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - format >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix errors caused by renaming ofAtomic to AtomicAccess >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 271: > >> 269: _used[int(which_partition)] = value; >> 270: _available[int(which_partition)] = _capacity[int(which_partition)] - value; >> 271: AtomicAccess::store(_used + int(which_partition), value); > > Also here, should not require AtomicAccess. Sorry, it is junk code left over, I'm tidying up the changes in the PR, this line will be removed. > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 363: > >> 361: } >> 362: >> 363: void ShenandoahHeapRegion::reset_alloc_metadata() { > > Do we need to make these atomic because we now increment asynchronously from within mutator CAS allocations? Before, they were only adjusted while holding heap lock? I'm wondering if add-with-fetch() or CAS() would be more/less efficient than AtomicAccess::stores. Can we test the tradeoffs? Yes, we need to update these from mutator after every allocation w/o heap lock. `reset_alloc_metadata` it to reset the values, we have to use AtomicAccess::store, it is not in the hot path, it is only invoked when the region is recycled, I don't think there is performance issue here. For the code in hot path of mem allocation, I simply use `AtomicAccess::add` with `memory_order_relaxed`. 
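As a side note for readers following the thread, the split described here can be sketched with standard C++ atomics. This is only an illustration: it does not use HotSpot's `AtomicAccess` wrapper, and the type and member names are invented.

#include <atomic>
#include <cstddef>

// Illustration only: the recycle path does a plain relaxed reset of the
// counter, while the lock-free allocation hot path bumps it with a relaxed
// atomic add, since the value is a statistic rather than synchronization state.
struct RegionAllocStats {
  std::atomic<size_t> allocated_words{0};

  // Cold path: invoked when the region is recycled, not on the allocation path.
  void reset_alloc_metadata() {
    allocated_words.store(0, std::memory_order_relaxed);
  }

  // Hot path: mutators update the counter without holding the heap lock.
  void note_allocation(size_t words) {
    allocated_words.fetch_add(words, std::memory_order_relaxed);
  }
};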
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2562099016 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2562086131 From xpeng at openjdk.org Tue Dec 2 19:07:35 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 19:07:35 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism Message-ID: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. Test result: java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" With the change: [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) Original: [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. ### Other tests - [x] hotspot_gc_shenandoah ------------- Commit messages: - Fix wrong impl of parallel_region_stride in ShenandoahExcludeRegionClosure & ShenandoahIncludeRegionClosure - Add comments - Set parallel_region_stride to 8 for ShenandoahResetBitmapClosure - Tidying - Override ShenandoahParallelRegionStride to 8 when wrap the closure with ShenandoahIncludeRegionClosure - Override ShenandoahParallelRegionStride to 8 when wrap the closure with ShenandoahExcludeRegionClosure Changes: https://git.openjdk.org/jdk/pull/28613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372861 Stats: 16 lines in 4 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From wkemper at openjdk.org Tue Dec 2 19:18:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:18:54 GMT Subject: Integrated: Merge openjdk/jdk21u:master In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:24:27 GMT, William Kemper wrote: > Merges tag jdk-21.0.10+4 This pull request has now been integrated. 
Changeset: fe585bd6 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/fe585bd6d0f9678f1cc8bcc56dc9c8a56af5d044 Stats: 632 lines in 23 files changed: 598 ins; 0 del; 34 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/229 From wkemper at openjdk.org Tue Dec 2 19:18:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:18:51 GMT Subject: RFR: Merge openjdk/jdk21u:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.10+4 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/229/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/229/files/2f897401..2f897401 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=229&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=229&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/229.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/229/head:pull/229 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/229 From wkemper at openjdk.org Tue Dec 2 19:31:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:31:25 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v11] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > ? Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > ? Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > ? Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > ? Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > ? Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > ? Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > ? Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > ? 
Worst regression: xalan (+89.4%) William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove commented out assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27632/files - new: https://git.openjdk.org/jdk/pull/27632/files/0c682e1c..502797a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=09-10 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From kdnilsen at openjdk.org Tue Dec 2 19:31:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:31:45 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. > > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. > > ### Other tests > - [x] hotspot_gc_shenandoah I think this is a 58% improvement (in the header) rather than a 100% improvement. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3603643649 From kdnilsen at openjdk.org Tue Dec 2 19:43:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:43:04 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> On Tue, 2 Dec 2025 19:33:01 GMT, Kelvin Nilsen wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: > >> 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small >> 82: // parallel region stride for ShenandoahResetBitmapClosure. >> 83: size_t parallel_region_stride() override { return 8; } > > Should this be: > > if (ShenandoahParallelRegionStride == 0) { > return 8; > } else { > return ShenandoahParallelRegionStride; > } In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? This makes the change robust against future integration of "worker surge". In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? 
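Written out as a method, the guarded override suggested here would look roughly like the sketch below. It is not committed code, and it assumes (as the comment implies) that ShenandoahParallelRegionStride == 0 means the flag was left at its default; whether the fallback should instead come from ShenandoahWorkerPolicy::calc_workers_for_conc_reset() is the open question above.

// Sketch of the suggestion, not the PR's code.
size_t parallel_region_stride() override {
  if (ShenandoahParallelRegionStride == 0) {
    return 8;  // small stride for better distribution of lumpy bitmap-reset work
  }
  return ShenandoahParallelRegionStride;  // respect an explicit -XX: setting
}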
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582566703 From kdnilsen at openjdk.org Tue Dec 2 19:43:03 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:43:03 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. > > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. > > ### Other tests > - [x] hotspot_gc_shenandoah See below src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: > 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small > 82: // parallel region stride for ShenandoahResetBitmapClosure. > 83: size_t parallel_region_stride() override { return 8; } Should this be: if (ShenandoahParallelRegionStride == 0) { return 8; } else { return ShenandoahParallelRegionStride; } ------------- Changes requested by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3531815307 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582551300 From xpeng at openjdk.org Tue Dec 2 20:02:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 20:02:16 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 19:28:58 GMT, Kelvin Nilsen wrote: > I think this is a 58% improvement (in the header) rather than a 100% improvement. To be precise, it is 58% time reduction, 100+% speed/throughput improvement, I have updated the description. 
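For reference, both readings are consistent with the quoted numbers: 3107 us / 7476 us is roughly 0.42, i.e. about 58% less wall time, while the speedup is 7476 / 3107, roughly 2.4x, i.e. about a 140% throughput improvement, which is where the "100%+" figure comes from.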
------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3603737834 From xpeng at openjdk.org Tue Dec 2 22:03:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 22:03:25 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> Message-ID: <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> On Tue, 2 Dec 2025 19:40:19 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: >> >>> 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small >>> 82: // parallel region stride for ShenandoahResetBitmapClosure. >>> 83: size_t parallel_region_stride() override { return 8; } >> >> Should this be: >> >> if (ShenandoahParallelRegionStride == 0) { >> return 8; >> } else { >> return ShenandoahParallelRegionStride; >> } > > In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? > > This makes the change robust against future integration of "worker surge". > > In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? The intention of the change is to let `ShenandoahResetBitmapClosure` not use the ShenandoahParallelRegionStride global value at all; here are the reasons: ShenandoahParallelRegionStride is usually set to a large value (the default used to be 1024), and a large value won't help the performance of a ShenandoahHeapRegionClosure at all, for a few reasons: 1. ShenandoahResetBitmapClosure resets the marking bitmaps before/after a GC cycle, and the reset may not be needed for every region, e.g. when `top_bitmap == bottom` (immediate trash regions?) or the region is not in the current GC generation. 2. With a large ShenandoahParallelRegionStride, each task will get a large number of successive regions, e.g. worker 0 will process region 1 to region 1024; this way it is not possible to make sure the actual workload is evenly distributed to all workers: some of the workers may have most of the regions that need a bitmap reset, while some of the workers may not do any actual bitmap reset at all.
A smaller parallel region stride value will help with the workload distribution and also make it adaptive to different number of workers, it should be also working just fine with "worker surge" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582857990 From xpeng at openjdk.org Tue Dec 2 22:03:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 22:03:28 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> Message-ID: <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> On Tue, 2 Dec 2025 21:38:26 GMT, Xiaolong Peng wrote: >> In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? >> >> This makes the change robust against future integration of "worker surge". >> >> In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? > > The intention of the change is to let `ShenandoahResetBitmapClosure` not use the ShenandoahParallelRegionStride global value at all, here is the reasons: > ShenandoahParallelRegionStride is usually set to a large value, the default value used to be 1024, large value won't help with the performance ShenandoahHeapRegionClosure at all for some reasons: > 1. ShenandoahResetBitmapClosure reset the marking bitmaps before/after GC cycle, the resetting may not not needed for each region. e.g. when `top_bitmap == bottom`(immediate trash regions?) or the region is not current gc generation. > 2. Withe large ShenandoahParallelRegionStride, each task will get large number of successive regions, e.g. worker 0 will process region 1 to region 1024, in this way it is not possible to make sure the actual workload is evenly distributed to all workers, some of the workers may have most of the regions need bitmap reset, some of the worker may not really do any actual bitmap reset at all. 
> > A smaller parallel region stride value will help with the workload distribution and also make it adaptive to different number of workers, it should be also working just fine with "worker surge" In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" [1] [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) [2] [77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) [4] [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) [8] [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) [77.916s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) [16] [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) [32] [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) [64] [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947) [128] [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) [76.502s][info][gc,stats ] Concurrent Reset After Collect = 0.053 s (a = 3762 us) (n = 14) (lvls, us = 1172, 1582, 2129, 5273, 9272) [256] [76.751s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3057 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14713) [76.751s][info][gc,stats ] Concurrent Reset After Collect = 0.056 s (a = 4029 us) (n = 14) (lvls, us = 1484, 1602, 3027, 4629, 11267) [512] [77.508s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3082 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14893) [77.508s][info][gc,stats ] Concurrent Reset After Collect = 0.068 s (a = 4822 us) (n = 14) (lvls, us = 1953, 2285, 3633, 5605, 16366) [1024] [76.933s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3073 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1426, 14957) [76.933s][info][gc,stats ] Concurrent Reset After Collect = 0.082 s (a = 5877 us) (n = 14) (lvls, us = 1895, 3203, 4258, 7793, 15587) [2048] [76.746s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3022 us) (n = 
14) (lvls, us = 1133, 1172, 1211, 1406, 14586) [76.746s][info][gc,stats ] Concurrent Reset After Collect = 0.099 s (a = 7104 us) (n = 14) (lvls, us = 1875, 3281, 4590, 7695, 19292) [4096] [77.356s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3031 us) (n = 14) (lvls, us = 1133, 1191, 1250, 1426, 14606) [77.356s][info][gc,stats ] Concurrent Reset After Collect = 0.101 s (a = 7213 us) (n = 14) (lvls, us = 1914, 3262, 4238, 7871, 19862) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582863336 From wkemper at openjdk.org Tue Dec 2 22:15:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 22:15:36 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 21:40:29 GMT, Xiaolong Peng wrote: >> The intention of the change is to let `ShenandoahResetBitmapClosure` not use the ShenandoahParallelRegionStride global value at all, here is the reasons: >> ShenandoahParallelRegionStride is usually set to a large value, the default value used to be 1024, large value won't help with the performance ShenandoahHeapRegionClosure at all for some reasons: >> 1. ShenandoahResetBitmapClosure reset the marking bitmaps before/after GC cycle, the resetting may not not needed for each region. e.g. when `top_bitmap == bottom`(immediate trash regions?) or the region is not current gc generation. >> 2. Withe large ShenandoahParallelRegionStride, each task will get large number of successive regions, e.g. worker 0 will process region 1 to region 1024, in this way it is not possible to make sure the actual workload is evenly distributed to all workers, some of the workers may have most of the regions need bitmap reset, some of the worker may not really do any actual bitmap reset at all. 
>> >> A smaller parallel region stride value will help with the workload distribution and also make it adaptive to different number of workers, it should be also working just fine with "worker surge" > > In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > > > [1] > [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) > [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) > > [2] > [77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) > [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) > > > [4] > [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) > [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) > > [8] > [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) > [77.916s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) > > [16] > [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) > [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) > > [32] > [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) > [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) > > > [64] > [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) > [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947) > > [128] > [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) > [76.502s][info][gc,stats ] Concurrent Reset After Collect = 0.053 s (a = 3762 us) (n = 14) (lvls, us = 1172, 15... Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? 
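The load-balancing argument is easier to see with a small stand-alone sketch (standard C++, invented names, not the Shenandoah iterator): workers repeatedly claim `stride` consecutive regions from a shared cursor, so a small stride interleaves the regions that actually need work across all workers instead of leaving them clustered inside one worker's large chunk.

#include <atomic>
#include <cstddef>

// Stand-alone illustration of stride-based region claiming.
class RegionChunkClaimer {
  std::atomic<size_t> _next{0};
  const size_t _num_regions;
  const size_t _stride;
public:
  RegionChunkClaimer(size_t num_regions, size_t stride)
    : _num_regions(num_regions), _stride(stride) {}

  // Each worker calls this in a loop; returns false once all regions are claimed.
  bool claim(size_t& begin, size_t& end) {
    const size_t start = _next.fetch_add(_stride, std::memory_order_relaxed);
    if (start >= _num_regions) {
      return false;
    }
    begin = start;
    end = (start + _stride < _num_regions) ? (start + _stride) : _num_regions;
    return true;
  }
};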
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582942471 From xpeng at openjdk.org Tue Dec 2 23:28:58 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 23:28:58 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. > > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. > > ### Other tests > - [x] hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add more comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28613/files - new: https://git.openjdk.org/jdk/pull/28613/files/06f27543..3b964995 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=00-01 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From xpeng at openjdk.org Tue Dec 2 23:29:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 23:29:00 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 22:13:12 GMT, William Kemper wrote: >> In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> >> >> [1] >> [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) >> [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) >> >> [2] >> [77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) >> [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) >> >> >> [4] >> [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) >> [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) >> >> [8] >> [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) >> [77.916s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) >> >> [16] >> [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) >> [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) >> >> [32] >> [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) >> [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) >> >> >> [64] >> [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) >> [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 
14) (lvls, us = 1133, 1504, 2852, 5508, 8947) >> >> [128] >> [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) >> [76.502s][info][gc,stats ] Concur... > > Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? I have added more comments on ShenandoahResetBitmapClosure and the base class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2583109244 From xpeng at openjdk.org Wed Dec 3 00:16:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 00:16:10 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 23:24:11 GMT, Xiaolong Peng wrote: >> Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? > > I have added more comments on ShenandoahResetBitmapClosure and the base class. >Can we make this change more "generic"? I thought about making it more "generic", the current design with new method `parallel_region_stride` make it possible to customize the behavior if needed. I was looking into other closures which may have similar problems but impact should be much smaller than this one: 1. ShenandoahMergeWriteTable: Copy the write-version of the card-table into the read-version, clearing the write-copy, only for old gen. 2. ShenandoahEnsureHeapActiveClosure: Make sure regions are in good state: committed, active, clean, it may commit region if the region is not committed. Only in FullGC, also it is not threa-safe(but should be) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2583186822 From kdnilsen at openjdk.org Wed Dec 3 01:00:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 01:00:11 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: Message-ID: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> On Tue, 2 Dec 2025 18:40:16 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: >> >> * ShenandoahAllocator: base class the allocator, most of the allocation code is in the class. >> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. 
>> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. >> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` >> >> I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: >> >> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. >> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== >> ===== DaCapo tail laten... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: > > - Add missing header for ShenandoahFreeSetPartitionId > - Declare ShenandoahFreeSetPartitionId as enum instead of enum class > - Fix a typo > - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php > - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition > - Port the fix of JDK-8372566 > - Merge branch 'master' into cas-alloc-1 > - Merge remote-tracking branch 'origin/master' into cas-alloc-1 > - Remove junk code > - Remove unnecessary change and tidy up > - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 I'm still reading through the code, but have these comments far... src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: > 78: break; > 79: case ShenandoahFreeSetPartitionId::OldCollector: > 80: _free_set->recompute_total_used 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { > 99: if (_alloc_region_count == 0u) { > 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. 
(If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: > 119: template > 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { > 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 155: > 153: size_t min_free_words = req.is_lab_alloc() ? req.min_size() : req.size(); > 154: ShenandoahHeapRegion* r = _free_set->find_heap_region_for_allocation(ALLOC_PARTITION, min_free_words, req.is_lab_alloc(), in_new_region); > 155: // The region returned by find_heap_region_for_allocation must have sufficient free space for the allocation it if it is not nullptr comment has an extra "it" src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158: > 156: if (r != nullptr) { > 157: bool ready_for_retire = false; > 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire); Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations. We should clarify with comments. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 69: > 67: > 68: // Attempt to allocate in shared alloc regions, the allocation attempt is done with atomic operation w/o > 69: // holding heap lock. I would rewrite comment: // Attempt to allocate in a shared alloc region using atomic operation without holding the heap lock. // Returns nullptr and overwrites regions_ready_for_refresh with the number of shared alloc regions that are ready // to be retired if it is unable to satisfy the allocation request from the existing shared alloc regions. ------------- PR Review: https://git.openjdk.org/jdk/pull/26171#pullrequestreview-3532274683 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582914041 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582950768 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582966259 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582922617 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582936150 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582959962 From kdnilsen at openjdk.org Wed Dec 3 01:00:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 01:00:12 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> On Tue, 2 Dec 2025 22:00:17 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: > >> 78: break; >> 79: case ShenandoahFreeSetPartitionId::OldCollector: >> 80: _free_set->recompute_total_used > These parameters seem overly conservative. Can we distinguish what needs to be recomputed? > Normally, OldCollector allocation does not change UsedByMutator or UsedByCollector. It will only change MutatorEmpties if we did flip_to_old. It will normally not changed OldCollectorEmpties (unless it flips multiple mutator to OldCollector. if might flip one region from mutator, but that region will not be empty after we allocate fro it... I suppose we could use conservative values for a first implementation, as long as we file a "low priority" ticket to come back and revisit for improved efficiency at a later time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582916275 From wkemper at openjdk.org Wed Dec 3 01:07:03 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 01:07:03 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 23:28:58 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add more comments. Looks good to me. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3532721468 From xpeng at openjdk.org Wed Dec 3 01:09:03 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:09:03 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> Message-ID: On Tue, 2 Dec 2025 22:01:17 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: >> >>> 78: break; >>> 79: case ShenandoahFreeSetPartitionId::OldCollector: >>> 80: _free_set->recompute_total_used> >> These parameters seem overly conservative. Can we distinguish what needs to be recomputed? >> Normally, OldCollector allocation does not change UsedByMutator or UsedByCollector. It will only change MutatorEmpties if we did flip_to_old. It will normally not changed OldCollectorEmpties (unless it flips multiple mutator to OldCollector. if might flip one region from mutator, but that region will not be empty after we allocate fro it... > > I suppose we could use conservative values for a first implementation, as long as we file a "low priority" ticket to come back and revisit for improved efficiency at a later time. We don't really know what need to be recompute until the allocation finishes, we can make it less conservative, but then we needs more code branches here because the template methods require explicit template parameters. I'll create to ticket to follow up on this, given that I also want to see if we can defer the recomputation to the read side, if we can do that we don't even need the ShenandoahHeapAccountingUpdater here. 
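As a rough illustration of what deferring the recomputation to the read side could mean (standard C++, invented names, not the PR's code): the allocation path only bumps a raw counter with a relaxed add, and derived values such as "available" are computed by whoever reads them instead of being maintained eagerly on every allocation.

#include <atomic>
#include <cstddef>

// Invented sketch of read-side recomputation.
class PartitionUsage {
  std::atomic<size_t> _used{0};
  const size_t _capacity;
public:
  explicit PartitionUsage(size_t capacity) : _capacity(capacity) {}

  void note_allocation(size_t words) {                // writer side, lock-free
    _used.fetch_add(words, std::memory_order_relaxed);
  }

  size_t used() const {
    return _used.load(std::memory_order_relaxed);
  }

  size_t available() const {                          // reader derives the value on demand
    return _capacity - used();
  }
};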
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583271726 From xpeng at openjdk.org Wed Dec 3 01:12:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:12:23 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:16:56 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 100: > >> 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { >> 99: if (_alloc_region_count == 0u) { >> 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); > > Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. (If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) I'll add comments on this, _alloc_region_count == 0 means we don't want to use any shared alloc region, it will also allocate with a heap lock, ideally the performance should be same as before, so it always simply find a region with enough space and allocate in the region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583283894 From xpeng at openjdk.org Wed Dec 3 01:17:43 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:17:43 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:24:55 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: > >> 119: template >> 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { >> 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); > > I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. It is not an error, before calling into attempt_allocation_slow, it already called attempt_allocation_in_alloc_regions once and failed to allocate, slow path is always with heap lock. After taking the lock, we should try the attempt_allocation_in_alloc_regions right away, because other mutator thread may have refreshed the alloc regions while holding the lock. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583295858 From xpeng at openjdk.org Wed Dec 3 01:17:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:17:44 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 3 Dec 2025 01:13:41 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: >> >>> 119: template >>> 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { >>> 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); >> >> I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. > > It is not an error, before calling into attempt_allocation_slow, it already called attempt_allocation_in_alloc_regions once and failed to allocate, slow path is always with heap lock. > > After taking the lock, we should try the attempt_allocation_in_alloc_regions right away, because other mutator thread may have refreshed the alloc regions while holding the lock. accounting_update is required for slow path, but you are right, it can be moved to somewhere later, e.g. line 128. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583298253 From xpeng at openjdk.org Wed Dec 3 01:21:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:21:56 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:10:34 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158: > >> 156: if (r != nullptr) { >> 157: bool ready_for_retire = false; >> 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire); > > Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations. > We should clarify with comments. It is not really necessary to `atomic_allocate_in` here, but I wanted reuse some of the codes in atomic_allocate_in, we can discuss this later, I can change it back to non-atomic version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583304908 From duke at openjdk.org Wed Dec 3 07:31:42 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 07:31:42 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect Message-ID: Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used soft_tail = Xmx - soft_max if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
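To make the effect concrete, take ShenandoahEvacReserve = 5 and ShenandoahMinFreeThreshold = 10 (their usual defaults, in percent) and illustrative numbers Xmx = 8192M, soft_max = 256M, used = 0. The reduced expression gives available - soft_tail = soft_max - (5% of Xmx) - used = 256M - 409.6M - 0 = -153.6M, which is already below the minimum threshold of 10% of soft_max = 25.6M, so the trigger fires even on a completely idle heap. With Xmx = 256M and the same soft_max, the same expression is 256M - 12.8M - used = 243.2M - used, and the trigger only fires once roughly 85% of the soft maximum is actually used.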
Suggested fix: when deciding whether to trigger gc, use logic similar to the following: mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; available = mutator_soft_capacity - used; if (available < ShenandoahMinFreeThreshold * soft_max) // trigger gc ------- This change also improved gc logging: Before: [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B After: [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: 122, Reserved: 102M, Max free available in a single region: 1024K; ------------- Commit messages: - 8372543: Shenandoah: undercalculated the available size when soft max takes effect Changes: https://git.openjdk.org/jdk/pull/28622/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372543 Stats: 226 lines in 7 files changed: 157 ins; 44 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From duke at openjdk.org Wed Dec 3 08:37:02 2025 From: duke at openjdk.org (Harshit470250) Date: Wed, 3 Dec 2025 08:37:02 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: > This PR does similar changes to the ones done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation, as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put a guard on the Shenandoah GC specific part of the code.
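For readers outside that review, a minimal sketch of what such a guard can look like (load_reference_barrier_Type is a name from the commit list below; the argument and result types here are assumptions, and the actual change caches the created TypeFunc rather than rebuilding it):

```
#include "opto/type.hpp"
#include "utilities/macros.hpp"

#if INCLUDE_SHENANDOAHGC
// Illustrative shape of the barrier call: (oop obj, ptr addr) -> oop
static const TypeFunc* make_load_reference_barrier_Type() {
  const Type** fields = TypeTuple::fields(2);
  fields[TypeFunc::Parms + 0] = TypeOopPtr::BOTTOM;  // original object
  fields[TypeFunc::Parms + 1] = TypeRawPtr::BOTTOM;  // address it was loaded from
  const TypeTuple* domain = TypeTuple::make(TypeFunc::Parms + 2, fields);

  fields = TypeTuple::fields(1);
  fields[TypeFunc::Parms + 0] = TypeOopPtr::BOTTOM;  // possibly forwarded object
  const TypeTuple* range = TypeTuple::make(TypeFunc::Parms + 1, fields);

  return TypeFunc::make(domain, range);
}
#endif // INCLUDE_SHENANDOAHGC
```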
Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: - add guard to the include - add load_reference_barrier_Type - add clone_barrier_Type - add write_barrier_pre_Type - revert shenandoah changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27279/files - new: https://git.openjdk.org/jdk/pull/27279/files/6e6a2bbf..4dfa36ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=01-02 Stats: 145 lines in 5 files changed: 67 ins; 73 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27279/head:pull/27279 PR: https://git.openjdk.org/jdk/pull/27279 From shade at openjdk.org Wed Dec 3 10:14:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:14:51 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> On Tue, 2 Dec 2025 23:28:58 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add more comments. Changes requested by shade (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 88: > 86: // Using a smaller value here yields better task distribution for a lumpy workload. 
The task will be split > 87: // into smaller batches with 8 regions in batch, the worker processes more regions w/o needs to reset bitmaps > 88: // will process more batches, but overall all workers will be saturated throughout the whole concurrent reset phase. I have a very general comment about writing comments like this one. This entire block of prose is really excessive, is set up to be outdated (are you tracking the real behavior of `SH::parallel_heap_region_iterate` and its magical `4096`?), and can be boiled down to much more succinct: Bitmap reset task is heavy-weight and benefits from much smaller tasks than the default. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 119: > 117: // ShenandoahHeap::parallel_heap_region_iterate will derive a reasonable value based > 118: // on active worker threads and number of regions. > 119: // For some lumpy workload, the value can be overridden for better task distribution. Again, excessive. You can just drop the comment; its purpose is obvious from the code. ------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3534247421 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584465890 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584468632 From eastigeevich at openjdk.org Wed Dec 3 14:55:17 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 14:55:17 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v12] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. 
> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) > > - Baseline > > $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC... Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/79f9a2a0..8c5ef0e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Wed Dec 3 15:13:21 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 15:13:21 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v3] In-Reply-To: <-cnMy4YHNCrKRqt_2Kkh9ksi-qE8ndZLB5yoyKkS3gM=.3f328f98-15a2-4736-9a6c-f9ab0705b830@github.com> References: <-cnMy4YHNCrKRqt_2Kkh9ksi-qE8ndZLB5yoyKkS3gM=.3f328f98-15a2-4736-9a6c-f9ab0705b830@github.com> Message-ID: On Tue, 25 Nov 2025 13:04:55 GMT, Andrew Haley wrote: >> Yeah patching all nmethods as one unit is basically equivalent to making the code cache processing a STW operation. Last time we processed the code cache STW was JDK 11. A dark place I don't want to go back to. It can get pretty big and mess up latency. So I'm in favour of limiting the fix and not re-introduce STW code cache processing. >> >> Otherwise yes you are correct; we perform synchronous cross modifying code with no assumptions about instruction cache coherency because we didn't trust it would actually work for all ARM implementations. Seems like that was a good bet. We rely on it on x64 still though. >> >> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so, a horrible alternative might be to use mprotect to temporarily remove execution permission on the affected per nmethod pages, and detect over shooting in the signal handler, resuming execution when execution privileges are then restored immediately after. That should limit the affected VA to close to what is actually invalidated. But it would look horrible. > >> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so, > > "Because the cache-maintenance wasn't needed, we can do the TLBI instead. > In fact, the I-Cache line-size isn't relevant anymore, we can reduce > the number of traps by producing a fake value. > > "For user-space, the kernel's work is now to trap CTR_EL0 to hide DIC, > and produce a fake IminLine. 
EL3 traps the now-necessary I-Cache > maintenance and performs the inner-shareable-TLBI that makes everything > better." > > My interpretation of this is that we only need to do the synchronization dance once, at the end of the patching. But I guess we don't know exactly if we have an affected core or if the kernel workaround is in action. @theRealAph @fisk @shipilev I have updated all places to use optimized icache invalidation. Could you please have a look? I am running different tests and benchmarks. @fisk @shipilev - I added `nmethod::has_non_immediate_oops`. I think it's easy to detect them when we generate code. If this is OK, we might need to update `ZNMethod::attach_gc_data` and `ShenandoahNMethod::detect_reloc_oops`. - Code of `G1NMethodClosure::do_evacuation_and_fixup(nmethod* nm)` looks strange: _oc.set_nm(nm); // Evacuate objects pointed to by the nmethod nm->oops_do(&_oc); if (_strong) { // CodeCache unloading support nm->mark_as_maybe_on_stack(); BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); bs_nm->disarm(nm); } ICacheInvalidationContext icic(nm->has_non_immediate_oops()); nm->fix_oop_relocations(); If `_strong` is true, we disarm `nm` and patch it with `fix_oop_relocations`. I have assertions checking we can defer icache invalidation. Neither of them are triggered. I thing this path always happens at a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3607330040 From eastigeevich at openjdk.org Wed Dec 3 15:42:38 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 15:42:38 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. 
As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) > > - Baseline > > $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC... Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Fix linux-cross-compile build aarch64 - Merge branch 'master' into JDK-8370947 - Remove trailing whitespaces - Add support of deferred icache invalidation to other GCs and JIT - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence - Add jtreg test - Fix linux-cross-compile aarch64 build - Fix regressions for Java methods without field accesses - Fix code style - Correct ifdef; Add dsb after ic - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=12 Stats: 879 lines in 25 files changed: 839 ins; 7 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From shade at openjdk.org Wed Dec 3 16:14:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 16:14:05 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:42:38 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. >> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. 
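A minimal sketch of the batching idea described here (a hypothetical class, not the PR's ICacheInvalidationContext, whose members and exact behaviour are not shown in this thread):

```
#include "runtime/icache.hpp"

// Sketch only: an RAII scope that remembers that code was patched and issues a
// single ICache invalidation when the scope ends, instead of one per patched
// relocation. On affected Neoverse N1 parts that means one trapped IC IVAU
// (and one injected TLBI) for the whole batch.
class DeferredICacheInvalidation {
  address _addr;  // per the description above, any address in the patched code will do
  bool    _dirty;
public:
  explicit DeferredICacheInvalidation(address addr) : _addr(addr), _dirty(false) {}
  void note_patch() { _dirty = true; }
  ~DeferredICacheInvalidation() {
    if (_dirty) {
      ICache::invalidate_range(_addr, 1);
    }
  }
};
```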
>> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. >> >> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) >> >> - Baseline >> >> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1... > > Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Fix linux-cross-compile build aarch64 > - Merge branch 'master' into JDK-8370947 > - Remove trailing whitespaces > - Add support of deferred icache invalidation to other GCs and JIT > - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence > - Add jtreg test > - Fix linux-cross-compile aarch64 build > - Fix regressions for Java methods without field accesses > - Fix code style > - Correct ifdef; Add dsb after ic > - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f Interesting work! I was able to look through it very briefly: src/hotspot/cpu/aarch64/globals_aarch64.hpp line 133: > 131: "Enable workaround for Neoverse N1 erratum 1542419") \ > 132: product(bool, UseDeferredICacheInvalidation, false, DIAGNOSTIC, \ > 133: "Defer multiple ICache invalidation to single invalidation") \ Since the `ICacheInvalidationContext` is in shared code, and I suppose x86_64 would also benefit from this (at least eventually), this sounds like `globals.hpp` option. src/hotspot/share/asm/codeBuffer.cpp line 371: > 369: !((oop_Relocation*)reloc)->oop_is_immediate()) { > 370: _has_non_immediate_oops = true; > 371: } Honestly, this looks fragile? We can go into nmethods patching for some other reason, not for patching oops. Also, we still might need to go and patch immediate oops? I see this: // Instruct loadConP of x86_64.ad places oops in code that are not also // listed in the oop section. static bool mustIterateImmediateOopsInCode() { return true; } Is there a substantial loss is doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters? 
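For reference, the kind of scan being discussed, written as a standalone sketch against the existing relocation API (the PR apparently computes this once at code generation time rather than per GC cycle):

```
// Sketch mirroring the check quoted above: walk an nmethod's oop relocations
// and report whether any of them embeds an oop that is not an immediate.
static bool nmethod_has_non_immediate_oops(nmethod* nm) {
  RelocIterator iter(nm);
  while (iter.next()) {
    if (iter.type() == relocInfo::oop_type) {
      oop_Relocation* r = iter.oop_reloc();
      if (!r->oop_is_immediate()) {
        return true;
      }
    }
  }
  return false;
}
```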
src/hotspot/share/asm/codeBuffer.cpp line 939: > 937: // Move all the code and relocations to the new blob: > 938: relocate_code_to(&cb); > 939: } Here and later, the preferred style is: Suggestion: // Move all the code and relocations to the new blob: { ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED); relocate_code_to(&cb); } src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp line 37: > 35: #include "memory/universe.hpp" > 36: #include "runtime/atomicAccess.hpp" > 37: #include "runtime/icache.hpp" Include is added, but no actual use? Is something missing, or this is a leftover include? test/hotspot/jtreg/gc/TestDeferredICacheInvalidation.java line 28: > 26: > 27: /* > 28: * @test id=ParallelGC Usually just: Suggestion: * @test id=parallel test/hotspot/jtreg/gc/TestDeferredICacheInvalidation.java line 34: > 32: * @requires vm.debug > 33: * @requires os.family=="linux" > 34: * @requires os.arch=="aarch64" I am guessing it is more future-proof to drop Linux/AArch64 filters, and rely on test doing the right thing, regardless of the config. I see it already skips when `UseDeferredICacheInvalidation` is off. test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184: > 182: @Benchmark > 183: @Warmup(iterations = 0) > 184: @Measurement(iterations = 1) Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/28328#pullrequestreview-3535752098 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585729392 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585679778 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585704068 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585707389 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585735476 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585734553 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585743873 From xpeng at openjdk.org Wed Dec 3 16:23:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 16:23:29 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. > > ### Other tests > - [x] hotspot_gc_shenandoah > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Simplify comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28613/files - new: https://git.openjdk.org/jdk/pull/28613/files/3b964995..892676c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=01-02 Stats: 14 lines in 2 files changed: 0 ins; 13 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From shade at openjdk.org Wed Dec 3 16:23:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 16:23:30 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:20:25 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3535887356 From xpeng at openjdk.org Wed Dec 3 16:23:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 16:23:32 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> Message-ID: <5mKXql8U-bSulRVIzoVQXqwcQXlm24-3xExvFAk5oYU=.0ddb4082-5d95-4c08-9c8a-125585d05af4@github.com> On Wed, 3 Dec 2025 10:10:45 GMT, Aleksey Shipilev wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more comments. > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 88: > >> 86: // Using a smaller value here yields better task distribution for a lumpy workload. The task will be split >> 87: // into smaller batches with 8 regions in batch, the worker processes more regions w/o needs to reset bitmaps >> 88: // will process more batches, but overall all workers will be saturated throughout the whole concurrent reset phase. > > I have a very general comment about writing comments like this one. This entire block of prose is really excessive, is set up to be outdated (are you tracking the real behavior of `SH::parallel_heap_region_iterate` and its magical `4096`?), and can be boiled down to much more succinct: > > > Bitmap reset task is heavy-weight and benefits from much smaller tasks than the default. Thanks a lot! I have updated the PR to use the succinct one you suggested. > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 119: > >> 117: // ShenandoahHeap::parallel_heap_region_iterate will derive a reasonable value based >> 118: // on active worker threads and number of regions. >> 119: // For some lumpy workload, the value can be overridden for better task distribution. > > Again, excessive. 
You can just drop the comment; its purpose is obvious from the code. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2585783241 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2585784611 From btaylor at openjdk.org Wed Dec 3 17:26:19 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 17:26:19 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Message-ID: The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build ------------- Commit messages: - 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Changes: https://git.openjdk.org/jdk/pull/28642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373039 Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From wkemper at openjdk.org Wed Dec 3 17:47:47 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 17:47:47 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 17:16:02 GMT, Ben Taylor wrote: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. > > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: > 382: oop obj = cast_to_oop(p); > 383: assert(oopDesc::is_oop(obj), "Should be an object"); > 384: assert(p <= left, "p should start at or before left end of card"); I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. ------------- PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3536232605 PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586068246 From ysr at openjdk.org Wed Dec 3 18:30:29 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 Dec 2025 18:30:29 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 17:45:31 GMT, William Kemper wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: > >> 382: oop obj = cast_to_oop(p); >> 383: assert(oopDesc::is_oop(obj), "Should be an object"); >> 384: assert(p <= left, "p should start at or before left end of card"); > > I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. 
I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. I agree. In addition, the comment should be updated so it doesn't make the confusing reference to "the loop that follows", which just went away, etc. It's fine to leave a suitably modified comment as to why it is safe to query the size of the object at the oop being returned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586182968 From ysr at openjdk.org Wed Dec 3 18:30:30 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 Dec 2025 18:30:30 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:24:58 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: >> >>> 382: oop obj = cast_to_oop(p); >>> 383: assert(oopDesc::is_oop(obj), "Should be an object"); >>> 384: assert(p <= left, "p should start at or before left end of card"); >> >> I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. > > I agree. In addition, the comment should be updated so it doesn't make the confusing reference to "the loop that follows", which just went away, etc. It's fine to leave a suitably modified comment as to why it is safe to query the size of the object at the oop being returned. > I'm also not sure if the assert on 385 necessarily holds because p is no longer increased in the loop. It should hold for the oop/object being returned here. It's a post-condition of the method which should have been stated in its API spec I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586190041 From eastigeevich at openjdk.org Wed Dec 3 18:48:32 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 18:48:32 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 16:10:55 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184: > >> 182: @Benchmark >> 183: @Warmup(iterations = 0) >> 184: @Measurement(iterations = 1) > > Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead? The current algorithm: - Create an object used in Java methods. - Run the methods in the interpreter. - Compile the methods. 
- Make the object garbage collectable. - Run GC (we measure this). There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. IMO, Yes it is `@BenchmarkMode(OneShot)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586236955 From shade at openjdk.org Wed Dec 3 18:53:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 18:53:13 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:45:25 GMT, Evgeny Astigeevich wrote: >> test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184: >> >>> 182: @Benchmark >>> 183: @Warmup(iterations = 0) >>> 184: @Measurement(iterations = 1) >> >> Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead? > > The current algorithm: > - Create an object used in Java methods. > - Run the methods in the interpreter. > - Compile the methods. > - Make the object garbage collectable. > - Run GC (we measure this). > > There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. > > IMO, Yes it is `@BenchmarkMode(OneShot)`. Yeah, but first GC would likely be slower, because it would have more real work to do. So you probably want OneShot with the default number of iterations. It will warmup by doing a few GCs, and then do a few other GCs for measurement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586250541 From eastigeevich at openjdk.org Wed Dec 3 18:53:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 18:53:10 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 16:00:05 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 939: > >> 937: // Move all the code and relocations to the new blob: >> 938: relocate_code_to(&cb); >> 939: } > > Here and later, the preferred style is: > > Suggestion: > > // Move all the code and relocations to the new blob: > { > ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED); > relocate_code_to(&cb); > } I followed @xmas92 comments on style to use a blank line. @xmas92, what style should I follow? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586248135 From wkemper at openjdk.org Wed Dec 3 18:56:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 18:56:56 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 02:02:18 GMT, Rui Li wrote: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. > > Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding when to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < mutator_soft_capacity) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; A few nits. Thank you for adding a test case for this! src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 240: > 238: size_t allocated = _space_info->bytes_allocated_since_gc_start(); > 239: > 240: log_debug(gc)("should_start_gc calculation: available: %zu%s, soft_max_capacity: %zu%s" Can we add `ergo` tag to this message? Let's use the `PROPERFMT` and `PROPERFMTARGS` macros here and in other log messages we're changing. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 258: > 256: size_t min_threshold = min_free_threshold(); > 257: if (available < min_threshold) { > 258: log_trigger("Free (Soft mutator free) (%zu%s) is below minimum threshold (%zu%s)", Changing this will break some log parsers, do we really need this? src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp line 52: > 50: size_t capacity = ShenandoahHeap::heap()->soft_max_capacity(); > 51: size_t available = _space_info->soft_available(); > 52: size_t allocated = _space_info->bytes_allocated_since_gc_start(); This shadows `bytes_allocated` below. Let's just use one variable for this. 
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3209: > 3207: log_freeset_stats(ShenandoahFreeSetPartitionId::Mutator, ls); > 3208: log_freeset_stats(ShenandoahFreeSetPartitionId::Collector, ls); > 3209: if (_heap->mode()->is_generational()) {log_freeset_stats(ShenandoahFreeSetPartitionId::OldCollector, ls);} Suggestion: if (_heap->mode()->is_generational()) { log_freeset_stats(ShenandoahFreeSetPartitionId::OldCollector, ls); } src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: > 630: size_t get_usable_free_words(size_t free_bytes) const; > 631: > 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); `log_freeset_stats` should probably be `private`. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3536428634 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586232667 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586234993 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586237553 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586243946 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586247150 From eastigeevich at openjdk.org Wed Dec 3 19:51:32 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 19:51:32 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:50:44 GMT, Aleksey Shipilev wrote: >> The current algorithm: >> - Create an object used in Java methods. >> - Run the methods in the interpreter. >> - Compile the methods. >> - Make the object garbage collectable. >> - Run GC (we measure this). >> >> There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. >> >> IMO, Yes it is `@BenchmarkMode(OneShot)`. > > Yeah, but first GC would likely be slower, because it would have more real work to do. So you probably want OneShot with the default number of iterations. It will warmup by doing a few GCs, and then do a few other GCs for measurement. I have `Thread.sleep(1000)` in `setupCodeCache()` to let everything to settle down. I use it because I saw high variance in GC times. With it variance became OK. Maybe I should use `System.gc()` instead of `Thread.sleep`. > So you probably want OneShot with the default number of iterations. Will I need to recreate an object and to rerun Java methods before each iteration? The first iteration will collect garbage object `fields`. So following iterations running GC will do nothing. Or will they patch nmethods again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586405992 From xpeng at openjdk.org Wed Dec 3 21:18:42 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 21:18:42 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah Message-ID: Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. 
### Test - [x] hotspot_gc_shenandoah - [ ] GHA ------------- Commit messages: - Removed dead code Changes: https://git.openjdk.org/jdk/pull/28647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28647&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373048 Stats: 145 lines in 7 files changed: 0 ins; 143 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28647/head:pull/28647 PR: https://git.openjdk.org/jdk/pull/28647 From wkemper at openjdk.org Wed Dec 3 21:28:24 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 21:28:24 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Nice cleanup, thank you! ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28647#pullrequestreview-3536983896 From kdnilsen at openjdk.org Wed Dec 3 21:33:59 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 21:33:59 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:23:29 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. 
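For readers unfamiliar with the stride mechanism referenced in the quoted description, a schematic sketch of stride-based region claiming is shown below. The names and structure are assumed for illustration and are not the actual ShenandoahHeap parallel iteration code; the point is only that workers atomically claim `stride` regions at a time, so a smaller stride spreads the bitmap-reset work more evenly across the active workers.

```
// Illustration only: stride-based claiming with assumed names, not the actual
// Shenandoah implementation. Each worker grabs `stride` regions per claim.
#include <atomic>
#include <cstddef>

struct Region;  // stand-in for ShenandoahHeapRegion

void parallel_iterate(Region** regions, size_t num_regions,
                      std::atomic<size_t>& claim_index, size_t stride,
                      void (*do_region)(Region*)) {
  for (;;) {
    size_t start = claim_index.fetch_add(stride, std::memory_order_relaxed);
    if (start >= num_regions) {
      return;  // nothing left to claim
    }
    size_t end = (start + stride < num_regions) ? start + stride : num_regions;
    for (size_t i = start; i < end; i++) {
      do_region(regions[i]);  // e.g. reset this region's marking bitmap
    }
  }
}
```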
>> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3537003493 From btaylor at openjdk.org Wed Dec 3 21:42:55 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 21:42:55 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Message-ID: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. ------------- Commit messages: - 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Changes: https://git.openjdk.org/jdk/pull/28648/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373054 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28648/head:pull/28648 PR: https://git.openjdk.org/jdk/pull/28648 From wkemper at openjdk.org Wed Dec 3 21:42:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 21:42:56 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 21:33:50 GMT, Ben Taylor wrote: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. Let's change the misleading comment. src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp line 147: > 145: ShenandoahReentrantLocker locker(nm_data->lock()); > 146: > 147: // Heal oops and disarm Suggestion: // Heal oops and leave the nmethod armed because code cache unloading needs to know about on-stack nmethods. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3537029053 PR Review Comment: https://git.openjdk.org/jdk/pull/28648#discussion_r2586693272 From btaylor at openjdk.org Wed Dec 3 22:07:15 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 22:07:15 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. 
Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Fix misleading comment in previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28648/files - new: https://git.openjdk.org/jdk/pull/28648/files/d830a0a1..a1a9bf11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28648/head:pull/28648 PR: https://git.openjdk.org/jdk/pull/28648 From btaylor at openjdk.org Wed Dec 3 22:08:11 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 22:08:11 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. > > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Fix up comment and remove additional assert from previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28642/files - new: https://git.openjdk.org/jdk/pull/28642/files/03456d6f..eec662f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From xpeng at openjdk.org Wed Dec 3 22:45:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:45:02 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28647#issuecomment-3609153899 From xpeng at openjdk.org Wed Dec 3 22:46:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:46:14 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:23:29 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. 
>> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3609152280 From xpeng at openjdk.org Wed Dec 3 22:46:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:46:16 GMT Subject: Integrated: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. > > ### Other tests > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. Changeset: db2a5420 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/db2a5420a2e3d0f5f0f066eace37a8fd4f075802 Stats: 13 lines in 4 files changed: 12 ins; 0 del; 1 mod 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism Reviewed-by: kdnilsen, shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28613 From xpeng at openjdk.org Wed Dec 3 22:49:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:49:07 GMT Subject: Integrated: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: <7zQpw05McTnnh2XNSZ4jc1FIMOcGiKUObOM-_sZhAfo=.1a091412-513e-4018-97c0-62cd4b004016@github.com> On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. Changeset: 8f8fda7c Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/8f8fda7c80b57e8a36827cc260f0be0e5d61f6a6 Stats: 145 lines in 7 files changed: 0 ins; 143 del; 2 mod 8373048: Genshen: Remove dead code from Shenandoah Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28647 From duke at openjdk.org Wed Dec 3 22:50:49 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 22:50:49 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:43:48 GMT, William Kemper wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 240: > >> 238: size_t allocated = _space_info->bytes_allocated_since_gc_start(); >> 239: >> 240: log_debug(gc)("should_start_gc calculation: available: %zu%s, soft_max_capacity: %zu%s" > > Can we add `ergo` tag to this message? Let's use the `PROPERFMT` and `PROPERFMTARGS` macros here and in other log messages we're changing. Sure. > src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp line 52: > >> 50: size_t capacity = ShenandoahHeap::heap()->soft_max_capacity(); >> 51: size_t available = _space_info->soft_available(); >> 52: size_t allocated = _space_info->bytes_allocated_since_gc_start(); > > This shadows `bytes_allocated` below. Let's just use one variable for this. Good catch. Removed one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586856567 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586856298 From xpeng at openjdk.org Wed Dec 3 23:24:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:24:29 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() Message-ID: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread ### Test - [ ] hotspot_gc_shenandoah - [ ] GHA ------------- Commit messages: - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata - Revert log change - Remove unnecessary use of ShenandoahAllocRequest.type() Changes: https://git.openjdk.org/jdk/pull/28649/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373056 Stats: 79 lines in 6 files changed: 12 ins; 20 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 PR: https://git.openjdk.org/jdk/pull/28649 From duke at openjdk.org Wed Dec 3 23:26:56 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 23:26:56 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:49:28 GMT, William Kemper wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. >> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
>> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: > >> 630: size_t get_usable_free_words(size_t free_bytes) const; >> 631: >> 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); > > `log_freeset_stats` should probably be `private`. I thought it was private already? The `private` starts from [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp#L478). Or, if you expand this section a bit to line 636, another `public` starts after these declaration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586918976 From xpeng at openjdk.org Wed Dec 3 23:30:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:30:21 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> > Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8373056 - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata - Revert log change - Remove unnecessary use of ShenandoahAllocRequest.type() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28649/files - new: https://git.openjdk.org/jdk/pull/28649/files/28f802d8..59087c8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=00-01 Stats: 9796 lines in 279 files changed: 6048 ins; 2286 del; 1462 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 PR: https://git.openjdk.org/jdk/pull/28649 From eastigeevich at openjdk.org Wed Dec 3 23:36:03 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 23:36:03 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:54:24 GMT, Aleksey Shipilev wrote: > Honestly, this looks fragile? We can go into nmethods patching for some other reason, not for patching oops. For GCs on ARM64, I found only patching `nmethod::fix_oop_relocations` and patching ZGC barriers. This may be because `mustIterateImmediateOopsInCode` return false on ARM64. We will need to add support of instructions modified through `OopClosure::do_oop`. > Is there a substantial loss is doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters? https://github.com/openjdk/jdk/pull/28328#issuecomment-3558673810 Axel (@xmas92) saw some SpecJVM regressions. I think they might be caused by the increased number of icache invalidation. We had not patched methods, no icache invalidation, before this PR and started always-icache invalidation after this PR. I will be checking SpecJVM, SpecJBB and other benchmarks (dacapo, renaissance). I might check if the following approach does not have much overhead: - In `nmethod::fix_oop_relocations` ICacheInvalidationContext icic(UseDeferredICacheInvalidation : ICacheInvalidation::DEFERRED ? ICacheInvalidation::IMMEDIATE); bool patching_code = false; while (iter.next()) { ... patching_code = reloc->fix_oop_relocation(); ... patching_code = reloc->fix_metadata_relocation(); } If (icic.mode() == ICacheInvalidation::DEFERRED && !patching_code) { icic.set_mode(ICacheInvalidation::NOT_NEEDED); } If it works, it will reduce amount of changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586934914 From xpeng at openjdk.org Wed Dec 3 23:41:26 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:41:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v15] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. 
> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 256 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition - Port the fix of JDK-8372566 - Merge branch 'master' into cas-alloc-1 - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Remove junk code - ... 
and 246 more: https://git.openjdk.org/jdk/compare/8f8fda7c...f9f74ff0 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=14 Stats: 1637 lines in 25 files changed: 1283 ins; 242 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Wed Dec 3 23:51:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 23:51:59 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 23:24:38 GMT, Rui Li wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: >> >>> 630: size_t get_usable_free_words(size_t free_bytes) const; >>> 631: >>> 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); >> >> `log_freeset_stats` should probably be `private`. > > I thought it was private already? The `private` starts from [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp#L478). Or, if you expand this section a bit to line 636, another `public` starts after these declaration. :face-palm:, you're right. I misread the diff. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586958311 From eastigeevich at openjdk.org Wed Dec 3 23:58:04 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 23:58:04 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:42:38 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. >> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. >> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. 
This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. >> >> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) >> >> - Baseline >> >> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1... > > Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Fix linux-cross-compile build aarch64 > - Merge branch 'master' into JDK-8370947 > - Remove trailing whitespaces > - Add support of deferred icache invalidation to other GCs and JIT > - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence > - Add jtreg test > - Fix linux-cross-compile aarch64 build > - Fix regressions for Java methods without field accesses > - Fix code style > - Correct ifdef; Add dsb after ic > - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f src/hotspot/os_cpu/linux_aarch64/icache_linux_aarch64.hpp line 114: > 112: _code = nullptr; > 113: _size = 0; > 114: _mode = ICacheInvalidation::NOT_NEEDED; This should be inside IF. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586966933 From wkemper at openjdk.org Thu Dec 4 00:41:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 00:41:00 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> Message-ID: On Wed, 3 Dec 2025 23:30:21 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [ ] GHA > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
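For illustration, the batching idea behind the deferred-invalidation context described above for the Neoverse-N1 change can be sketched as follows. The names and behavior here are assumed for the sketch and are not the actual `ICacheInvalidationContext` from the patch; the intent is only to show one flush per scope instead of one per patched instruction.

```
// Schematic sketch of deferred icache-invalidation batching; assumed names,
// not the actual ICacheInvalidationContext implementation in the PR.
#include <cstddef>

enum class InvalidationMode { IMMEDIATE, DEFERRED, NOT_NEEDED };

class DeferredICacheScope {
  InvalidationMode _mode;
  const void* _code = nullptr;  // representative address for the batched flush
  size_t _size = 0;
public:
  explicit DeferredICacheScope(InvalidationMode mode) : _mode(mode) {}

  // Patching sites record the range instead of flushing immediately.
  void note_patched(const void* code, size_t size) {
    if (_mode == InvalidationMode::DEFERRED) {
      _code = code;
      _size = size;
    } else if (_mode == InvalidationMode::IMMEDIATE) {
      flush(code, size);
    }
  }

  ~DeferredICacheScope() {
    // One flush for the whole scope, skipped entirely if nothing was patched.
    if (_mode == InvalidationMode::DEFERRED && _code != nullptr) {
      flush(_code, _size);
    }
  }
private:
  static void flush(const void* /*code*/, size_t /*size*/) {
    // Placeholder for the platform instruction-cache invalidation call.
  }
};
```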
The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8373056 > - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata > - Revert log change > - Remove unnecessary use of ShenandoahAllocRequest.type() Looks good. Left a minor nit about a now stale comment. src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 198: > 196: void > 197: ShenandoahOldGeneration::configure_plab_for_current_thread(const ShenandoahAllocRequest &req) { > 198: // Note: Even when a mutator is performing a promotion outside a LAB, we use a 'shared_gc' request. Is this comment vestigial now? This method doesn't handle shared allocations anymore. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3537426868 PR Review Comment: https://git.openjdk.org/jdk/pull/28649#discussion_r2587024810 From dlong at openjdk.org Thu Dec 4 00:46:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 00:46:59 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes How about leaving make_clone_type_Type() in barrierSetC2.cpp? I don't see a need to move it into runtime.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3609423313 From ysr at openjdk.org Thu Dec 4 01:22:57 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 4 Dec 2025 01:22:57 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 22:08:11 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix up comment and remove additional assert from previous commit One more fix to the comment. LGTM otherwise. Thanks for the cleanups. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 372: > 370: // and then too only during promotion/evacuation phases. Thus there is no danger > 371: // of races between reading from and writing to the object start array, > 372: // or of asking partially initialized objects their size (in the loop below). Remove reference to "in the loop below". ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3537496302 PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2587092135 From xpeng at openjdk.org Thu Dec 4 01:23:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:23:55 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: > Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove outdated comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28649/files - new: https://git.openjdk.org/jdk/pull/28649/files/59087c8e..57305932 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 PR: https://git.openjdk.org/jdk/pull/28649 From xpeng at openjdk.org Thu Dec 4 01:23:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:23:59 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> Message-ID: On Thu, 4 Dec 2025 00:36:56 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8373056 >> - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata >> - Revert log change >> - Remove unnecessary use of ShenandoahAllocRequest.type() > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 198: > >> 196: void >> 197: ShenandoahOldGeneration::configure_plab_for_current_thread(const ShenandoahAllocRequest &req) { >> 198: // Note: Even when a mutator is performing a promotion outside a LAB, we use a 'shared_gc' request. > > Is this comment vestigial now? This method doesn't handle shared allocations anymore. Yeah, it is outdated, I will remove it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28649#discussion_r2587092735 From ysr at openjdk.org Thu Dec 4 01:27:58 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 4 Dec 2025 01:27:58 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: <3dmdwyrhOIf8HobQN8f_s_OuhA1DI-cMTuJ7jD-oCUU=.ca7fa0a7-a41e-46c4-8d86-435f02db1c9b@github.com> On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit LGTM ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3537506166 From xpeng at openjdk.org Thu Dec 4 01:29:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:29:55 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 22:08:11 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix up comment and remove additional assert from previous commit LGTM, thanks for looking into this. ------------- Marked as reviewed by xpeng (Committer). PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3537509599 From duke at openjdk.org Thu Dec 4 01:43:36 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 01:43:36 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v2] In-Reply-To: References: Message-ID: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. > > Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
> > > Suggested fix: when deciding when to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < mutator_soft_capacity) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; Rui Li has updated the pull request incrementally with two additional commits since the last revision: - Rename soft_available. Change Generation soft avail impl - log format fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28622/files - new: https://git.openjdk.org/jdk/pull/28622/files/b23e9ff1..103ce8f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=00-01 Stats: 35 lines in 11 files changed: 2 ins; 7 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From kdnilsen at openjdk.org Thu Dec 4 01:50:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 01:50:04 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA Thanks for this cleanup ------------- PR Review: https://git.openjdk.org/jdk/pull/28647#pullrequestreview-3537557808 From duke at openjdk.org Thu Dec 4 02:19:32 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 02:19:32 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. > > Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. 
This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding when to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < mutator_soft_capacity) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; Rui Li has updated the pull request incrementally with one additional commit since the last revision: Remove unused freeset includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28622/files - new: https://git.openjdk.org/jdk/pull/28622/files/103ce8f8..599cc2d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From aboldtch at openjdk.org Thu Dec 4 06:16:05 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 Dec 2025 06:16:05 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: <834LXpq7tgXkAdLSbu_J-OoTWWYhCxr40d-y80Z5z3M=.84badc91-3b8f-44db-800b-b48cd1dfc8d6@github.com> On Wed, 3 Dec 2025 16:00:05 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 939: > >> 937: // Move all the code and relocations to the new blob: >> 938: relocate_code_to(&cb); >> 939: } > > Here and later, the preferred style is: > > Suggestion: > > // Move all the code and relocations to the new blob: > { > ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED); > relocate_code_to(&cb); > } Go ahead and use @shipilev suggested change. Following the code style of the surrounding code is usually my preference as well. 
_The style comments we had earlier all applied to the ZGC code which has a certain style._ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2587712509 From duke at openjdk.org Thu Dec 4 06:17:59 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 4 Dec 2025 06:17:59 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes After moving make_clone_type_Type() into barrierSetC2.cpp when I try to include the barrierSetC2.cpp file into runtime.cpp or type.cpp it causes redefinition of many functions. I get these errors duplicate symbol 'BarrierSetC2::load_at_resolved(C2Access&, Type const*) const' in: /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/barrierSetC2.o /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/runtime.o duplicate symbol 'BarrierStubC2::entry()' in: /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/barrierSetC2.o /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/runtime.o Can you suggest a way to solve this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3610474295 From shade at openjdk.org Thu Dec 4 07:16:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Dec 2025 07:16:58 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3538487931 From wkemper at openjdk.org Thu Dec 4 15:08:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 15:08:52 GMT Subject: RFR: Merge openjdk/jdk21u:master Message-ID: Merges tag jdk-21.0.10+5 ------------- Commit messages: - Merge - 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization - 8327980: Convert javax/swing/JToggleButton/4128979/bug4128979.java applet test to main - 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp - 8368982: Test sun/security/tools/jarsigner/EC.java completed and timed out - 8313770: jdk/internal/platform/docker/TestSystemMetrics.java fails on Ubuntu - 8368960: Adjust java UL logging in the build - 8369563: Gtest dll_address_to_function_and_library_name has issues with stripped pdb files - 8343340: Swapping checking do not work for MetricsMemoryTester failcount - 8369032: Add test to ensure serialized ICC_Profile stores only necessary optional data - ... and 1 more: https://git.openjdk.org/shenandoah-jdk21u/compare/2f897401...3c0530fd The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/230/files Stats: 1413 lines in 32 files changed: 678 ins; 680 del; 55 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/230.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/230/head:pull/230 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/230 From btaylor at openjdk.org Thu Dec 4 16:06:13 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 16:06:13 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: References: Message-ID: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. > > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Update another outdated comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28642/files - new: https://git.openjdk.org/jdk/pull/28642/files/eec662f6..830c8348 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From wkemper at openjdk.org Thu Dec 4 16:44:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:44:54 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> References: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> Message-ID: On Thu, 4 Dec 2025 16:06:13 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. 
>> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Update another outdated comment Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3540960610 From wkemper at openjdk.org Thu Dec 4 16:46:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:46:54 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [ ] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3540973973 From wkemper at openjdk.org Thu Dec 4 16:55:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:55:22 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3541015046 From duke at openjdk.org Thu Dec 4 18:37:47 2025 From: duke at openjdk.org (duke) Date: Thu, 4 Dec 2025 18:37:47 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. 
> > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit @benty-amzn Your change (at version a1a9bf11e271967973fd4d759ce13b1137434be1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28648#issuecomment-3613782426 From duke at openjdk.org Thu Dec 4 18:51:17 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 18:51:17 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:44:37 GMT, William Kemper wrote: >> Rui Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused freeset includes > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 258: > >> 256: size_t min_threshold = min_free_threshold(); >> 257: if (available < min_threshold) { >> 258: log_trigger("Free (Soft mutator free) (%zu%s) is below minimum threshold (%zu%s)", > > Changing this will break some log parsers, do we really need this? Talked offline. `Free` is overloaded in logs. Sometimes it means soft free, sometimes it means total free. Make it as `Free (Soft)` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590194712 From dlong at openjdk.org Thu Dec 4 20:14:32 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 20:14:32 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: <7OlfD2Jc5Vu7a8x_QmCuDONR_u7AjPQJwqeTJLkAzR0=.723e9b17-44c2-4982-a54c-5b0bc07f3f81@github.com> On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes Do you have a branch that reproduces the problem, so I can take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3614148857 From wkemper at openjdk.org Thu Dec 4 20:33:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 20:33:07 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 02:19:32 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused freeset includes src/hotspot/share/gc/shenandoah/shenandoahGlobalGeneration.cpp line 81: > 79: } > 80: > 81: size_t ShenandoahGlobalGeneration::soft_available_exclude_evac_reserve() const { Two questions: * How is this override different from the default implementation now? * Should we not also take the minimum of this value and `free_set()->available()` as we do elsewhere? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590478334 From wkemper at openjdk.org Thu Dec 4 20:42:27 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 20:42:27 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification Message-ID: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. 
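For illustration only - a minimal standalone sketch of the missed-notification pattern being addressed here. This is not the actual ShenandoahController/control-thread code; every name below (control_lock, cancellation_pending, and the two functions) is made up, and std::mutex/std::condition_variable stand in for the HotSpot Monitor. The point is that the waiter re-checks the flag under the same lock it waits on, so a notification sent between "check" and "wait" cannot be lost:

#include <condition_variable>
#include <mutex>

// Hypothetical names; not the HotSpot Monitor API.
std::mutex              control_lock;
std::condition_variable control_cv;
bool                    cancellation_pending = false;

// Called by a thread that failed to allocate.
void notify_alloc_failure() {
  {
    std::lock_guard<std::mutex> guard(control_lock);
    cancellation_pending = true;   // flag is set while holding the lock...
  }
  control_cv.notify_all();         // ...so the waiter cannot miss the wake-up
}

// Control thread: check and wait under the same lock, then consume the request.
void control_thread_wait_for_request() {
  std::unique_lock<std::mutex> guard(control_lock);
  control_cv.wait(guard, [] { return cancellation_pending; });
  cancellation_pending = false;    // consume the request, then run a GC cycle
}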
------------- Commit messages: - Expand scope of control lock so that it can't miss cancellation notifications Changes: https://git.openjdk.org/jdk/pull/28665/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373100 Stats: 8 lines in 1 file changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28665.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28665/head:pull/28665 PR: https://git.openjdk.org/jdk/pull/28665 From eastigeevich at openjdk.org Thu Dec 4 21:16:26 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 4 Dec 2025 21:16:26 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: <7IU85M4fl7Hk58g_oBLgw5g9QEHDhmSCLN6K5IhH9YQ=.46291089-efb6-432f-ac17-3480200c494a@github.com> On Wed, 3 Dec 2025 15:54:24 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 371: > >> 369: !((oop_Relocation*)reloc)->oop_is_immediate()) { >> 370: _has_non_immediate_oops = true; >> 371: } > > Honestly, this looks fragile? We can go into nmethods patching for some other reason, not for patching oops. > > Also, we still might need to go and patch immediate oops? I see this: > > > // Instruct loadConP of x86_64.ad places oops in code that are not also > // listed in the oop section. > static bool mustIterateImmediateOopsInCode() { return true; } > > > Is there a substantial loss is doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters? @shipilev Moving `ICacheInvalidationContext icic` to `nmethod::fix_oop_relocations` works. The fragile code is no more needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2590596345 From duke at openjdk.org Thu Dec 4 21:24:04 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 21:24:04 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 20:30:27 GMT, William Kemper wrote: >> Rui Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused freeset includes > > src/hotspot/share/gc/shenandoah/shenandoahGlobalGeneration.cpp line 81: > >> 79: } >> 80: >> 81: size_t ShenandoahGlobalGeneration::soft_available_exclude_evac_reserve() const { > > Two questions: > * How is this override different from the default implementation now? > * Should we not also take the minimum of this value and `free_set()->available()` as we do elsewhere? - Good call. 
No functional differences except for a safety assert: `assert(max_capacity() >= soft_max`, which isn't that necessary since the app wouldn't start if this wasn't true: [code](https://github.com/openjdk/jdk/blob/8e653d394e45180e16714124ed6584f912eb5cba/src/hotspot/share/gc/shared/jvmFlagConstraintsGC.cpp#L277). Will remove the override. - I don't think it's needed for global. `free_set()->available()` (space reserved for mutators) could be smaller than `mutator_soft_max` when there's an old gen taking space. If there's no generation, the `mutator_soft_max` should always be less or equal than `free_set()->available()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590621429 From btaylor at openjdk.org Thu Dec 4 21:40:11 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 21:40:11 GMT Subject: Integrated: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 21:33:50 GMT, Ben Taylor wrote: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. This pull request has now been integrated. Changeset: 5ec5a6ea Author: Ben Taylor Committer: William Kemper URL: https://git.openjdk.org/jdk/commit/5ec5a6ea6c8e887b4e21f81e382f57129bffbab8 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Reviewed-by: wkemper, ysr, shade ------------- PR: https://git.openjdk.org/jdk/pull/28648 From duke at openjdk.org Thu Dec 4 21:50:52 2025 From: duke at openjdk.org (duke) Date: Thu, 4 Dec 2025 21:50:52 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> References: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> Message-ID: On Thu, 4 Dec 2025 16:06:13 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Update another outdated comment @benty-amzn Your change (at version 830c83480540d57b147fe26f6ea6742b4788c5e2) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28642#issuecomment-3614461395 From btaylor at openjdk.org Thu Dec 4 22:15:07 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 22:15:07 GMT Subject: Integrated: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: <_arO7HyaZNuiCeydQ9IKZuIZfi1z6sNH--IIbk49Mcc=.68fa2fa2-e0dc-40bd-934f-49ae0b4b39ec@github.com> On Wed, 3 Dec 2025 17:16:02 GMT, Ben Taylor wrote: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. 
> > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build This pull request has now been integrated. Changeset: c8b30da7 Author: Ben Taylor Committer: Y. Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/c8b30da7ef48edb3d43e07d2c1b8622d8123c3a9 Stats: 13 lines in 1 file changed: 0 ins; 10 del; 3 mod 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Reviewed-by: wkemper, ysr, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/28642 From kdnilsen at openjdk.org Thu Dec 4 22:23:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 22:23:41 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Thu, 4 Dec 2025 20:35:42 GMT, William Kemper wrote: > In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. Thanks. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3542353932 From kdnilsen at openjdk.org Thu Dec 4 22:31:14 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 22:31:14 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: <9NmDxH8twYZVjup5VIUUI3Aw-dfOINY8eyPTNnFEZNA=.af38ae63-c607-46f4-979f-632eb7a44817@github.com> On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Thanks. NIce improvements to code. ------------- Marked as reviewed by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3542376515 From xpeng at openjdk.org Thu Dec 4 23:59:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 23:59:14 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28649#issuecomment-3614774632 From xpeng at openjdk.org Thu Dec 4 23:59:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 23:59:16 GMT Subject: Integrated: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Wed, 3 Dec 2025 23:02:17 GMT, Xiaolong Peng wrote: > Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. 
Changeset: 15f25389
Author:    Xiaolong Peng
URL:       https://git.openjdk.org/jdk/commit/15f25389435288881644f7aeab48fd2eae410999
Stats:     80 lines in 6 files changed: 12 ins; 21 del; 47 mod

8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type()

Reviewed-by: wkemper, kdnilsen

-------------

PR: https://git.openjdk.org/jdk/pull/28649

From xpeng at openjdk.org  Fri Dec  5 00:27:20 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 5 Dec 2025 00:27:20 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region
Message-ID: 

Chasing the root cause of JDK-8372498, I have narrowed down the root cause to commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2

It is caused by the behavior change in the following code.

Original:

    if (ShenandoahSATBBarrier) {
      T* array = dst;
      HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
      ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
      if (is_old_marking) {
        // Generational, old marking
        assert(_heap->mode()->is_generational(), "Invariant");
        if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
          arraycopy_work(array, count);
        }
      } else if (_heap->mode()->is_generational()) {
        // Generational, young marking
        if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
          arraycopy_work(array, count);
        }
      } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) {
        // Non-generational, marking
        arraycopy_work(array, count);
      }
    }

New:

    if (ShenandoahSATBBarrier) {
      if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) {
        arraycopy_work(dst, count);
      }
    }

With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS, arraycopy_work is no longer applied, so we may miss some pointers in SATB in such a case.
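For reference, a standalone model of the decision table in the "Original" snippet above. The types here are mock stand-ins, not the real Shenandoah classes; the key case is young marking, where an array in an old region must be processed regardless of TAMS, while the regressed "New" code reduces every case to the below-TAMS check alone:

#include <cstdint>

// Mock stand-ins for the real Shenandoah types.
struct Region  { bool is_old; };
struct Context { std::uintptr_t tams; };  // top-at-mark-start of the region

enum class Marking { YOUNG, OLD, NON_GENERATIONAL };

// Mirrors the "Original" decision table: does the SATB arraycopy
// pre-barrier have to hand this array to arraycopy_work()?
bool needs_arraycopy_work(Marking mode, const Region& r,
                          std::uintptr_t array_addr, const Context& ctx) {
  const bool below_tams = array_addr < ctx.tams;
  switch (mode) {
    case Marking::YOUNG:            return r.is_old || below_tams;  // old region: always
    case Marking::OLD:              return r.is_old && below_tams;
    case Marking::NON_GENERATIONAL: return below_tams;
  }
  return false;
}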
### Test - [x] hotspot_gc_shenandoah - [ ] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix - [ ] GHA ------------- Commit messages: - Reorder the code - Assert only when the obj been pointed to is in young - Add assert to check card table to sure card table is correct - Merge branch 'openjdk:master' into JDK-8372498 - arraycopy_work should be done unconditionally if the array is in an old region Changes: https://git.openjdk.org/jdk/pull/28669/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373116 Stats: 16 lines in 1 file changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From xpeng at openjdk.org Fri Dec 5 02:03:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 02:03:34 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [ ] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [ ] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: uncomment the new added assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/44938dc7..5b951e6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=00-01 Stats: 16 lines in 1 file changed: 2 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From duke at openjdk.org Fri Dec 5 05:41:03 2025 From: duke at openjdk.org (Harshit470250) Date: Fri, 5 Dec 2025 05:41:03 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes I have reproduced it [here](https://github.com/Harshit470250/jdk/pull/2). 
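Not about the fix itself, just a guess at why the duplicate-symbol errors appear: textually including a `.cpp` file from more than one translation unit gives each including unit its own definition of the same non-inline functions, which the linker then rejects. A minimal illustration, sketched as three tiny hypothetical files in one listing (none of these names are from the JDK):

// --- helper.cpp: meant to be compiled once, never #included ---
int helper() { return 42; }

// --- a.cpp ---
#include "helper.cpp"   // this translation unit now defines helper()

// --- b.cpp ---
#include "helper.cpp"   // defines helper() again -> "duplicate symbol" at link time

// The usual fix: declare `int helper();` in helper.hpp, keep the definition in
// helper.cpp (compiled exactly once), and #include only the header elsewhere.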
------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3615381956 From xpeng at openjdk.org Fri Dec 5 07:24:35 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 07:24:35 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v3] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove the asset code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/5b951e6d..85acca0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=01-02 Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From syan at openjdk.org Fri Dec 5 07:26:56 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 5 Dec 2025 07:26:56 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: References: Message-ID: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com> On Fri, 5 Dec 2025 02:03:34 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > uncomment the new added assert After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed. 
[848.log](https://github.com/user-attachments/files/23955591/848.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615615694 From xpeng at openjdk.org Fri Dec 5 07:42:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 07:42:56 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com> References: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com> Message-ID: On Fri, 5 Dec 2025 07:24:13 GMT, SendaoYan wrote: > After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed. > > [848.log](https://github.com/user-attachments/files/23955591/848.log) Thank you Sendao for the test and verification! The OOM should be a unrelated issue, I'm not if there is open bug for it or not, please create a new one if there isn't. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615655399 From xpeng at openjdk.org Fri Dec 5 07:42:57 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 07:42:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: References: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com> Message-ID: On Fri, 5 Dec 2025 07:39:53 GMT, Xiaolong Peng wrote: > > After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed. > > [848.log](https://github.com/user-attachments/files/23955591/848.log) > > Thank you Sendao for the test and verification! > > The OOM should be a unrelated issue, I'm not if there is open bug for it or not, please create a new one if there isn't. Found it, https://bugs.openjdk.org/browse/JDK-8298781 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615657332 From roland at openjdk.org Fri Dec 5 13:52:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:52:12 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v9] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. 
> > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/93b8b0c5..cab44429 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07-08 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). 
> > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/cab44429..4a877c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08-09 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:09 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> On Tue, 2 Dec 2025 17:32:09 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 105: > >> 103: // All the possible combinations of floating/narrowing with example use cases: >> 104: >> 105: // Use case example: Range Check CastII > > I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. 
Range check `CastII` were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out of range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592801401 From roland at openjdk.org Fri Dec 5 14:05:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:10 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 17:41:37 GMT, Quan Anh Mai wrote: >> Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. > > Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` and `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. > > It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! I added a mention of `depends_only_on_test`. Is that good enough? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592784214 From roland at openjdk.org Fri Dec 5 14:52:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:52:51 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> Message-ID: <42lOFbyCuQt4xj-pK-ME6ScceXqTnGOY0HrWnJMK56k=.87b29936-511f-4ba4-a429-e8b9faed83a2@github.com> On Sun, 30 Nov 2025 08:03:32 GMT, Zihao Lin wrote: >> I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change. > > Sure, it's better to separate to another change. I am not familiar this part, please pin me if you have better solution. Thanks! 
I filed https://bugs.openjdk.org/browse/JDK-8373143 for this but I keep finding new issues. So this one will take some time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2592955645 From xpeng at openjdk.org Fri Dec 5 16:25:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 16:25:22 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v4] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add asserts back, the elem_ptr must be dirty either in read or write table ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/85acca0c..53316bd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=02-03 Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From pwinchester at palantir.com Fri Dec 5 16:32:36 2025 From: pwinchester at palantir.com (Parker Winchester) Date: Fri, 5 Dec 2025 16:32:36 +0000 Subject: Reference leak in old gen in Generational Shenandoah Message-ID: We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. 
Triggering GCs manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects rather than a true "leak." We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - both rely on reference types (e.g. WeakReferences) that need at least one additional GC cycle before they can be removed by the GC. I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory.

I believe the issue is that a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collections. And when it gets encountered in the old gen, should_discover() is returning false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live:

[23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD)
[23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8

My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer.

Summary of my repro. Each iteration it:

* Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it.
* Stores the WeakReference in a static list (this part appears to be necessary to the repro).
* Allocates a lot of garbage (80GB in an 8GB heap) to force the object and the WeakReference to be promoted to the old gen.
* Iterates over the static list and removes any WeakReferences with null referents.
* Takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference.
* The loop then continues, allowing the object and its WeakReference to go out of scope.
* Every 20 iterations it runs several System.gc() calls to prove that the counts return to 0 (System.gc() triggers a "global" GC, which is different from an old gen GC).
The count will go up each iteration until the System.gc():

Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1
Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3
Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4
Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5
Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6
Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7
Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8
Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9
Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10
Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11
Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12
Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13
Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14
Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15
Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16
Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17
Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18
Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19
Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20
Forcing GCs...
Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2

Expected behavior: Each iteration should see only 1, at most 2, of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred.

Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak.

I have only tested with Corretto on Ubuntu & OSX:

openjdk 25.0.1 2025-10-21 LTS
OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing)

I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1.
I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace)

Iteration 1: Native Memory = 1 KB
[20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3
[20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 2: Native Memory = 2 KB
[30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4
[30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 3: Native Memory = 3 KB
[54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5
[54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1
[54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 4: Native Memory = 4 KB
[93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6
[93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration it adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live)

Run the repro with:

java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro

These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage):

-XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0

This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook):

-XX:ShenandoahGuaranteedOldGCInterval=1000

Note I played around with the heap size and the allocation rate and found 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue.

Source code for GenShenWeakRefLeakRepro.java

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah.
 */
public class GenShenWeakRefLeakRepro {
    // Keep WeakReferences alive in a static list (will be in old gen)
    private static final List<WeakReference<MyLeakedObject>> WEAK_REFS = new ArrayList<>();
    private static final long[] COUNTS = new long[2];

    static class MyLeakedObject {
        private final int value;
        MyLeakedObject(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        // allocate garbage to promote WEAK_REFS to old gen
        for (int i = 0; i < 800; i++) {
            byte[] garbage = new byte[100 * 1024 * 1024];
            garbage[i % garbage.length] = (byte) i;
        }

        for (int iteration = 0; iteration < 100; iteration++) {
            // Create object and weak reference
            MyLeakedObject obj = new MyLeakedObject(iteration);
            WeakReference<MyLeakedObject> wr = new WeakReference<>(obj);

            // Store in static list (so WeakRef survives and gets promoted)
            WEAK_REFS.add(wr);

            // Allocate garbage to promote both WeakRef and referent to old gen
            for (int i = 0; i < 800; i++) {
                byte[] garbage = new byte[100 * 1024 * 1024];
                garbage[i % garbage.length] = (byte) i;
            }

            // Remove cleared WeakRefs (referent was collected)
            WEAK_REFS.removeIf(w -> w.get() == null);

            // Count objects
            getObjectCounts();

            // What remains are WeakRefs with live referents
            long aliveCount = WEAK_REFS.size();

            System.out.println("Iteration " + (iteration + 1) +
                    ": MyLeakedObject=" + COUNTS[0] +
                    ", WeakReference=" + COUNTS[1] +
                    ", WeakRefs with live referent=" + aliveCount);

            // Periodically force GCs
            if ((iteration + 1) % 20 == 0) {
                System.out.println("Forcing GCs...");
                for (int i = 0; i < 4; i++) {
                    System.gc();
                    Thread.sleep(3000);
                }
                getObjectCounts();
                System.out.println("After GC: MyLeakedObject=" + COUNTS[0] +
                        ", WeakRefs with live referent=" + aliveCount);
            }
        }
    }

    private static void getObjectCounts() {
        COUNTS[0] = 0;
        COUNTS[1] = 0;
        try {
            Process p = new ProcessBuilder(
                    "jcmd", String.valueOf(ProcessHandle.current().pid()),
                    "GC.class_histogram", "-all")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length >= 4) {
                        if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) {
                            COUNTS[0] = Long.parseLong(parts[1]);
                        } else if (line.contains("java.lang.ref.WeakReference ")) {
                            COUNTS[1] = Long.parseLong(parts[1]);
                        }
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("Histogram failed: " + e.getMessage());
        }
    }
}

Thanks,
Parker Winchester
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eastigeevich at openjdk.org Fri Dec 5 17:52:20 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Fri, 5 Dec 2025 17:52:20 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v14]
In-Reply-To: 
References: 
Message-ID: 

> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1.
>
> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
> - Disable coherent icache.
> - Trap IC IVAU instructions.
> - Execute:
> - `tlbi vae3is, xzr`
> - `dsb sy`
>
> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
> > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) > > - Baseline > > $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC... 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Implement nested ICacheInvalidationContext ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/4b04496f..b9380fd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=12-13 Stats: 402 lines in 27 files changed: 162 ins; 167 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From xpeng at openjdk.org Fri Dec 5 18:19:39 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 18:19:39 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add include header shenandoahOldGeneration.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/53316bd3..49ea3c93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From kemperw at amazon.com Fri Dec 5 18:23:01 2025 From: kemperw at amazon.com (Kemper, William) Date: Fri, 5 Dec 2025 18:23:01 +0000 Subject: Reference leak in old gen in Generational Shenandoah In-Reply-To: References: Message-ID: <526637d32a674ba3b83d024abaf25d29@amazon.com> Hi Parker - thank you for reporting this and writing a reproducer. 
I'll take a look and keep you apprised. ________________________________ From: shenandoah-dev on behalf of Parker Winchester Sent: Friday, December 5, 2025 8:32:36 AM To: shenandoah-dev at openjdk.org Subject: [EXTERNAL] Reference leak in old gen in Generational Shenandoah CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. Triggering GC's manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects, rather than a true "leak." We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - these both use types of references (eg WeakReferences) that need at least one additional GC cycle to be removed by the GC. I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory. I believe the issue is a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collections. And when it gets encountered in the old gen, should_discover() is returning false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live: [23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD) [23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8 My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer Summary of my repro Each iteration it: * Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it. * It stores the WeakReference in a static list (this part appears to be necessary to the repro) * It then allocates a lot of garbage (80GB in a 8GB heap size) to force the object and the WeakReference to be promoted to the old gen * It then iterates over the static list and removes any WeakReferences with null referents * It then takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference * The loop then continues, allowing the object and its WeakReference to go out of scope. * Every 20 iterations it runs several system.gc() calls to prove that the counts return to 0 (system.gc() triggers a "global" GC which is different than an old gen GC). 
The count will go up each iteration until the system.gc(): Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1 Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2 Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3 Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4 Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5 Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6 Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7 Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8 Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9 Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10 Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11 Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12 Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13 Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14 Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15 Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16 Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17 Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18 Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19 Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20 Forcing GCs... Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2 Expected behavior: Each iteration should see only 1 at most 2 of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak I have only tested with Corretto on Ubuntu & OSX openjdk 25.0.1 2025-10-21 LTS OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS) OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing) I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1. 
I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace) Iteration 1: Native Memory = 1 KB [20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3 [20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 2: Native Memory = 2 KB [30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4 [30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 3: Native Memory = 3 KB [54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5 [54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1 [54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 4: Native Memory = 4 KB [93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6 [93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration it adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live) Run the repro with: java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage) -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook) -XX:ShenandoahGuaranteedOldGCInterval=1000 Note I played around with the heap size and the allocation rate and found 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue. Source code for GenShenWeakRefLeakRepro.java import java.io.BufferedReader; import java.io.InputStreamReader; import java.lang.ref.WeakReference; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; /** * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah. 
*/ public class GenShenWeakRefLeakRepro { // Keep WeakReferences alive in a static list (will be in old gen) private static final List> WEAK_REFS = new ArrayList<>(); private static final long[] COUNTS = new long[2]; static class MyLeakedObject { private final int value; MyLeakedObject(int value) { this.value = value; } } public static void main(String[] args) throws Exception { //allocate garbage to promote WEAK_REFS to old gen for (int i = 0; i < 800; i++) { byte[] garbage = new byte[100 * 1024 * 1024]; garbage[i % garbage.length] = (byte) i; } for (int iteration = 0; iteration < 100; iteration++) { // Create object and weak reference MyLeakedObject obj = new MyLeakedObject(iteration); WeakReference wr = new WeakReference<>(obj); // Store in static list (so WeakRef survives and gets promoted) WEAK_REFS.add(wr); // Allocate garbage to promote both WeakRef and referent to old gen for (int i = 0; i < 800; i++) { byte[] garbage = new byte[100 * 1024 * 1024]; garbage[i % garbage.length] = (byte) i; } // Remove cleared WeakRefs (referent was collected) WEAK_REFS.removeIf(w -> w.get() == null); // Count objects getObjectCounts(); // What remains are WeakRefs with live referents long aliveCount = WEAK_REFS.size(); System.out.println("Iteration " + (iteration + 1) + ": MyLeakedObject=" + COUNTS[0] + ", WeakReference=" + COUNTS[1] + ", WeakRefs with live referent=" + aliveCount); // Periodically force GCs if ((iteration + 1) % 20 == 0) { System.out.println("Forcing GCs..."); for (int i = 0; i < 4; i++) { System.gc(); Thread.sleep(3000); } getObjectCounts(); System.out.println("After GC: MyLeakedObject=" + COUNTS[0] + ", WeakRefs with live referent=" + aliveCount); } } } private static void getObjectCounts() { COUNTS[0] = 0; COUNTS[1] = 0; try { Process p = new ProcessBuilder( "jcmd", String.valueOf(ProcessHandle.current().pid()), "GC.class_histogram", "-all") .start(); try (BufferedReader r = new BufferedReader( new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) { String line; while ((line = r.readLine()) != null) { String[] parts = line.trim().split("\\s+"); if (parts.length >= 4) { if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) { COUNTS[0] = Long.parseLong(parts[1]); } else if (line.contains("java.lang.ref.WeakReference ")) { COUNTS[1] = Long.parseLong(parts[1]); } } } } } catch (Exception e) { System.err.println("Histogram failed: " + e.getMessage()); } } } Thanks, Parker Winchester -------------- next part -------------- An HTML attachment was scrubbed... URL: From btaylor at openjdk.org Fri Dec 5 18:50:41 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Fri, 5 Dec 2025 18:50:41 GMT Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics Message-ID: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> The `STATIC_ASSERT` below this typedef appears to be out of date. The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. Change passes all tier1 tests locally when run with Shenandoah GC. 
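To make the shape of the change concrete, here is a minimal sketch (illustrative only; the exact member names, padding, and layout are assumptions, with the real definitions in shenandoahSharedVariables.hpp and shenandoahThreadLocalData.hpp):

    // Shared flag word that GC and mutator threads update through Atomic::*
    // primitives: widening it to int32_t lets every platform use its native
    // 32-bit atomics rather than byte-sized operations.
    typedef int32_t ShenandoahSharedValue;   // previously a byte-sized typedef

    struct ShenandoahSharedFlag {
      volatile ShenandoahSharedValue value;  // read/written via Atomic::load/store/cmpxchg
    };

    // The fast-path barriers do not read the shared word directly; they test a
    // per-thread cached copy of the gc state, which stays a single char, so the
    // byte-sized load in the barrier code is unaffected by this change.
    class ShenandoahThreadLocalData {
      char _gc_state;  // snapshot of the global gc state, still one byte
      // ...
    };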
------------- Commit messages: - 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics Changes: https://git.openjdk.org/jdk/pull/28681/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28681&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352914 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28681.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28681/head:pull/28681 PR: https://git.openjdk.org/jdk/pull/28681 From wkemper at openjdk.org Fri Dec 5 18:53:37 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 18:53:37 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: > In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Set requested gc cause under a lock when allocation fails ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28665/files - new: https://git.openjdk.org/jdk/pull/28665/files/89af1701..1081f21e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=00-01 Stats: 27 lines in 2 files changed: 2 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/28665.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28665/head:pull/28665 PR: https://git.openjdk.org/jdk/pull/28665 From wkemper at openjdk.org Fri Dec 5 18:53:37 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 18:53:37 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Fri, 5 Dec 2025 18:50:08 GMT, William Kemper wrote: >> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Set requested gc cause under a lock when allocation fails src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145: > 143: // Notifies the control thread, but does not update the requested cause or generation. > 144: // The overloaded variant should be used when the _control_lock is already held. > 145: void notify_cancellation(GCCause::Cause cause); These methods were the root cause here. `ShenandoahHeap::_canceled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had: 1. Control thread finishes cycle and sees no cancellation is requested (no lock used). 2. 
Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`. 3. Control thread takes `_control_lock` and checks `_requested_gc_cause` and sees `_no_gc` (because `notify_cancellation` didn't change it) and `waits` forever now. The fix here is to replace `notify_cancellation` with `notify_control_thread` which serializes updates to `_requested_gc_cause` under `_control_lock`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2593632599 From wkemper at openjdk.org Fri Dec 5 19:04:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 19:04:56 GMT Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. Marked as reviewed by wkemper (Reviewer). This looks good to me, but would appreciate another reviewer. ------------- PR Review: https://git.openjdk.org/jdk/pull/28681#pullrequestreview-3546051530 PR Comment: https://git.openjdk.org/jdk/pull/28681#issuecomment-3618172121 From kdnilsen at openjdk.org Fri Dec 5 19:21:56 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 5 Dec 2025 19:21:56 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Fri, 5 Dec 2025 18:53:37 GMT, William Kemper wrote: >> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Set requested gc cause under a lock when allocation fails Thanks for diligent testing and analysis. Subtle code here. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3546110509 From kdnilsen at openjdk.org Fri Dec 5 19:36:56 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 5 Dec 2025 19:36:56 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 02:19:32 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused freeset includes Changes requested by kdnilsen (Committer). src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 940: > 938: > 939: size_t ShenandoahGeneration::soft_available_exclude_evac_reserve() const { > 940: size_t result = available(ShenandoahHeap::heap()->soft_max_capacity() * (100.0 - ShenandoahEvacReserve) / 100); I'm a little uncomfortable with this approach. It's mostly a question of how we name it. The evac reserve is not always this value. In particular, we may shrink the young evac reserves after we have selected the cset. Also of concern is that if someone invokes this function on old_generation(), it looks like they'll get a bogus (not meaningful) value. I think I'd be more comfortable with naming this to something like "mutator_available_when_gc_is_idle()". 
If we keep it virtual, then OldGeneration should override with "assert(false, "Not relevant to old generation") ------------- PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3546162874 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2593766590 From wkemper at openjdk.org Fri Dec 5 20:02:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 20:02:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com> On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add include header shenandoahOldGeneration.hpp The issue, as I understand it, is that mutators are racing with the concurrent remembered set scan. If a mutator changes a pointer covered by a dirty card, it could prevent the remembered set scan from tracing the original object that was reachable at the beginning of marking. Since we may not be marking old, we cannot rely on the TAMS for objects in old regions and must unconditionally enqueue all of the overwritten pointers in the old array. Should we only do this when young marking is in progress? Perhaps we should have a version of `arraycopy_work` that only enqueues young pointers here? 
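For illustration, a young-only variant could look roughly like the sketch below. This is only a sketch of the idea, not the proposed patch: `in_young_generation()` and `satb_enqueue()` are hypothetical stand-ins for the real generation check and SATB queue plumbing.

    // Sketch: enqueue only the overwritten pointers that refer into the young
    // generation, so an old dst array keeps its young targets alive for the
    // in-progress young mark without flooding the SATB queue with old pointers.
    template <class T>
    void arraycopy_work_enqueue_young(T* dst, size_t count) {
      for (size_t i = 0; i < count; i++) {
        T o = RawAccess<>::oop_load(dst + i);
        if (!CompressedOops::is_null(o)) {
          oop obj = CompressedOops::decode_not_null(o);
          if (in_young_generation(obj)) {        // hypothetical generation test
            satb_enqueue(Thread::current(), obj); // hypothetical: push onto the thread's SATB queue
          }
        }
      }
    }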
------------- PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3546247628 From xpeng at openjdk.org Fri Dec 5 23:01:57 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 23:01:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com> References: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com> Message-ID: On Fri, 5 Dec 2025 20:00:04 GMT, William Kemper wrote: > The issue, as I understand it, is that mutators are racing with the concurrent remembered set scan. If a mutator changes a pointer covered by a dirty card, it could prevent the remembered set scan from tracing the original object that was reachable at the beginning of marking. Since we may not be marking old, we cannot rely on the TAMS for objects in old regions and must unconditionally enqueue all of the overwritten pointers in the old array. Should we only do this when young marking is in progress? Perhaps we should have a version of `arraycopy_work` that only enqueues young pointers here? I don't think it is related the any racing on remembered set, I got some GC logs from which I think we may know how it actually happens. [15.653s][info][gc,start ] GC(188) Pause Full ... [15.763s][info][gc ] GC(188) Pause Full 913M->707M(1024M) 109.213ms [15.767s][info][gc,ergo ] GC(189) Start GC cycle (Young) ... [15.802s][info][gc ] GC(189) Concurrent reset after collect (Young) 1.160ms [15.802s][info][gc,ergo ] GC(189) At end of Interrupted Concurrent Young GC: Young generation used: 874M, used regions: 874M, humongous waste: 7066K, soft capacity: 1024M, max capacity: 1022M, available: 99071K [15.802s][info][gc,ergo ] GC(189) At end of Interrupted Concurrent Young GC: Old generation used: 1273K, used regions: 1536K, humongous waste: 0B, soft capacity: 1024M, max capacity: 1536K, available: 262K [15.803s][info][gc,metaspace ] GC(189) Metaspace: 759K(960K)->759K(960K) NonClass: 721K(832K)->721K(832K) Class: 38K(128K)->38K(128K) [15.803s][info][gc ] Trigger (Young): Handle Allocation Failure [15.803s][info][gc,start ] GC(190) Pause Full [15.803s][info][gc,task ] GC(190) Using 8 of 8 workers for full gc [15.803s][info][gc,phases,start] GC(190) Phase 1: Mark live objects [15.806s][info][gc,ref ] GC(190) Clearing All SoftReferences References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. Any comparative performance numbers? ? ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28681#pullrequestreview-3546716696 From wkemper at openjdk.org Fri Dec 5 23:23:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 23:23:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add include header shenandoahOldGeneration.hpp At step 2, we have an element in the old array pointing to young, correct? Why is it not represented in the remembered set at the beginning of young mark? If it is because the old -> young pointer was created _after_ init mark, then the young pointer was either reachable when mark started, or it was created after mark started. Either way, the young pointer should have been found without this SATB modification. Unless, it was in the remembered set, but it didn't get scanned because a mutator modified it before it was scanned. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3618939846 From xpeng at openjdk.org Fri Dec 5 23:34:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 23:34:56 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: <8yfYsnqygCX37e1fTQOGMs-MRDjVrgmDX-pp799MDfk=.35732036-5d3a-4cc6-ad4e-872e099b6ebf@github.com> On Fri, 5 Dec 2025 23:19:54 GMT, William Kemper wrote: > At step 2, we have an element in the old array pointing to young, correct? Why is it not represented in the remembered set at the beginning of young mark? 
If it is because the old -> young pointer was created _after_ init mark, then the young pointer was either reachable when mark started, or it was created after mark started. Either way, the young pointer should have been found without this SATB modification. Unless, it was in the remembered set, but it didn't get scanned because a mutator modified it before it was scanned. array copy involves two array object, src and dst. The dst array is an old array, the src may not be an old, it could be young. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3618958253 From kemperw at amazon.com Sat Dec 6 00:51:28 2025 From: kemperw at amazon.com (Kemper, William) Date: Sat, 6 Dec 2025 00:51:28 +0000 Subject: Reference leak in old gen in Generational Shenandoah In-Reply-To: References: Message-ID: I created https://bugs.openjdk.org/browse/JDK-8373203 to track progress. ________________________________ From: shenandoah-dev on behalf of Parker Winchester Sent: Friday, December 5, 2025 8:32:36 AM To: shenandoah-dev at openjdk.org Subject: [EXTERNAL] Reference leak in old gen in Generational Shenandoah CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. Triggering GC's manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects, rather than a true "leak." We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - these both use types of references (eg WeakReferences) that need at least one additional GC cycle to be removed by the GC. I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory. I believe the issue is a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collections. And when it gets encountered in the old gen, should_discover() is returning false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live: [23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD) [23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8 My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer Summary of my repro Each iteration it: * Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it. 
* It stores the WeakReference in a static list (this part appears to be necessary to the repro) * It then allocates a lot of garbage (80GB in a 8GB heap size) to force the object and the WeakReference to be promoted to the old gen * It then iterates over the static list and removes any WeakReferences with null referents * It then takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference * The loop then continues, allowing the object and its WeakReference to go out of scope. * Every 20 iterations it runs several system.gc() calls to prove that the counts return to 0 (system.gc() triggers a "global" GC which is different than an old gen GC). The count will go up each iteration until the system.gc(): Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1 Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2 Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3 Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4 Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5 Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6 Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7 Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8 Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9 Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10 Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11 Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12 Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13 Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14 Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15 Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16 Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17 Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18 Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19 Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20 Forcing GCs... Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2 Expected behavior: Each iteration should see only 1 at most 2 of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak I have only tested with Corretto on Ubuntu & OSX openjdk 25.0.1 2025-10-21 LTS OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS) OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing) I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1. 
I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace) Iteration 1: Native Memory = 1 KB [20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3 [20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 2: Native Memory = 2 KB [30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4 [30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 3: Native Memory = 3 KB [54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5 [54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1 [54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 4: Native Memory = 4 KB [93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6 [93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration it adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live) Run the repro with: java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage) -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook) -XX:ShenandoahGuaranteedOldGCInterval=1000 Note I played around with the heap size and the allocation rate and found 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue. Source code for GenShenWeakRefLeakRepro.java import java.io.BufferedReader; import java.io.InputStreamReader; import java.lang.ref.WeakReference; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; /** * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah. 
*/ public class GenShenWeakRefLeakRepro { // Keep WeakReferences alive in a static list (will be in old gen) private static final List> WEAK_REFS = new ArrayList<>(); private static final long[] COUNTS = new long[2]; static class MyLeakedObject { private final int value; MyLeakedObject(int value) { this.value = value; } } public static void main(String[] args) throws Exception { //allocate garbage to promote WEAK_REFS to old gen for (int i = 0; i < 800; i++) { byte[] garbage = new byte[100 * 1024 * 1024]; garbage[i % garbage.length] = (byte) i; } for (int iteration = 0; iteration < 100; iteration++) { // Create object and weak reference MyLeakedObject obj = new MyLeakedObject(iteration); WeakReference wr = new WeakReference<>(obj); // Store in static list (so WeakRef survives and gets promoted) WEAK_REFS.add(wr); // Allocate garbage to promote both WeakRef and referent to old gen for (int i = 0; i < 800; i++) { byte[] garbage = new byte[100 * 1024 * 1024]; garbage[i % garbage.length] = (byte) i; } // Remove cleared WeakRefs (referent was collected) WEAK_REFS.removeIf(w -> w.get() == null); // Count objects getObjectCounts(); // What remains are WeakRefs with live referents long aliveCount = WEAK_REFS.size(); System.out.println("Iteration " + (iteration + 1) + ": MyLeakedObject=" + COUNTS[0] + ", WeakReference=" + COUNTS[1] + ", WeakRefs with live referent=" + aliveCount); // Periodically force GCs if ((iteration + 1) % 20 == 0) { System.out.println("Forcing GCs..."); for (int i = 0; i < 4; i++) { System.gc(); Thread.sleep(3000); } getObjectCounts(); System.out.println("After GC: MyLeakedObject=" + COUNTS[0] + ", WeakRefs with live referent=" + aliveCount); } } } private static void getObjectCounts() { COUNTS[0] = 0; COUNTS[1] = 0; try { Process p = new ProcessBuilder( "jcmd", String.valueOf(ProcessHandle.current().pid()), "GC.class_histogram", "-all") .start(); try (BufferedReader r = new BufferedReader( new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) { String line; while ((line = r.readLine()) != null) { String[] parts = line.trim().split("\\s+"); if (parts.length >= 4) { if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) { COUNTS[0] = Long.parseLong(parts[1]); } else if (line.contains("java.lang.ref.WeakReference ")) { COUNTS[1] = Long.parseLong(parts[1]); } } } } } catch (Exception e) { System.err.println("Histogram failed: " + e.getMessage()); } } } Thanks, Parker Winchester -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wkemper at openjdk.org Sat Dec 6 00:52:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 6 Dec 2025 00:52:01 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add include header shenandoahOldGeneration.hpp We talked offline. The assertion must be weakened to account for dirty write cards because the young pointer could be put in the old array _after_ init mark. We cannot expect the read card to be dirty in this case. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3546805335 From zlin at openjdk.org Sat Dec 6 12:07:04 2025 From: zlin at openjdk.org (Zihao Lin) Date: Sat, 6 Dec 2025 12:07:04 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v15] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into 8344116 - Merge branch 'master' into 8344116 - remove adr_type from graphKit - Fix test failed - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - ... 
From zlin at openjdk.org Sat Dec 6 12:07:04 2025
From: zlin at openjdk.org (Zihao Lin)
Date: Sat, 6 Dec 2025 12:07:04 GMT
Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v15]
In-Reply-To: 
References: 
Message-ID: 

> This patch removes the slice parameter from LoadNode::make.
>
> I have done more work which removes the slice parameter from StoreNode::make.
>
> Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805
>
> Hi team, I am new; I'd appreciate any guidance. Thanks a lot!

Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:

 - Merge branch 'master' into 8344116
 - Merge branch 'master' into 8344116
 - remove adr_type from graphKit
 - Fix test failed
 - Merge branch 'openjdk:master' into 8344116
 - Merge branch 'openjdk:master' into 8344116
 - fix conflict
 - Merge master
 - remove C2AccessValuePtr
 - fix assert
 - ... and 8 more: https://git.openjdk.org/jdk/compare/b0f59f60...c526f021

-------------

Changes: https://git.openjdk.org/jdk/pull/24258/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=14
  Stats: 316 lines in 22 files changed: 47 ins; 89 del; 180 mod
  Patch: https://git.openjdk.org/jdk/pull/24258.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258

PR: https://git.openjdk.org/jdk/pull/24258

From qamai at openjdk.org Sun Dec 7 12:12:20 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sun, 7 Dec 2025 12:12:20 GMT
Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v3]
In-Reply-To: 
References: 
Message-ID: 

> Hi,
>
> Currently, `Node::adr_type` is ambiguous. For some, it refers to the memory the node consumes, while for the others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues:
>
> - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, where the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though.
> - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too.
> - For nodes such as `StrInflatedCopyNode`, as it consumes more than it produces, during scheduling, we need to compute anti-dependencies. This is not the case, so I fixed it by making it kill all the memory it consumes.
> - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`; this is really suspicious. I didn't fix it, as it seems to not result in any symptom at the moment.
>
> In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity.
>
> Please take a look and leave your reviews, thanks a lot.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into adrtype
 - store_to_memory does not emit MemBars
 - Disambiguate Node::adr_type

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28570/files
  - new: https://git.openjdk.org/jdk/pull/28570/files/b39029a3..ec31fb75

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=01-02

  Stats: 29305 lines in 803 files changed: 17601 ins; 8334 del; 3370 mod
  Patch: https://git.openjdk.org/jdk/pull/28570.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570

PR: https://git.openjdk.org/jdk/pull/28570

From kdnilsen at openjdk.org Sun Dec 7 17:54:24 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Sun, 7 Dec 2025 17:54:24 GMT
Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics
Message-ID: 

When old-gen consumes a small percentage of heap size, we trigger when old-gen has expanded by more than ShenandoahMinOldGenGrowthPercent (default value 50%) of the live data in old at the time of the previous old-gen mark.

When old-gen consumes a larger percentage of heap size, we trigger when old-gen has expanded by more than ShenandoahMinOldGenGrowthRemainingHeapPercent (default value 25%) of the memory that was not live in old at the last marking of old.

-------------

Commit messages:
 - make old evac ratio adaptive
 - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
 - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent
 - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
 - Adjust test for new defaults
 - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
 - Change secondary old trigger to be percent of young-gen heap size
 - add trigger for percent of heap growth

Changes: https://git.openjdk.org/jdk/pull/28561/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373225
  Stats: 92 lines in 9 files changed: 74 ins; 1 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/28561.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561

PR: https://git.openjdk.org/jdk/pull/28561
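To make the two-tier policy described above concrete, here is a rough illustrative sketch. Only the two flags named in the description are real; the function, its parameters, and the MIN2-based combination are assumptions about how such a policy could be expressed, not the heuristic code in the PR.

    // Old-gen growth since the last old mark, measured against two thresholds:
    //  - a fraction of the live data in old at the last old mark (binds when old is small)
    //  - a fraction of the heap memory that was not live in old (binds when old is large)
    bool should_trigger_on_old_growth(size_t old_used,
                                      size_t old_live_at_last_mark,
                                      size_t heap_size) {
      size_t growth = old_used - old_live_at_last_mark;
      size_t small_old_threshold =
          old_live_at_last_mark * ShenandoahMinOldGenGrowthPercent / 100;
      size_t large_old_threshold =
          (heap_size - old_live_at_last_mark) * ShenandoahMinOldGenGrowthRemainingHeapPercent / 100;
      // Whichever threshold is smaller is the binding one for the current old-gen size.
      return growth > MIN2(small_old_threshold, large_old_threshold);
    }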
From kdnilsen at openjdk.org Sun Dec 7 17:54:24 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Sun, 7 Dec 2025 17:54:24 GMT
Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics
In-Reply-To: 
References: 
Message-ID: 

On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen wrote:

> When old-gen consumes a small percentage of heap size, we trigger when old-gen has expanded by more than ShenandoahMinOldGenGrowthPercent (default value 50%) of the live data in old at the time of the previous old-gen mark.
>
> When old-gen consumes a larger percentage of heap size, we trigger when old-gen has expanded by more than ShenandoahMinOldGenGrowthRemainingHeapPercent (default value 25%) of the memory that was not live in old at the last marking of old.

The benefits of this PR are demonstrated on an Extremem workload. Comparisons with master are highlighted in this spreadsheet:

[image]

Highlights:
1. Far fewer old GCs, with a slight increase in young GCs (74.45% improvement)
2. Since old GCs are much more costly than young GCs, a 4.5% improvement in CPU utilization.
3. Latencies improved across all percentiles (from a small improvement of 0.3% at p50 to a significant improvement of 51.2% at p99.999)

The workload is configured as follows:

~/github/jdk.11-17-2025/build/linux-x86_64-server-release/images/jdk/bin/java \
 -XX:+UnlockExperimentalVMOptions \
 -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms8g -Xmx8g \
 -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
 -XX:ShenandoahMinFreeThreshold=5 \
 -XX:ShenandoahFullGCThreshold=1024 \
 -Xlog:"gc*=info,ergo" \
 -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
 -XX:+UnlockDiagnosticVMOptions \
 -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
 -dInitializationDelay=45s \
 -dDictionarySize=3000000 \
 -dNumCustomers=300000 \
 -dNumProducts=60000 \
 -dCustomerThreads=750 \
 -dCustomerPeriod=1600ms \
 -dCustomerThinkTime=300ms \
 -dKeywordSearchCount=4 \
 -dServerThreads=5 \
 -dServerPeriod=1s \
 -dProductNameLength=10 \
 -dBrowsingHistoryQueueCount=5 \
 -dSalesTransactionQueueCount=5 \
 -dProductDescriptionLength=32 \
 -dProductReplacementPeriod=10s \
 -dProductReplacementCount=10000 \
 -dCustomerReplacementPeriod=5s \
 -dCustomerReplacementCount=1000 \
 -dBrowsingExpiration=1m \
 -dPhasedUpdates=true \
 -dPhasedUpdateInterval=30s \
 -dSimulationDuration=25m \
 -dResponseTimeMeasurements=100000 \
 >$t.genshen.reproducer.baseline-8g.out 2>$t.genshen.reproducer.baseline-8g.err &
job_pid=$!
max_rss_kb=0
for s in {1..99}
do
  sleep 15
  rss_kb=$(ps -o rss= -p $job_pid)
  if (( $rss_kb > $max_rss_kb ))
  then
    max_rss_kb=$rss_kb
  fi
done
rss_mb=$((max_rss_kb / 1024))
cpu_percent=$(ps -o cputime -o etime -p $job_pid)
wait $job_pid
echo "RSS: $rss_mb MB" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err
echo "$cpu_percent" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err
gzip $t.genshen.reproducer.baseline-8g.out $t.genshen.reproducer.baseline-8g.err

Note that this PR causes us to operate closer to the edge of the operating envelope. In more aggressively provisioned configurations (the same workload in a smaller heap, for example), we see some regression in latencies compared to tip. This results from an increased number of degenerated GCs, which in turn result from starvation of mixed evacuations. This PR causes us to do fewer old GCs, but each old GC is expected to work more efficiently.

We expect these regressions to be mitigated by other PRs that are currently under development and review, including:

1. Sharing of collector reserves between young and old
2. Accelerated triggers
3. Surging of GC workers
4. Adaptive old-evac ratio

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622610260
PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622625901

From xpeng at openjdk.org Sun Dec 7 20:34:13 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Sun, 7 Dec 2025 20:34:13 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v6]
In-Reply-To: 
References: 
Message-ID: 

> Chasing the root cause of JDK-8372498, I have narrowed the root cause down to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2
>
> It is caused by the behavior change in the following code.
>
> Original:
>
> if (ShenandoahSATBBarrier) {
>   T* array = dst;
>   HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
>   ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
>   if (is_old_marking) {
>     // Generational, old marking
>     assert(_heap->mode()->is_generational(), "Invariant");
>     if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>       arraycopy_work(array, count);
>     }
>   } else if (_heap->mode()->is_generational()) {
>     // Generational, young marking
>     if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>       arraycopy_work(array, count);
>     }
>   } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) {
>     // Non-generational, marking
>     arraycopy_work(array, count);
>   }
> }
>
> New:
>
> if (ShenandoahSATBBarrier) {
>   if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) {
>     arraycopy_work(dst, count);
>   }
> }
>
> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS, arraycopy_work is no longer applied, so we may miss some pointers in the SATB queue in such a case.
>
> ### Test
> - [x] hotspot_gc_shenandoah
> - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix
> - [x] GHA

Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  enqueue objects stored in old array at ShenandoahSATBBarrier when concurrent young marking is in progress

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28669/files
  - new: https://git.openjdk.org/jdk/pull/28669/files/49ea3c93..c649cf2b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28669.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669

PR: https://git.openjdk.org/jdk/pull/28669
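Based only on the original barrier code quoted above, the young-marking case that the v6 commit message describes (enqueue objects stored in an old array during concurrent young marking) would look roughly like the sketch below. All names are taken from the quoted code, but the combination is an illustration of the intended condition, not the actual patch.

    // Illustrative only: during young concurrent marking, an array in an old
    // region must be visited regardless of TAMS, because it may hold pointers
    // to young objects; otherwise fall back to the allocated-after-mark check.
    if (ShenandoahSATBBarrier) {
      HeapWord* array_addr = reinterpret_cast<HeapWord*>(dst);
      ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
      if (r->is_old() ||
          !_heap->marking_context()->allocated_after_mark_start(array_addr)) {
        arraycopy_work(dst, count);
      }
    }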