From qamai at openjdk.org Mon Dec 1 06:28:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 06:28:04 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph Message-ID: Hi, Currently, `Node::adr_type` is ambiguous. For some nodes, it refers to the memory the node consumes, while for others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type`, which refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, where the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. - For nodes such as `StrInflatedCopyNode`, which consume more memory than they produce, we need to compute anti-dependencies during scheduling. This is currently not done, so I fixed it by making such nodes kill all the memory they consume. - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, which is really suspicious. I didn't fix it, as it does not seem to result in any symptom at the moment. In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity. Please take a look and leave your reviews, thanks a lot. 
------------- Commit messages: - Disambiguate Node::adr_type Changes: https://git.openjdk.org/jdk/pull/28570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372779 Stats: 629 lines in 36 files changed: 403 ins; 72 del; 154 mod Patch: https://git.openjdk.org/jdk/pull/28570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570 PR: https://git.openjdk.org/jdk/pull/28570 From qamai at openjdk.org Mon Dec 1 10:53:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 10:53:48 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v2] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Node::adr_type` is ambiguous. For some nodes, it refers to the memory the node consumes, while for others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type`, which refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: > > - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, where the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. > - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. > - For nodes such as `StrInflatedCopyNode`, which consume more memory than they produce, we need to compute anti-dependencies during scheduling. This is currently not done, so I fixed it by making such nodes kill all the memory they consume. > - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, which is really suspicious. I didn't fix it, as it does not seem to result in any symptom at the moment. 
> > In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: store_to_memory does not emit MemBars ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28570/files - new: https://git.openjdk.org/jdk/pull/28570/files/10c0303f..b39029a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=00-01 Stats: 9 lines in 1 file changed: 4 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570 PR: https://git.openjdk.org/jdk/pull/28570 From wkemper at openjdk.org Mon Dec 1 15:40:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 15:40:34 GMT Subject: Integrated: 8372444: Genshen: Optimize evacuation function In-Reply-To: References: Message-ID: <2aym_7c9MkBuruXbGMpz4DjKubklrkrf_U7w_Yc81Ck=.3fe75278-3246-48bb-8c28-7de179adadb0@github.com> On Tue, 25 Nov 2025 17:33:01 GMT, William Kemper wrote: > This is a hot code path. Many of the branches can be eliminated at compile time by introducing template parameters. This change shows a 5% reduction in concurrent evacuation times at the trimmed-10 average on the extremem benchmark: > > > gen/control/extremem > Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum > concurrent_evacuation_young_data | 65 | 9625198.000 | 118747.249 | 148079.969 | 145182.189 | 76534.845 | 7216.000 | 317261.000 > > gen/template/extremem > Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum > concurrent_evacuation_young_data | 65 | 9095084.000 | 113036.539 | 139924.369 | 137661.226 | 71091.273 | 7523.000 | 294442.000 This pull request has now been integrated. 
Changeset: a1cc8f4e Author: William Kemper URL: https://git.openjdk.org/jdk/commit/a1cc8f4e4107e361f64cf51ff73985e471cdde03 Stats: 54 lines in 5 files changed: 15 ins; 13 del; 26 mod 8372444: Genshen: Optimize evacuation function Reviewed-by: ysr, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/28496 From wkemper at openjdk.org Mon Dec 1 15:40:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 15:40:32 GMT Subject: RFR: 8372444: Genshen: Optimize evacuation function In-Reply-To: <1H7ReDFaqSzRUxHiPQgoHwqr9nGnieCTkypgz2m5Z4I=.0bfbe08e-c85f-4ad0-805e-d94c709057d1@github.com> References: <1H7ReDFaqSzRUxHiPQgoHwqr9nGnieCTkypgz2m5Z4I=.0bfbe08e-c85f-4ad0-805e-d94c709057d1@github.com> Message-ID: On Wed, 26 Nov 2025 02:13:39 GMT, Y. Srinivas Ramakrishna wrote: >> This is a hot code path. Many of the branches can be eliminated at compile time by introducing template parameters. This change shows a 5% reduction in concurrent evacuation times at the trimmed-10 average on the extremem benchmark: >> >> >> gen/control/extremem >> Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum >> concurrent_evacuation_young_data | 65 | 9625198.000 | 118747.249 | 148079.969 | 145182.189 | 76534.845 | 7216.000 | 317261.000 >> >> gen/template/extremem >> Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum >> concurrent_evacuation_young_data | 65 | 9095084.000 | 113036.539 | 139924.369 | 137661.226 | 71091.273 | 7523.000 | 294442.000 > > LGTM. Impressed that templatization led to such a substantial improvement. > Should the 2-case switch in `ShenandoahGenerationalHeap::evacuate_object()` be converted to an `if-else` ? @ysramakrishna , we have had reviewers in the past ask us to use a switch with this enumeration. I agree an `if/else` would be more compact here, but it would probably belong on a different pull request. Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28496#issuecomment-3597245231 From ysr at openjdk.org Mon Dec 1 16:34:40 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 1 Dec 2025 16:34:40 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: <4h1gCnsTk4_Gbl77bm_oe2S5deR7LGajZmvsPcDV5dw=.b29abfd8-1649-433f-b392-f959e63a817c@github.com> On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in the card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA or in my local tests, but when I ran specjbb benchmarks, it did cause a crash at ShenandoahScanRemembered::process_clusters when the GC scans the remembered set. > > The bug may cause other issues since objects in the old gen are not properly registered, e.g. the marking phase may produce wrong results. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Marked as reviewed by ysr (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3525935185 From wkemper at openjdk.org Mon Dec 1 16:37:40 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 16:37:40 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Given that https://github.com/openjdk/jdk/pull/28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3525951105 From xpeng at openjdk.org Mon Dec 1 16:44:27 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:44:27 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in the card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA or in my local tests, but when I ran specjbb benchmarks, it did cause a crash at ShenandoahScanRemembered::process_clusters when the GC scans the remembered set. > > The bug may cause other issues since objects in the old gen are not properly registered, e.g. the marking phase may produce wrong results. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah > Given that #28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? I did briefly check all the places where `type()` is called; it should be good now. There are two more places we could improve, but they won't cause bugs. I initially included them in the PR and decided to revert them, since they are not related to this bug fix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597622168 From xpeng at openjdk.org Mon Dec 1 16:59:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:59:16 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> > For non-plab allocs in old gen, the objects need to be registered in the card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA or in my local tests, but when I ran specjbb benchmarks, it did cause a crash at ShenandoahScanRemembered::process_clusters when the GC scans the remembered set. > > The bug may cause other issues since objects in the old gen are not properly registered, e.g. the marking phase may produce wrong results. 
> > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use member function is_lab_alloc() instead of test the value of type() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28521/files - new: https://git.openjdk.org/jdk/pull/28521/files/24dae41f..ad82a691 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28521&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28521&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28521/head:pull/28521 PR: https://git.openjdk.org/jdk/pull/28521 From xpeng at openjdk.org Mon Dec 1 16:59:18 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 16:59:18 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Mon, 1 Dec 2025 16:42:16 GMT, Xiaolong Peng wrote: > Given that #28247 significantly changed the encoding of the request `type` here, and we already missed an incorrect usage in one spot, I think we should use the member function to test for a shared/lab allocation. Can we also check if there are other uses of `type()` that may not be safe now? I have updated the PR to use the member function to test for a shared/lab allocation, thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597721121 From wkemper at openjdk.org Mon Dec 1 17:26:26 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Dec 2025 17:26:26 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 16:59:16 GMT, Xiaolong Peng wrote: >> For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. >> >> The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. >> >> Tests: >> - [x] specjbb, no crash >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use member function is_lab_alloc() instead of test the value of type() Thank you. I think we should try to remove other uses of `type()` in a separate PR. ------------- Marked as reviewed by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28521#pullrequestreview-3526162449 From xpeng at openjdk.org Mon Dec 1 17:31:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 17:31:28 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 17:23:34 GMT, William Kemper wrote: > Thank you. I think we should try to remove other uses of `type()` in a separate PR. Thanks, I'll see if we can remove it. I believe we should be able to remove it from all the places except logging. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3597890180 From xpeng at openjdk.org Mon Dec 1 18:30:48 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 18:30:48 GMT Subject: RFR: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 [v2] In-Reply-To: <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> <6Y104R2wl_Z1TSFnYaYXRQdOzZGCCxXigrwBK8RM_r4=.265e3f37-88ff-4189-865b-8c17b6b8317a@github.com> Message-ID: On Mon, 1 Dec 2025 16:59:16 GMT, Xiaolong Peng wrote: >> For non-plab allocs in old gen, the objects need to be registered in the card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA or in my local tests, but when I ran specjbb benchmarks, it did cause a crash at ShenandoahScanRemembered::process_clusters when the GC scans the remembered set. >> >> The bug may cause other issues since objects in the old gen are not properly registered, e.g. the marking phase may produce wrong results. 
>> >> Tests: >> - [x] specjbb, no crash >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use member function is_lab_alloc() instead of test the value of type() Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28521#issuecomment-3598205309 From xpeng at openjdk.org Mon Dec 1 18:34:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 1 Dec 2025 18:34:02 GMT Subject: Integrated: 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 In-Reply-To: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> References: <9K39uUxWtW6O-UsFRqrUXttqHu1K29lVAYNHcFMTaoc=.ab3856ff-55ff-4a23-ab33-870c9713e6ab@github.com> Message-ID: On Thu, 27 Nov 2025 03:10:18 GMT, Xiaolong Peng wrote: > For non-plab allocs in old gen, the objects need to be registered in card table, which was missed in the [PR](https://git.openjdk.org/jdk/pull/28247) for JDK-8371667. The bug didn't cause jtreg test failures in GHA and my local test, but when I ran specjbb benchmarks, it did cause crash at ShenandoahScanRemembered::process_clusters when GC scans remembered set. > > The bug may cause other issue since the object in old gen is not properly registered, e.g. marking phase have wrong result. > > Tests: > - [x] specjbb, no crash > - [x] hotspot_gc_shenandoah This pull request has now been integrated. 
Changeset: 79e99bb0 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/79e99bb0778608733a677821a0bb35041e9fb939 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8372566: Genshen: crash at ShenandoahScanRemembered::process_clusters after JDK-8371667 Reviewed-by: wkemper, kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28521 From dlong at openjdk.org Tue Dec 2 05:35:50 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 05:35:50 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR makes changes similar to those done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation, as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put a guard on the Shenandoah GC specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages Yes, bring it over, as it's an improvement. However, I was wondering if there was a way we could get rid of the remaining `#if INCLUDE_SHENANDOAHGC` in shared C2 code. The first idea that I came up with is for the GC init to reference a callback function for C2, but I'm not sure if the complexity is worth it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3600260250 From roland at openjdk.org Tue Dec 2 09:20:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:20:41 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one, what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so that nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. 
> > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operates under. So I refactored this, and that's > the biggest part of this change. The fix consists of marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into JDK-8354282 - whitespace - review - review - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java Co-authored-by: Christian Hagedorn - review - review - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07 Stats: 365 lines in 13 files changed: 264 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From chagedorn at openjdk.org Tue Dec 2 13:54:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:26 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). 
>> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 Thanks for the update, it looks good to me! 
If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3530251375 From chagedorn at openjdk.org Tue Dec 2 13:54:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Thu, 27 Nov 2025 12:29:10 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 101: >> >>> 99: } >>> 100: return NonFloatingNonNarrowing; >>> 101: } >> >> Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. > > The patch as it is now adds some extra uses of "pinned" and "floating". What could make sense, I suppose, would be to try to use "floating"/"non floating" instead but there are so many uses of "pinned" in the code base already, and I don't see us getting rid of them, that I wonder if it would make a difference. So, I'm not too sure what to do. Yes, that's true. I was also unsure about whether we should stick with one or just allow both interchangeably. I guess since there are so many uses, we can just move forward with what you have now and still come back to clean it up if necessary - we can always do that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581285955 From chagedorn at openjdk.org Tue Dec 2 13:54:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:34 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Wed, 26 Nov 2025 13:24:05 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - review >> - review >> - Merge branch 'master' into JDK-8354282 >> - review >> - infinite loop in gvn fix >> - renaming >> - merge >> - Merge branch 'master' into JDK-8354282 >> - fix & test > > src/hotspot/share/opto/castnode.hpp line 120: > >> 118: // be removed in any case otherwise the sunk node floats back into the loop. >> 119: static const DependencyType NonFloatingNonNarrowing; >> 120: > > I needed a moment to completely understand all these combinations. I rewrote the definitions in this process a little bit. Feel free to take some of it over: > > > // All the possible combinations of floating/narrowing with example use cases: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. > static const DependencyType FloatingNarrowing; > > // Use case example: Widening Cast nodes' types after loop opts: We want to common Casts with slightly different types. 
> // Floating: These Casts only depend on the single control. > // NonNarrowing: Even when the input type is narrower, we are not removing the Cast. Otherwise, the dependency > // to the single control is lost, and an array access could float above its range check because we > // just removed the dependency to the range check by removing the Cast. This could lead to an > // out-of-bounds access. > static const DependencyType FloatingNonNarrowing; > > // Use case example: An array accesses that is no longer dependent on a single range check (e.g. range check smearing). > // NonFloating: The array access must be pinned below all the checks it depends on. If the check it directly depends > // on with a control input is hoisted, we do hoist the Cast as well. If we allowed the Cast to float, > // we risk that the array access ends up above another check it depends on (we cannot model two control > // dependencies for a node in the IR). This could lead to an out-of-bounds access. > // Narrowing: If the Cast does not narrow the input type, then it's safe to remove the cast because the array access > // will be safe. > static const DependencyType NonFloatingNarrowing; > > // Use case example: Sinking nodes out of a loop > // Non-Floating & Non-Narrowing: We don't want the Cast that forces the node to be out of loop to be removed in any > // case. Otherwise, the sunk node could float back into the l... Thanks for taking it over :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581287358 From epeter at openjdk.org Tue Dec 2 15:32:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. 
In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. 
So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 108: > 106: // Floating: The Cast is only dependent on the single range check. > 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > 108: // remove the cast because the array access will be safe. The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? So is it not therefore already pinned? Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581630546 From epeter at openjdk.org Tue Dec 2 15:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:57 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> Message-ID: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> On Tue, 2 Dec 2025 15:17:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.hpp line 108: >> >>> 106: // Floating: The Cast is only dependent on the single range check. >>> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >>> 108: // remove the cast because the array access will be safe. >> >> The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? >> So is it not therefore already pinned? >> >> Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? > > Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581649395 From epeter at openjdk.org Tue Dec 2 15:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> On Tue, 2 Dec 2025 15:14:28 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 108: > >> 106: // Floating: The Cast is only dependent on the single range check. >> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> 108: // remove the cast because the array access will be safe. > > The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? > So is it not therefore already pinned? > > Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581642290 From epeter at openjdk.org Tue Dec 2 15:32:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:58 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> Message-ID: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> On Tue, 2 Dec 2025 15:19:26 GMT, Emanuel Peter wrote: >> Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... > > At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. Suggestion: // Use case example: Range Check CastII // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted // it would be safe to let the Cast float to where the range check is hoisted up to. // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely // remove the cast because the array access will be safe. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581692285 From epeter at openjdk.org Tue Dec 2 16:52:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:30 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <9zey9SqquL1zLlFLuyKV_18OiZs2UQSokhREx9ln0l0=.edc15ede-e798-4d88-b61a-d2ed086d99da@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. 
We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 @rwestrel Nice work! We not just only fixed the bug but made the concepts much clearer. This makes me very happy ? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3531172652 From epeter at openjdk.org Tue Dec 2 16:52:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:32 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 15:29:42 GMT, Emanuel Peter wrote: >> At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. > > Suggestion: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted > // it would be safe to let the Cast float to where the range check is hoisted up to. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582034834 From qamai at openjdk.org Tue Dec 2 17:48:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:43 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. 
>> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 105: > 103: // All the possible combinations of floating/narrowing with example use cases: > 104: > 105: // Use case example: Range Check CastII I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582188782 From qamai at openjdk.org Tue Dec 2 17:48:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:44 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 16:48:55 GMT, Emanuel Peter wrote: >> Suggestion: >> >> // Use case example: Range Check CastII >> // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted >> // it would be safe to let the Cast float to where the range check is hoisted up to. >> // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> // remove the cast because the array access will be safe. > > Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of the cast from an `IfTrue` or `IfFalse` to an `IfTrue` or `IfFalse` that dominates the current control input, provided the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. 
It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582215477 From wkemper at openjdk.org Tue Dec 2 18:29:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 18:29:21 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v10] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > ? Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > ? Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > ? Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > ? 
Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > ? Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > ? Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > ? Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > ? Worst regression: xalan (+89.4%) William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 73 commits: - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - Remove bad asserts - Don't include tenurable bytes for current cycle in the next cycle Also remove vestigial promotion potential calculation - Idle fix ups - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Disable assertion (will revisit later) - Print global evac tracking after other gc stats This makes it easier for parsers to distinguish from per cycle reports - Instrumentation and assertions - Idle cleanup as I read - ... 
and 63 more: https://git.openjdk.org/jdk/compare/5627ff2d...0c682e1c ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=09 Stats: 406 lines in 11 files changed: 166 ins; 182 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From xpeng at openjdk.org Tue Dec 2 18:40:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 18:40:16 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class the allocator, most of the allocation code is in the class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. 
We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2608 u... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition - Port the fix of JDK-8372566 - Merge branch 'master' into cas-alloc-1 - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Remove junk code - Remove unnecessary change and tidy up - ... 
and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=13 Stats: 1637 lines in 25 files changed: 1283 ins; 242 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Tue Dec 2 18:40:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 18:40:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 20:31:18 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - format >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix errors caused by renaming ofAtomic to AtomicAccess >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 271: > >> 269: _used[int(which_partition)] = value; >> 270: _available[int(which_partition)] = _capacity[int(which_partition)] - value; >> 271: AtomicAccess::store(_used + int(which_partition), value); > > Also here, should not require AtomicAccess. Sorry, it is junk code left over, I'm tidying up the changes in the PR, this line will be removed. 
> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 363: > >> 361: } >> 362: >> 363: void ShenandoahHeapRegion::reset_alloc_metadata() { > > Do we need to make these atomic because we now increment asynchronously from within mutator CAS allocations? Before, they were only adjusted while holding heap lock? I'm wondering if add-with-fetch() or CAS() would be more/less efficient than AtomicAccess::stores. Can we test the tradeoffs? Yes, we need to update these from mutator after every allocation w/o heap lock. `reset_alloc_metadata` is to reset the values, so we have to use AtomicAccess::store. It is not in the hot path; it is only invoked when the region is recycled, so I don't think there is a performance issue here. For the code in the hot path of memory allocation, I simply use `AtomicAccess::add` with `memory_order_relaxed`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2562099016 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2562086131 From xpeng at openjdk.org Tue Dec 2 19:07:35 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 19:07:35 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism Message-ID: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> In the concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in the current GC generation. The problem is that resetting bitmaps may take long for a large heap because the marking bitmaps are also larger than for a small heap; we should always spread the work across multiple threads for concurrent reset whenever more than one concurrent worker is available. In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
Test result: java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" With the change: [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) Original: [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. ### Other tests - [x] hotspot_gc_shenandoah ------------- Commit messages: - Fix wrong impl of parallel_region_stride in ShenandoahExcludeRegionClosure & ShenandoahIncludeRegionClosure - Add comments - Set parallel_region_stride to 8 for ShenandoahResetBitmapClosure - Tidying - Override ShenandoahParallelRegionStride to 8 when wrap the closure with ShenandoahIncludeRegionClosure - Override ShenandoahParallelRegionStride to 8 when wrap the closure with ShenandoahExcludeRegionClosure Changes: https://git.openjdk.org/jdk/pull/28613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372861 Stats: 16 lines in 4 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From wkemper at openjdk.org Tue Dec 2 19:18:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:18:54 GMT Subject: Integrated: 
Merge openjdk/jdk21u:master In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:24:27 GMT, William Kemper wrote: > Merges tag jdk-21.0.10+4 This pull request has now been integrated. Changeset: fe585bd6 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/fe585bd6d0f9678f1cc8bcc56dc9c8a56af5d044 Stats: 632 lines in 23 files changed: 598 ins; 0 del; 34 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/229 From wkemper at openjdk.org Tue Dec 2 19:18:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:18:51 GMT Subject: RFR: Merge openjdk/jdk21u:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.10+4 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/229/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/229/files/2f897401..2f897401 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=229&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=229&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/229.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/229/head:pull/229 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/229 From wkemper at openjdk.org Tue Dec 2 19:31:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 19:31:25 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v11] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during 
mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmark results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > - Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > - Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > - Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > - Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > - Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > - Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > - Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > - 
Worst regression: xalan (+89.4%) William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove commented out assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27632/files - new: https://git.openjdk.org/jdk/pull/27632/files/0c682e1c..502797a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=09-10 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From kdnilsen at openjdk.org Tue Dec 2 19:31:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:31:45 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. > > ### Other tests > - [x] hotspot_gc_shenandoah I think this is a 58% improvement (in the header) rather than a 100% improvement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3603643649 From kdnilsen at openjdk.org Tue Dec 2 19:43:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:43:04 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> On Tue, 2 Dec 2025 19:33:01 GMT, Kelvin Nilsen wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. 
The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: > >> 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small >> 82: // parallel region stride for ShenandoahResetBitmapClosure. 
>> 83: size_t parallel_region_stride() override { return 8; } > > Should this be: > > if (ShenandoahParallelRegionStride == 0) { > return 8; > } else { > return ShenandoahParallelRegionStride; > } In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? This makes the change robust against future integration of "worker surge". In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582566703 From kdnilsen at openjdk.org Tue Dec 2 19:43:03 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 Dec 2025 19:43:03 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement. > > ### Other tests > - [x] hotspot_gc_shenandoah See below src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: > 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small > 82: // parallel region stride for ShenandoahResetBitmapClosure. > 83: size_t parallel_region_stride() override { return 8; } Should this be: if (ShenandoahParallelRegionStride == 0) { return 8; } else { return ShenandoahParallelRegionStride; } ------------- Changes requested by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3531815307 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582551300 From xpeng at openjdk.org Tue Dec 2 20:02:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 20:02:16 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 19:28:58 GMT, Kelvin Nilsen wrote: > I think this is a 58% improvement (in the header) rather than a 100% improvement. To be precise, it is a 58% time reduction, which corresponds to a 100%+ speed/throughput improvement; I have updated the description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3603737834 From xpeng at openjdk.org Tue Dec 2 22:03:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 22:03:25 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> Message-ID: <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> On Tue, 2 Dec 2025 19:40:19 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83: >> >>> 81: // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small >>> 82: // parallel region stride for ShenandoahResetBitmapClosure.
>>> 83: size_t parallel_region_stride() override { return 8; } >> >> Should this be: >> >> if (ShenandoahParallelRegionStride == 0) { >> return 8; >> } else { >> return ShenandoahParallelRegionStride; >> } > > In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? > > This makes the change robust against future integration of "worker surge". > > In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? The intention of the change is to let `ShenandoahResetBitmapClosure` not use the global ShenandoahParallelRegionStride value at all, and here are the reasons: ShenandoahParallelRegionStride is usually set to a large value (the default used to be 1024), and a large value won't help the performance of ShenandoahHeapRegionClosure at all, for two reasons: 1. ShenandoahResetBitmapClosure resets the marking bitmaps before/after a GC cycle, and the reset may not be needed for each region, e.g. when `top_bitmap == bottom` (immediate trash regions?) or the region is not in the current GC generation. 2. With a large ShenandoahParallelRegionStride, each task will get a large number of successive regions, e.g. worker 0 will process region 1 to region 1024; this way it is not possible to ensure the actual workload is evenly distributed to all workers: some workers may get most of the regions that need a bitmap reset, while other workers may not do any actual bitmap reset at all.
A smaller parallel region stride value will help with workload distribution and also makes the work adaptive to different numbers of workers; it should also work just fine with "worker surge". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582857990 From xpeng at openjdk.org Tue Dec 2 22:03:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 22:03:28 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> Message-ID: <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> On Tue, 2 Dec 2025 21:38:26 GMT, Xiaolong Peng wrote: >> In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()? >> >> This makes the change robust against future integration of "worker surge". >> >> In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"? > > The intention of the change is to let `ShenandoahResetBitmapClosure` not use the global ShenandoahParallelRegionStride value at all, and here are the reasons: > ShenandoahParallelRegionStride is usually set to a large value (the default used to be 1024), and a large value won't help the performance of ShenandoahHeapRegionClosure at all, for two reasons: > 1. ShenandoahResetBitmapClosure resets the marking bitmaps before/after a GC cycle, and the reset may not be needed for each region, e.g. when `top_bitmap == bottom` (immediate trash regions?)
or the region is not current gc generation. > 2. Withe large ShenandoahParallelRegionStride, each task will get large number of successive regions, e.g. worker 0 will process region 1 to region 1024, in this way it is not possible to make sure the actual workload is evenly distributed to all workers, some of the workers may have most of the regions need bitmap reset, some of the worker may not really do any actual bitmap reset at all. > > A smaller parallel region stride value will help with the workload distribution and also make it adaptive to different number of workers, it should be also working just fine with "worker surge" In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" [1] [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) [2] [77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) [4] [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) [8] [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) [77.916s][info][gc,stats ] 
Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) [16] [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) [32] [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) [64] [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947) [128] [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) [76.502s][info][gc,stats ] Concurrent Reset After Collect = 0.053 s (a = 3762 us) (n = 14) (lvls, us = 1172, 1582, 2129, 5273, 9272) [256] [76.751s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3057 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14713) [76.751s][info][gc,stats ] Concurrent Reset After Collect = 0.056 s (a = 4029 us) (n = 14) (lvls, us = 1484, 1602, 3027, 4629, 11267) [512] [77.508s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3082 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14893) [77.508s][info][gc,stats ] Concurrent Reset After Collect = 0.068 s (a = 4822 us) (n = 14) (lvls, us = 1953, 2285, 3633, 5605, 16366) [1024] [76.933s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3073 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1426, 14957) [76.933s][info][gc,stats ] Concurrent Reset After Collect = 0.082 s (a = 5877 us) (n = 14) (lvls, us = 1895, 3203, 4258, 7793, 15587) [2048] [76.746s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3022 us) (n = 
14) (lvls, us = 1133, 1172, 1211, 1406, 14586) [76.746s][info][gc,stats ] Concurrent Reset After Collect = 0.099 s (a = 7104 us) (n = 14) (lvls, us = 1875, 3281, 4590, 7695, 19292) [4096] [77.356s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3031 us) (n = 14) (lvls, us = 1133, 1191, 1250, 1426, 14606) [77.356s][info][gc,stats ] Concurrent Reset After Collect = 0.101 s (a = 7213 us) (n = 14) (lvls, us = 1914, 3262, 4238, 7871, 19862) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582863336 From wkemper at openjdk.org Tue Dec 2 22:15:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 Dec 2025 22:15:36 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 21:40:29 GMT, Xiaolong Peng wrote: >> The intention of the change is to let `ShenandoahResetBitmapClosure` not use the ShenandoahParallelRegionStride global value at all, here is the reasons: >> ShenandoahParallelRegionStride is usually set to a large value, the default value used to be 1024, large value won't help with the performance ShenandoahHeapRegionClosure at all for some reasons: >> 1. ShenandoahResetBitmapClosure reset the marking bitmaps before/after GC cycle, the resetting may not not needed for each region. e.g. when `top_bitmap == bottom`(immediate trash regions?) or the region is not current gc generation. >> 2. 
Withe large ShenandoahParallelRegionStride, each task will get large number of successive regions, e.g. worker 0 will process region 1 to region 1024, in this way it is not possible to make sure the actual workload is evenly distributed to all workers, some of the workers may have most of the regions need bitmap reset, some of the worker may not really do any actual bitmap reset at all. >> >> A smaller parallel region stride value will help with the workload distribution and also make it adaptive to different number of workers, it should be also working just fine with "worker surge" > > In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > > > [1] > [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) > [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) > > [2] > [77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) > [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) > > > [4] > [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) > [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) > > [8] > [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) > [77.916s][info][gc,stats ] 
Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) > > [16] > [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) > [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) > > [32] > [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) > [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) > > > [64] > [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) > [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947) > > [128] > [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) > [76.502s][info][gc,stats ] Concurrent Reset After Collect = 0.053 s (a = 3762 us) (n = 14) (lvls, us = 1172, 15... Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? 
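To illustrate the task-distribution point being reviewed here, this is a rough sketch of stride-based region claiming: each worker repeatedly claims the next `stride` regions from a shared atomic cursor. The names are illustrative, not the actual Shenandoah parallel iterator code, but the mechanism is the same: a smaller stride interleaves workers more finely across the heap, so a cluster of heavy regions (a "lumpy" workload such as bitmap resets) is split across many claims instead of landing on one worker.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <utility>

// Hypothetical sketch of parallel region iteration: workers claim
// [begin, end) chunks of `stride` regions from an atomic cursor.
struct ParallelRegionClaimer {
    std::atomic<size_t> _cursor{0};
    const size_t _num_regions;
    const size_t _stride;

    ParallelRegionClaimer(size_t num_regions, size_t stride)
        : _num_regions(num_regions), _stride(stride) {}

    // Next chunk of regions for the calling worker; begin == end when
    // all regions have been claimed.
    std::pair<size_t, size_t> next_chunk() {
        size_t begin = _cursor.fetch_add(_stride, std::memory_order_relaxed);
        if (begin >= _num_regions) {
            return {_num_regions, _num_regions};
        }
        return {begin, std::min(begin + _stride, _num_regions)};
    }
};
```

With a stride of 1024 the first claim covers regions 0..1023 in one block, so if the regions needing a reset are clustered, one worker inherits most of the work; with a stride of 8 the same cluster is spread across many claims and therefore many workers.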
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582942471 From xpeng at openjdk.org Tue Dec 2 23:28:58 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 23:28:58 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. > > ### Other tests > - [x] hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add more comments. 
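On the "58% vs 100%+" figures quoted in this thread: both describe the same change from 7476 us to 3107 us, just as different ratios. These two helpers (illustrative only, not part of the patch) show the arithmetic:

```cpp
// Fraction of time saved: (before - after) / before.
double time_reduction(double before_us, double after_us) {
    return (before_us - after_us) / before_us;
}

// Fractional throughput/speed gain: before / after - 1.
double speedup_gain(double before_us, double after_us) {
    return before_us / after_us - 1.0;
}
```

For 7476 us down to 3107 us this gives roughly 0.58 (a 58% time reduction) and roughly 1.41 (about a 141% speed improvement, i.e. "100%+").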
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28613/files - new: https://git.openjdk.org/jdk/pull/28613/files/06f27543..3b964995 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=00-01 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From xpeng at openjdk.org Tue Dec 2 23:29:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 2 Dec 2025 23:29:00 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 22:13:12 GMT, William Kemper wrote: >> In the JBS bug report, I attached a test I did for this, I have tested value from 1 to 4096: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=-jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> >> >> [1] >> [77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780) >> [77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952) >> >> [2] >> 
[77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872) >> [77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744) >> >> >> [4] >> [76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989) >> [76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076) >> >> [8] >> [77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091) >> [77.916s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113) >> >> [16] >> [77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615) >> [77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841) >> >> [32] >> [76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768) >> [76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565) >> >> >> [64] >> [77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635) >> [77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947) >> >> [128] >> [76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264) >> [76.502s][info][gc,stats ] Concur... > > Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? 
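The effect William describes — a smaller stride spreading a clustered ("lumpy") workload over more workers — can be sketched with a toy model. This is a hypothetical round-robin hand-out of batches, not the actual `ShenandoahHeap::parallel_heap_region_iterate` (which claims batches dynamically via an atomic counter); region and worker counts are made up:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: batches of `stride` consecutive regions are handed out
// round-robin to `workers` threads. "Heavy" regions (those whose bitmap must
// actually be reset) are assumed to be the first `heavy` regions, i.e. they
// are clustered, as young-generation regions can be.
// Returns the largest number of heavy regions any single worker receives.
static size_t max_heavy_per_worker(size_t regions, size_t stride,
                                   size_t workers, size_t heavy) {
    std::vector<size_t> load(workers, 0);
    for (size_t batch = 0; batch * stride < regions; ++batch) {
        size_t begin = batch * stride;
        size_t end = std::min(begin + stride, regions);
        // Number of heavy regions falling inside this batch.
        size_t heavy_in_batch = begin < heavy ? std::min(end, heavy) - begin : 0;
        load[batch % workers] += heavy_in_batch;
    }
    return *std::max_element(load.begin(), load.end());
}
```

With 8192 regions, 8 workers, and the heavy work clustered in the first 1024 regions, a stride of 1024 leaves all heavy batches on one worker, while a stride of 8 spreads them evenly — which matches the direction of the measurements above, even though the model is much cruder than the real task.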
I have added more comments on ShenandoahResetBitmapClosure and the base class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2583109244 From xpeng at openjdk.org Wed Dec 3 00:16:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 00:16:10 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <8YLFumpKphkiO-9PU9GbDz8g5XA388yLqD9xzlm9LUg=.ed2f7155-a371-4eeb-90bf-2bc45f63e900@github.com> <3MGUEGMsFxAnh_r0WE8hRt9FpH3ey0_xgKz_K3jaruQ=.4167d995-ba80-405f-bd17-9b77168d26ad@github.com> <6TAyVPTOcp6ykpzuzVuTP04UU8mk0tSkakJEinh4dnA=.4c43b199-09b5-4b9c-984d-c69cdca0b294@github.com> Message-ID: On Tue, 2 Dec 2025 23:24:11 GMT, Xiaolong Peng wrote: >> Maybe amend the comment to explain that using a smaller value yields better task distribution for a lumpy workload like resetting bitmaps? > I have added more comments on ShenandoahResetBitmapClosure and the base class. >Can we make this change more "generic"? I thought about making it more "generic"; the current design with the new method `parallel_region_stride` makes it possible to customize the behavior if needed. I was looking into other closures which may have similar problems, but the impact should be much smaller than for this one: 1. ShenandoahMergeWriteTable: Copy the write-version of the card-table into the read-version, clearing the write-copy, only for old gen. 2. ShenandoahEnsureHeapActiveClosure: Make sure regions are in good state: committed, active, clean; it may commit a region if the region is not committed.
Only in FullGC; also, it is not thread-safe (but should be). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2583186822 From kdnilsen at openjdk.org Wed Dec 3 01:00:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 01:00:11 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: Message-ID: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> On Tue, 2 Dec 2025 18:40:16 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory under the heap lock; we have observed heavy heap lock contention on the memory allocation path in performance analysis of some services in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention; along with the optimization, a better OO design is also applied to Shenandoah memory allocation to reuse the majority of the code: >> >> * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. >> * ShenandoahMutatorAllocator: allocator for mutators, inherits from ShenandoahAllocator, only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutators. >> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition, similar to ShenandoahMutatorAllocator, only a few lines of code to customize the allocator for the Collector. >> * ShenandoahOldCollectorAllocator: allocator for collector allocation in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen.
We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` >> >> I'm not expecting a significant performance impact for most of the cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve the latency/performance: >> >> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. >> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== >> ===== DaCapo tail laten... > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: > > - Add missing header for ShenandoahFreeSetPartitionId > - Declare ShenandoahFreeSetPartitionId as enum instead of enum class > - Fix a typo > - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php > - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition > - Port the fix of JDK-8372566 > - Merge branch 'master' into cas-alloc-1 > - Merge remote-tracking branch 'origin/master' into cas-alloc-1 > - Remove junk code > - Remove unnecessary change and tidy up > - ...
and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 I'm still reading through the code, but have these comments so far... src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: > 78: break; > 79: case ShenandoahFreeSetPartitionId::OldCollector: > 80: _free_set->recompute_total_used src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 100: > 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { > 99: if (_alloc_region_count == 0u) { > 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. (If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: > 119: template > 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { > 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 155: > 153: size_t min_free_words = req.is_lab_alloc() ?
req.min_size() : req.size(); > 154: ShenandoahHeapRegion* r = _free_set->find_heap_region_for_allocation(ALLOC_PARTITION, min_free_words, req.is_lab_alloc(), in_new_region); > 155: // The region returned by find_heap_region_for_allocation must have sufficient free space for the allocation it if it is not nullptr comment has an extra "it" src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158: > 156: if (r != nullptr) { > 157: bool ready_for_retire = false; > 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire); Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations. We should clarify with comments. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 69: > 67: > 68: // Attempt to allocate in shared alloc regions, the allocation attempt is done with atomic operation w/o > 69: // holding heap lock. I would rewrite comment: // Attempt to allocate in a shared alloc region using atomic operation without holding the heap lock. // Returns nullptr and overwrites regions_ready_for_refresh with the number of shared alloc regions that are ready // to be retired if it is unable to satisfy the allocation request from the existing shared alloc regions. 
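The lock-free bump-pointer allocation that `atomic_allocate_in` is meant to perform can be sketched as follows. This is a hypothetical, heavily simplified model — word offsets instead of `HeapWord*`, a made-up signature, and none of the real ShenandoahAllocator bookkeeping — meant only to illustrate the CAS retry loop being reviewed:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a heap region: a bump pointer over a fixed capacity.
struct Region {
    std::atomic<size_t> top{0};   // first free word offset
    size_t end;                   // one past the last word
    explicit Region(size_t capacity_words) : end(capacity_words) {}
};

// Returns the start offset of the allocation, or SIZE_MAX on failure.
// `ready_for_retire` is set when the region can no longer satisfy an
// allocation of at least `min_words`.
static size_t atomic_allocate_in(Region& r, size_t words, size_t min_words,
                                 bool& ready_for_retire) {
    size_t cur = r.top.load(std::memory_order_relaxed);
    while (true) {
        if (cur + words > r.end) {              // not enough space left
            ready_for_retire = (r.end - cur) < min_words;
            return SIZE_MAX;
        }
        // CAS the new top; on failure `cur` is reloaded and we retry.
        if (r.top.compare_exchange_weak(cur, cur + words,
                                        std::memory_order_relaxed)) {
            return cur;
        }
    }
}
```

Under the heap lock (the case Kelvin flags), plain loads and stores of `top` would suffice; the CAS form only matters on the lock-free fast path, where concurrent mutators may race on the same region.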
------------- PR Review: https://git.openjdk.org/jdk/pull/26171#pullrequestreview-3532274683 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582914041 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582950768 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582966259 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582922617 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582936150 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582959962 From kdnilsen at openjdk.org Wed Dec 3 01:00:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 01:00:12 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> On Tue, 2 Dec 2025 22:00:17 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... 
and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: > >> 78: break; >> 79: case ShenandoahFreeSetPartitionId::OldCollector: >> 80: _free_set->recompute_total_used > These parameters seem overly conservative. Can we distinguish what needs to be recomputed? > Normally, OldCollector allocation does not change UsedByMutator or UsedByCollector. It will only change MutatorEmpties if we did flip_to_old. It will normally not change OldCollectorEmpties (unless it flips multiple mutator regions to OldCollector). It might flip one region from mutator, but that region will not be empty after we allocate from it... I suppose we could use conservative values for a first implementation, as long as we file a "low priority" ticket to come back and revisit for improved efficiency at a later time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2582916275 From wkemper at openjdk.org Wed Dec 3 01:07:03 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 01:07:03 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 23:28:58 GMT, Xiaolong Peng wrote: >> In the concurrent reset/concurrent reset after collect phase, the workers need to reset bitmaps for all the regions in the current GC generation. The problem is that resetting bitmaps may take long on a large heap because the marking bitmaps are also larger than on a small heap; we should always consider multiple threads if there are more regions than concurrent workers for concurrent reset.
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add more comments. Looks good to me. ------------- Marked as reviewed by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3532721468 From xpeng at openjdk.org Wed Dec 3 01:09:03 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:09:03 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> Message-ID: On Tue, 2 Dec 2025 22:01:17 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 80: >> >>> 78: break; >>> 79: case ShenandoahFreeSetPartitionId::OldCollector: >>> 80: _free_set->recompute_total_used> >> These parameters seem overly conservative. Can we distinguish what needs to be recomputed? >> Normally, OldCollector allocation does not change UsedByMutator or UsedByCollector. It will only change MutatorEmpties if we did flip_to_old. It will normally not change OldCollectorEmpties (unless it flips multiple mutator regions to OldCollector). It might flip one region from mutator, but that region will not be empty after we allocate from it... > I suppose we could use conservative values for a first implementation, as long as we file a "low priority" ticket to come back and revisit for improved efficiency at a later time. We don't really know what needs to be recomputed until the allocation finishes. We can make it less conservative, but then we need more code branches here because the template methods require explicit template parameters. I'll create a ticket to follow up on this, given that I also want to see if we can defer the recomputation to the read side; if we can do that, we don't even need the ShenandoahHeapAccountingUpdater here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583271726 From xpeng at openjdk.org Wed Dec 3 01:12:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:12:23 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:16:56 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 100: > >> 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { >> 99: if (_alloc_region_count == 0u) { >> 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); > > Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? 
Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. (If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) I'll add comments on this. _alloc_region_count == 0 means we don't want to use any shared alloc region; in that case it also allocates under the heap lock, so ideally the performance should be the same as before: it simply finds a region with enough space and allocates in that region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583283894 From xpeng at openjdk.org Wed Dec 3 01:17:43 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:17:43 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:24:55 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: > >> 119: template >> 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { >> 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); > > I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. It is not an error: before calling into attempt_allocation_slow, it has already called attempt_allocation_in_alloc_regions once and failed to allocate; the slow path is always taken with the heap lock held. After taking the lock, we should try attempt_allocation_in_alloc_regions right away, because another mutator thread may have refreshed the alloc regions while it held the lock.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583295858 From xpeng at openjdk.org Wed Dec 3 01:17:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:17:44 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 3 Dec 2025 01:13:41 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 121: >> >>> 119: template >>> 120: HeapWord* ShenandoahAllocator::attempt_allocation_slow(ShenandoahAllocRequest& req, bool& in_new_region) { >>> 121: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); >> >> I think this is an error. We don't want to acquire the lock here. We also don't want to introduce accounting_update here. Instead, I think these belong before line 130, in case we need to refresh the alloc regions. > > It is not an error: before calling into attempt_allocation_slow, it has already called attempt_allocation_in_alloc_regions once and failed to allocate; the slow path is always taken with the heap lock held. > > After taking the lock, we should try attempt_allocation_in_alloc_regions right away, because another mutator thread may have refreshed the alloc regions while it held the lock. accounting_update is required for the slow path, but you are right, it can be moved somewhere later, e.g. line 128.
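The fast-path/lock/retry protocol described in this exchange — try the shared alloc regions lock-free, and only on failure take the heap lock, retry (another thread may have refreshed the regions in the meantime), refresh, and retry again — can be modeled with a toy allocator. Everything here is hypothetical: `try_fast` stands in for `attempt_allocation_in_alloc_regions`, the `reserve`/`fast_budget` counters stand in for the free set, and the real code's lock-free path uses atomics rather than a plain `int`:

```cpp
#include <cassert>
#include <mutex>
#include <optional>

struct Allocator {
    std::mutex heap_lock;
    int fast_budget = 0;   // words allocatable without the lock (toy stand-in)
    int reserve = 0;       // words that a refresh can move into fast_budget

    // Stand-in for the lock-free attempt against shared alloc regions.
    std::optional<int> try_fast(int words) {
        if (fast_budget >= words) { fast_budget -= words; return words; }
        return std::nullopt;
    }

    std::optional<int> allocate(int words) {
        if (auto r = try_fast(words)) return r;       // fast path, no lock
        std::lock_guard<std::mutex> g(heap_lock);     // slow path
        // Retry first: another thread may have refreshed the alloc regions
        // while we were waiting for the lock.
        if (auto r = try_fast(words)) return r;
        if (reserve >= words) {                       // refresh, then retry
            fast_budget += reserve;
            reserve = 0;
            return try_fast(words);
        }
        return std::nullopt;                          // genuinely out of memory
    }
};
```

The "retry under the lock before refreshing" step is the part Xiaolong defends above: without it, every thread that lost the fast-path race would refresh regions it does not need.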
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583298253 From xpeng at openjdk.org Wed Dec 3 01:21:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 01:21:56 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:10:34 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158: > >> 156: if (r != nullptr) { >> 157: bool ready_for_retire = false; >> 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire); > > Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations. > We should clarify with comments. It is not really necessary to use `atomic_allocate_in` here, but I wanted to reuse some of the code in `atomic_allocate_in`. We can discuss this later; I can change it back to the non-atomic version.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2583304908 From duke at openjdk.org Wed Dec 3 07:31:42 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 07:31:42 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect Message-ID: For detailed math and a repro, see https://bugs.openjdk.org/browse/JDK-8372543. Currently in Shenandoah, when deciding whether to trigger gc, the available size is calculated as: available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used soft_tail = Xmx - soft_max if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc The left-hand side `available - soft_tail` reduces to `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means that for the same soft max, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when the soft max heap size was set way lower than Xmx.
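The Xmx-dependence described above is easy to check numerically. Below is a minimal sketch of the quoted trigger arithmetic — all sizes in MB, integer math; the flag names are the real ones, but treating both percentages as `x/100` scaling is my reading of the shorthand in the formulas above, not code from the PR:

```cpp
#include <cassert>

// Model of the old trigger: available (with evac reserve carved out of Xmx)
// minus the soft tail, compared against a fraction of soft_max.
static bool old_trigger(long xmx, long soft_max, long used,
                        long evac_reserve_pct, long min_free_pct) {
    long available = xmx * (100 - evac_reserve_pct) / 100 - used;
    long soft_tail = xmx - soft_max;
    return available - soft_tail < min_free_pct * soft_max / 100;
}
```

With soft_max = 4 GB and used = 1 GB held constant (and the default-ish 5% evac reserve, 10% min-free), an 8 GB Xmx stays far from the threshold while a 64 GB Xmx trips it — growing Xmx alone causes more gc, which is exactly the reported symptom.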
Suggested fix: when deciding when to trigger gc, use logic similar to the below: mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; available = mutator_soft_capacity - used; if (available < ShenandoahMinFreeThreshold * soft_max) // trigger gc ------- This change also improved gc logging: Before: [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B After: [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: 122, Reserved: 102M, Max free available in a single region: 1024K; ------------- Commit messages: - 8372543: Shenandoah: undercalculated the available size when soft max takes effect Changes: https://git.openjdk.org/jdk/pull/28622/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372543 Stats: 226 lines in 7 files changed: 157 ins; 44 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From duke at openjdk.org Wed Dec 3 08:37:02 2025 From: duke at openjdk.org (Harshit470250) Date: Wed, 3 Dec 2025 08:37:02 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: > This PR does changes similar to those done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation, as suggested by
[JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put a guard on the Shenandoah GC-specific part of the code. Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: - add guard to the include - add load_reference_barrier_Type - add clone_barrier_Type - add write_barrier_pre_Type - revert shenandoah changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27279/files - new: https://git.openjdk.org/jdk/pull/27279/files/6e6a2bbf..4dfa36ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=01-02 Stats: 145 lines in 5 files changed: 67 ins; 73 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/27279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27279/head:pull/27279 PR: https://git.openjdk.org/jdk/pull/27279 From shade at openjdk.org Wed Dec 3 10:14:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:14:51 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> On Tue, 2 Dec 2025 23:28:58 GMT, Xiaolong Peng wrote: >> In the concurrent reset/concurrent reset after collect phase, the workers need to reset bitmaps for all the regions in the current GC generation. The problem is that resetting bitmaps may take long on a large heap because the marking bitmaps are also larger than on a small heap; we should always consider multiple threads if there are more regions than concurrent workers for concurrent reset.
>> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. >> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add more comments. Changes requested by shade (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 88: > 86: // Using a smaller value here yields better task distribution for a lumpy workload. The task will be split > 87: // into smaller batches with 8 regions in batch, the worker processes more regions w/o needs to reset bitmaps > 88: // will process more batches, but overall all workers will be saturated throughout the whole concurrent reset phase. I have a very general comment about writing comments like this one. 
This entire block of prose is really excessive, is set up to be outdated (are you tracking the real behavior of `SH::parallel_heap_region_iterate` and its magical `4096`?), and can be boiled down to something much more succinct: Bitmap reset task is heavy-weight and benefits from much smaller tasks than the default. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 119:
> 117: // ShenandoahHeap::parallel_heap_region_iterate will derive a reasonable value based
> 118: // on active worker threads and number of regions.
> 119: // For some lumpy workload, the value can be overridden for better task distribution.
Again, excessive. You can just drop the comment; its purpose is obvious from the code. ------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3534247421 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584465890 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584468632 From eastigeevich at openjdk.org Wed Dec 3 14:55:17 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 14:55:17 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v12] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
> - Disable coherent icache.
> - Trap IC IVAU instructions.
> - Execute:
> - `tlbi vae3is, xzr`
> - `dsb sy`
> > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
> > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) > > - Baseline > > $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC... 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/79f9a2a0..8c5ef0e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Wed Dec 3 15:13:21 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 15:13:21 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v3] In-Reply-To: <-cnMy4YHNCrKRqt_2Kkh9ksi-qE8ndZLB5yoyKkS3gM=.3f328f98-15a2-4736-9a6c-f9ab0705b830@github.com> References: <-cnMy4YHNCrKRqt_2Kkh9ksi-qE8ndZLB5yoyKkS3gM=.3f328f98-15a2-4736-9a6c-f9ab0705b830@github.com> Message-ID: On Tue, 25 Nov 2025 13:04:55 GMT, Andrew Haley wrote: >> Yeah patching all nmethods as one unit is basically equivalent to making the code cache processing a STW operation. Last time we processed the code cache STW was JDK 11. A dark place I don't want to go back to. It can get pretty big and mess up latency. So I'm in favour of limiting the fix and not re-introduce STW code cache processing. >> >> Otherwise yes you are correct; we perform synchronous cross modifying code with no assumptions about instruction cache coherency because we didn't trust it would actually work for all ARM implementations. Seems like that was a good bet. We rely on it on x64 still though. >> >> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. 
If so, a horrible alternative might be to use mprotect to temporarily remove execution permission on the affected per-nmethod pages, and detect overshooting in the signal handler, resuming execution when execution privileges are then restored immediately after. That should limit the affected VA to close to what is actually invalidated. But it would look horrible. > >> It's a bit surprising to me if they invalidate all TLB entries, effectively ripping out the entire virtual address space, even when a range is passed in. If so, > > "Because the cache-maintenance wasn't needed, we can do the TLBI instead.
> In fact, the I-Cache line-size isn't relevant anymore, we can reduce
> the number of traps by producing a fake value.
> > "For user-space, the kernel's work is now to trap CTR_EL0 to hide DIC,
> and produce a fake IminLine. EL3 traps the now-necessary I-Cache
> maintenance and performs the inner-shareable-TLBI that makes everything
> better."
> > My interpretation of this is that we only need to do the synchronization dance once, at the end of the patching. But I guess we don't know exactly if we have an affected core or if the kernel workaround is in action. @theRealAph @fisk @shipilev I have updated all places to use optimized icache invalidation. Could you please have a look? I am running different tests and benchmarks. @fisk @shipilev - I added `nmethod::has_non_immediate_oops`. I think it's easy to detect them when we generate code. If this is OK, we might need to update `ZNMethod::attach_gc_data` and `ShenandoahNMethod::detect_reloc_oops`.
- Code of `G1NMethodClosure::do_evacuation_and_fixup(nmethod* nm)` looks strange:

    _oc.set_nm(nm);
    // Evacuate objects pointed to by the nmethod
    nm->oops_do(&_oc);
    if (_strong) {
      // CodeCache unloading support
      nm->mark_as_maybe_on_stack();
      BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod();
      bs_nm->disarm(nm);
    }
    ICacheInvalidationContext icic(nm->has_non_immediate_oops());
    nm->fix_oop_relocations();

If `_strong` is true, we disarm `nm` and patch it with `fix_oop_relocations`. I have assertions checking we can defer icache invalidation. Neither of them is triggered. I think this path always happens at a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3607330040 From eastigeevich at openjdk.org Wed Dec 3 15:42:38 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 15:42:38 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
> - Disable coherent icache.
> - Trap IC IVAU instructions.
> - Execute:
> - `tlbi vae3is, xzr`
> - `dsb sy`
> > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
> > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) > > - Baseline > > $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC... Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: - Fix linux-cross-compile build aarch64 - Merge branch 'master' into JDK-8370947 - Remove trailing whitespaces - Add support of deferred icache invalidation to other GCs and JIT - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence - Add jtreg test - Fix linux-cross-compile aarch64 build - Fix regressions for Java methods without field accesses - Fix code style - Correct ifdef; Add dsb after ic - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=12 Stats: 879 lines in 25 files changed: 839 ins; 7 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From shade at openjdk.org Wed Dec 3 16:14:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 16:14:05 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:42:38 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. 
>> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. >> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. >> >> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) >> >> - Baseline >> >> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1... 
> > Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
>
> - Fix linux-cross-compile build aarch64
> - Merge branch 'master' into JDK-8370947
> - Remove trailing whitespaces
> - Add support of deferred icache invalidation to other GCs and JIT
> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence
> - Add jtreg test
> - Fix linux-cross-compile aarch64 build
> - Fix regressions for Java methods without field accesses
> - Fix code style
> - Correct ifdef; Add dsb after ic
> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f
Interesting work! I was able to look through it very briefly: src/hotspot/cpu/aarch64/globals_aarch64.hpp line 133:
> 131: "Enable workaround for Neoverse N1 erratum 1542419") \
> 132: product(bool, UseDeferredICacheInvalidation, false, DIAGNOSTIC, \
> 133: "Defer multiple ICache invalidation to single invalidation") \
Since the `ICacheInvalidationContext` is in shared code, and I suppose x86_64 would also benefit from this (at least eventually), this sounds like a `globals.hpp` option. src/hotspot/share/asm/codeBuffer.cpp line 371:
> 369: !((oop_Relocation*)reloc)->oop_is_immediate()) {
> 370: _has_non_immediate_oops = true;
> 371: }
Honestly, this looks fragile? We can go into nmethod patching for some other reason, not for patching oops. Also, we still might need to go and patch immediate oops? I see this:

    // Instruct loadConP of x86_64.ad places oops in code that are not also
    // listed in the oop section.
    static bool mustIterateImmediateOopsInCode() { return true; }

Is there a substantial loss in doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters?
src/hotspot/share/asm/codeBuffer.cpp line 939:
> 937: // Move all the code and relocations to the new blob:
> 938: relocate_code_to(&cb);
> 939: }
Here and later, the preferred style is: Suggestion:

    // Move all the code and relocations to the new blob:
    {
      ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED);
      relocate_code_to(&cb);
    }

src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp line 37:
> 35: #include "memory/universe.hpp"
> 36: #include "runtime/atomicAccess.hpp"
> 37: #include "runtime/icache.hpp"
Include is added, but no actual use? Is something missing, or is this a leftover include? test/hotspot/jtreg/gc/TestDeferredICacheInvalidation.java line 28:
> 26:
> 27: /*
> 28: * @test id=ParallelGC
Usually just: Suggestion: * @test id=parallel
test/hotspot/jtreg/gc/TestDeferredICacheInvalidation.java line 34:
> 32: * @requires vm.debug
> 33: * @requires os.family=="linux"
> 34: * @requires os.arch=="aarch64"
I am guessing it is more future-proof to drop Linux/AArch64 filters, and rely on the test doing the right thing, regardless of the config. I see it already skips when `UseDeferredICacheInvalidation` is off. test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184:
> 182: @Benchmark
> 183: @Warmup(iterations = 0)
> 184: @Measurement(iterations = 1)
Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead?
------------- PR Review: https://git.openjdk.org/jdk/pull/28328#pullrequestreview-3535752098 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585729392 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585679778 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585704068 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585707389 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585735476 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585734553 PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2585743873 From xpeng at openjdk.org Wed Dec 3 16:23:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 16:23:29 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. 
> > ### Other tests > - [x] hotspot_gc_shenandoah > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Simplify comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28613/files - new: https://git.openjdk.org/jdk/pull/28613/files/3b964995..892676c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28613&range=01-02 Stats: 14 lines in 2 files changed: 0 ins; 13 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28613/head:pull/28613 PR: https://git.openjdk.org/jdk/pull/28613 From shade at openjdk.org Wed Dec 3 16:23:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 16:23:30 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:20:25 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3535887356 From xpeng at openjdk.org Wed Dec 3 16:23:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 16:23:32 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2] In-Reply-To: <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> <_UCh_KR6-uzVFOJ9MM-gK3gsTessZ03FuecOFkS2F8c=.86fbe32d-b450-4285-8906-ead7bd003b8f@github.com> Message-ID: <5mKXql8U-bSulRVIzoVQXqwcQXlm24-3xExvFAk5oYU=.0ddb4082-5d95-4c08-9c8a-125585d05af4@github.com> On Wed, 3 Dec 2025 10:10:45 GMT, Aleksey Shipilev wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more comments. > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 88: > >> 86: // Using a smaller value here yields better task distribution for a lumpy workload. The task will be split >> 87: // into smaller batches with 8 regions in batch, the worker processes more regions w/o needs to reset bitmaps >> 88: // will process more batches, but overall all workers will be saturated throughout the whole concurrent reset phase. > > I have a very general comment about writing comments like this one. This entire block of prose is really excessive, is set up to be outdated (are you tracking the real behavior of `SH::parallel_heap_region_iterate` and its magical `4096`?), and can be boiled down to much more succinct: > > > Bitmap reset task is heavy-weight and benefits from much smaller tasks than the default. Thanks a lot! I have updated the PR to use the succinct one you suggested. 
> src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 119: > >> 117: // ShenandoahHeap::parallel_heap_region_iterate will derive a reasonable value based >> 118: // on active worker threads and number of regions. >> 119: // For some lumpy workload, the value can be overridden for better task distribution. > > Again, excessive. You can just drop the comment; its purpose is obvious from the code. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2585783241 PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2585784611 From btaylor at openjdk.org Wed Dec 3 17:26:19 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 17:26:19 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Message-ID: The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build ------------- Commit messages: - 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Changes: https://git.openjdk.org/jdk/pull/28642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373039 Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From wkemper at openjdk.org Wed Dec 3 17:47:47 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 17:47:47 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 17:16:02 GMT, Ben Taylor wrote: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. 
> > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: > 382: oop obj = cast_to_oop(p); > 383: assert(oopDesc::is_oop(obj), "Should be an object"); > 384: assert(p <= left, "p should start at or before left end of card"); I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. ------------- PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3536232605 PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586068246 From ysr at openjdk.org Wed Dec 3 18:30:29 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 Dec 2025 18:30:29 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 17:45:31 GMT, William Kemper wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: > >> 382: oop obj = cast_to_oop(p); >> 383: assert(oopDesc::is_oop(obj), "Should be an object"); >> 384: assert(p <= left, "p should start at or before left end of card"); > > I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. 
Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. I agree. In addition, the comment should be updated so it doesn't make the confusing reference to "the loop that follows", which just went away, etc. It's fine to leave a suitably modified comment as to why it is safe to query the size of the object at the oop being returned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586182968 From ysr at openjdk.org Wed Dec 3 18:30:30 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 Dec 2025 18:30:30 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:24:58 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 384: >> >>> 382: oop obj = cast_to_oop(p); >>> 383: assert(oopDesc::is_oop(obj), "Should be an object"); >>> 384: assert(p <= left, "p should start at or before left end of card"); >> >> I think it's fine to take out this loop, but the assert on 384 now seems redundant to the assert on 363. I'm also not sure if the assert on 385 necessarily holds because `p` is no longer increased in the loop. Maybe remove this whole `#ifdef ASSERT` block, or leave in the loop and just take out the `Klass::is_valid` usage. > > I agree. In addition, the comment should be updated so it doesn't make the confusing reference to "the loop that follows", which just went away, etc. It's fine to leave a suitably modified comment as to why it is safe to query the size of the object at the oop being returned. > I'm also not sure if the assert on 385 necessarily holds because p is no longer increased in the loop. It should hold for the oop/object being returned here. It's a post-condition of the method which should have been stated in its API spec I think. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2586190041 From eastigeevich at openjdk.org Wed Dec 3 18:48:32 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 18:48:32 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 16:10:55 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
>>
>> - Fix linux-cross-compile build aarch64
>> - Merge branch 'master' into JDK-8370947
>> - Remove trailing whitespaces
>> - Add support of deferred icache invalidation to other GCs and JIT
>> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence
>> - Add jtreg test
>> - Fix linux-cross-compile aarch64 build
>> - Fix regressions for Java methods without field accesses
>> - Fix code style
>> - Correct ifdef; Add dsb after ic
>> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f
> > test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184:
> >> 182: @Benchmark
>> 183: @Warmup(iterations = 0)
>> 184: @Measurement(iterations = 1)
> > Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead? The current algorithm:
- Create an object used in Java methods.
- Run the methods in the interpreter.
- Compile the methods.
- Make the object garbage collectable.
- Run GC (we measure this).
There are not many things to warm up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. IMO, yes, it is `@BenchmarkMode(OneShot)`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586236955 From shade at openjdk.org Wed Dec 3 18:53:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 18:53:13 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:45:25 GMT, Evgeny Astigeevich wrote: >> test/micro/org/openjdk/bench/vm/gc/GCPatchingNmethodCost.java line 184: >> >>> 182: @Benchmark >>> 183: @Warmup(iterations = 0) >>> 184: @Measurement(iterations = 1) >> >> Not sure what is the intent here. Maybe you wanted `@BenchmarkMode(OneShot)` instead? > > The current algorithm: > - Create an object used in Java methods. > - Run the methods in the interpreter. > - Compile the methods. > - Make the object garbage collectable. > - Run GC (we measure this). > > There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. > > IMO, Yes it is `@BenchmarkMode(OneShot)`. Yeah, but first GC would likely be slower, because it would have more real work to do. So you probably want OneShot with the default number of iterations. It will warmup by doing a few GCs, and then do a few other GCs for measurement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586250541 From eastigeevich at openjdk.org Wed Dec 3 18:53:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 18:53:10 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 16:00:05 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 939: > >> 937: // Move all the code and relocations to the new blob: >> 938: relocate_code_to(&cb); >> 939: } > > Here and later, the preferred style is: > > Suggestion: > > // Move all the code and relocations to the new blob: > { > ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED); > relocate_code_to(&cb); > } I followed @xmas92 comments on style to use a blank line. @xmas92, what style should I follow? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586248135 From wkemper at openjdk.org Wed Dec 3 18:56:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 18:56:56 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 02:02:18 GMT, Rui Li wrote: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
> > Currently in shenandoah, when deciding whether to trigger a gc, the available size is calculated as: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means that when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding when to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < ShenandoahMinFreeThreshold * soft_max) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; A few nits. Thank you for adding a test case for this!
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 240: > 238: size_t allocated = _space_info->bytes_allocated_since_gc_start(); > 239: > 240: log_debug(gc)("should_start_gc calculation: available: %zu%s, soft_max_capacity: %zu%s" Can we add `ergo` tag to this message? Let's use the `PROPERFMT` and `PROPERFMTARGS` macros here and in other log messages we're changing. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 258: > 256: size_t min_threshold = min_free_threshold(); > 257: if (available < min_threshold) { > 258: log_trigger("Free (Soft mutator free) (%zu%s) is below minimum threshold (%zu%s)", Changing this will break some log parsers, do we really need this? src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp line 52: > 50: size_t capacity = ShenandoahHeap::heap()->soft_max_capacity(); > 51: size_t available = _space_info->soft_available(); > 52: size_t allocated = _space_info->bytes_allocated_since_gc_start(); This shadows `bytes_allocated` below. Let's just use one variable for this. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3209: > 3207: log_freeset_stats(ShenandoahFreeSetPartitionId::Mutator, ls); > 3208: log_freeset_stats(ShenandoahFreeSetPartitionId::Collector, ls); > 3209: if (_heap->mode()->is_generational()) {log_freeset_stats(ShenandoahFreeSetPartitionId::OldCollector, ls);} Suggestion: if (_heap->mode()->is_generational()) { log_freeset_stats(ShenandoahFreeSetPartitionId::OldCollector, ls); } src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: > 630: size_t get_usable_free_words(size_t free_bytes) const; > 631: > 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); `log_freeset_stats` should probably be `private`. ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3536428634 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586232667 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586234993 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586237553 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586243946 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586247150 From eastigeevich at openjdk.org Wed Dec 3 19:51:32 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 19:51:32 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:50:44 GMT, Aleksey Shipilev wrote: >> The current algorithm: >> - Create an object used in Java methods. >> - Run the methods in the interpreter. >> - Compile the methods. >> - Make the object garbage collectable. >> - Run GC (we measure this). >> >> There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. >> >> IMO, Yes it is `@BenchmarkMode(OneShot)`. > > Yeah, but first GC would likely be slower, because it would have more real work to do. So you probably want OneShot with the default number of iterations. It will warmup by doing a few GCs, and then do a few other GCs for measurement. I have `Thread.sleep(1000)` in `setupCodeCache()` to let everything settle down. I use it because I saw high variance in GC times. With it, the variance became OK. Maybe I should use `System.gc()` instead of `Thread.sleep`. > So you probably want OneShot with the default number of iterations. Will I need to recreate the object and rerun the Java methods before each iteration? The first iteration will collect the garbage object `fields`. So the following iterations running GC will do nothing. Or will they patch nmethods again?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586405992 From xpeng at openjdk.org Wed Dec 3 21:18:42 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 21:18:42 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah Message-ID: Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. ### Test - [x] hotspot_gc_shenandoah - [ ] GHA ------------- Commit messages: - Removed dead code Changes: https://git.openjdk.org/jdk/pull/28647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28647&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373048 Stats: 145 lines in 7 files changed: 0 ins; 143 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28647/head:pull/28647 PR: https://git.openjdk.org/jdk/pull/28647 From wkemper at openjdk.org Wed Dec 3 21:28:24 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 21:28:24 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Nice cleanup, thank you! ------------- Marked as reviewed by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28647#pullrequestreview-3536983896 From kdnilsen at openjdk.org Wed Dec 3 21:33:59 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 Dec 2025 21:33:59 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:23:29 GMT, Xiaolong Peng wrote: >> In the concurrent reset/concurrent reset after collect phase, the worker needs to reset the bitmaps for all the regions in the current GC generation. The problem is that resetting bitmaps may take long for a large heap because the marking bitmaps are also larger than for a small heap, so we should always consider multiple threads if there is more than one concurrent worker for concurrent reset.
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3537003493 From btaylor at openjdk.org Wed Dec 3 21:42:55 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 21:42:55 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Message-ID: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. 
------------- Commit messages: - 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Changes: https://git.openjdk.org/jdk/pull/28648/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373054 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28648/head:pull/28648 PR: https://git.openjdk.org/jdk/pull/28648 From wkemper at openjdk.org Wed Dec 3 21:42:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 21:42:56 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 21:33:50 GMT, Ben Taylor wrote: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. Let's change the misleading comment. src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp line 147: > 145: ShenandoahReentrantLocker locker(nm_data->lock()); > 146: > 147: // Heal oops and disarm Suggestion: // Heal oops and leave the nmethod armed because code cache unloading needs to know about on-stack nmethods. ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3537029053 PR Review Comment: https://git.openjdk.org/jdk/pull/28648#discussion_r2586693272 From btaylor at openjdk.org Wed Dec 3 22:07:15 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 22:07:15 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Fix misleading comment in previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28648/files - new: https://git.openjdk.org/jdk/pull/28648/files/d830a0a1..a1a9bf11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28648&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28648/head:pull/28648 PR: https://git.openjdk.org/jdk/pull/28648 From btaylor at openjdk.org Wed Dec 3 22:08:11 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Wed, 3 Dec 2025 22:08:11 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. 
> > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Fix up comment and remove additional assert from previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28642/files - new: https://git.openjdk.org/jdk/pull/28642/files/03456d6f..eec662f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From xpeng at openjdk.org Wed Dec 3 22:45:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:45:02 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28647#issuecomment-3609153899 From xpeng at openjdk.org Wed Dec 3 22:46:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:46:14 GMT Subject: RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v3] In-Reply-To: References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Wed, 3 Dec 2025 16:23:29 GMT, Xiaolong Peng wrote: >> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. >> >> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
>> >> Test result: >> >> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" >> >> With the change: >> >> [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) >> [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) >> >> Original: >> >> >> [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) >> [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) >> >> >> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. >> >> ### Other tests >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Simplify comments Thanks all for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28613#issuecomment-3609152280 From xpeng at openjdk.org Wed Dec 3 22:46:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:46:16 GMT Subject: Integrated: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism In-Reply-To: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> References: <6uz-mrC1sU0Q8kxBHKCDFLarpR2mNERthlu_w8s0ym4=.00d5486d-1704-4484-8339-a081f68f8793@github.com> Message-ID: On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng wrote: > In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. > > In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers. 
> > Test result: > > java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset" > > With the change: > > [77.867s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3039 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1328, 14650) > [77.867s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3107 us) (n = 14) (lvls, us = 1094, 1230, 1855, 3457, 8348) > > Original: > > > [77.289s][info][gc,stats ] Concurrent Reset = 0.045 s (a = 3197 us) (n = 14) (lvls, us = 1172, 1191, 1309, 1426, 15582) > [77.289s][info][gc,stats ] Concurrent Reset After Collect = 0.105 s (a = 7476 us) (n = 14) (lvls, us = 2246, 3828, 4395, 7695, 21266) > > > The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 58% reduction for the time, 100%+ improvement for the performance/speed. > > ### Other tests > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. Changeset: db2a5420 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/db2a5420a2e3d0f5f0f066eace37a8fd4f075802 Stats: 13 lines in 4 files changed: 12 ins; 0 del; 1 mod 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism Reviewed-by: kdnilsen, shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28613 From xpeng at openjdk.org Wed Dec 3 22:49:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 22:49:07 GMT Subject: Integrated: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: <7zQpw05McTnnh2XNSZ4jc1FIMOcGiKUObOM-_sZhAfo=.1a091412-513e-4018-97c0-62cd4b004016@github.com> On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. 
I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. Changeset: 8f8fda7c Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/8f8fda7c80b57e8a36827cc260f0be0e5d61f6a6 Stats: 145 lines in 7 files changed: 0 ins; 143 del; 2 mod 8373048: Genshen: Remove dead code from Shenandoah Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28647 From duke at openjdk.org Wed Dec 3 22:50:49 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 22:50:49 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:43:48 GMT, William Kemper wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. >> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
>> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 240: > >> 238: size_t allocated = _space_info->bytes_allocated_since_gc_start(); >> 239: >> 240: log_debug(gc)("should_start_gc calculation: available: %zu%s, soft_max_capacity: %zu%s" > > Can we add `ergo` tag to this message? Let's use the `PROPERFMT` and `PROPERFMTARGS` macros here and in other log messages we're changing. Sure. > src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp line 52: > >> 50: size_t capacity = ShenandoahHeap::heap()->soft_max_capacity(); >> 51: size_t available = _space_info->soft_available(); >> 52: size_t allocated = _space_info->bytes_allocated_since_gc_start(); > > This shadows `bytes_allocated` below. Let's just use one variable for this. Good catch. Removed one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586856567 PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586856298 From xpeng at openjdk.org Wed Dec 3 23:24:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:24:29 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() Message-ID: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566): we should avoid using ShenandoahAllocRequest.type() directly if possible; in most cases we don't need the alloc type directly, and the inline member methods provided by ShenandoahAllocRequest should be sufficient. In the PR, I have removed most of the places where ShenandoahAllocRequest.type() is directly used; there will be only one place left after the change: * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) Also did a small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread ### Test - [ ] hotspot_gc_shenandoah - [ ] GHA ------------- Commit messages: - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata - Revert log change - Remove unnecessary use of ShenandoahAllocRequest.type() Changes: https://git.openjdk.org/jdk/pull/28649/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373056 Stats: 79 lines in 6 files changed: 12 ins; 20 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 PR: https://git.openjdk.org/jdk/pull/28649 From duke at openjdk.org Wed Dec 3 23:26:56 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 3 Dec 2025 23:26:56 GMT Subject: RFR: 8372543:
Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:49:28 GMT, William Kemper wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. >> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
>> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: > >> 630: size_t get_usable_free_words(size_t free_bytes) const; >> 631: >> 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); > > `log_freeset_stats` should probably be `private`. I thought it was private already? The `private` starts from [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp#L478). Or, if you expand this section a bit to line 636, another `public` starts after these declaration. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586918976 From xpeng at openjdk.org Wed Dec 3 23:30:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:30:21 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> > Following up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566): we should avoid using ShenandoahAllocRequest.type() directly if possible; in most cases we don't need to use the alloc type directly, and the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() is directly used; there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did a small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8373056 - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata - Revert log change - Remove unnecessary use of ShenandoahAllocRequest.type() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28649/files - new: https://git.openjdk.org/jdk/pull/28649/files/28f802d8..59087c8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=00-01 Stats: 9796 lines in 279 files changed: 6048 ins; 2286 del; 1462 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 PR: https://git.openjdk.org/jdk/pull/28649 From eastigeevich at openjdk.org Wed Dec 3 23:36:03 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 23:36:03 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:54:24 GMT, Aleksey Shipilev wrote: > Honestly, this looks fragile? We can go into nmethods patching for some other reason, not for patching oops. For GCs on ARM64, I found only patching in `nmethod::fix_oop_relocations` and ZGC barrier patching. This may be because `mustIterateImmediateOopsInCode` returns false on ARM64. We will need to add support for instructions modified through `OopClosure::do_oop`. > Is there a substantial loss in doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters? https://github.com/openjdk/jdk/pull/28328#issuecomment-3558673810 Axel (@xmas92) saw some SpecJVM regressions. I think they might be caused by the increased number of icache invalidations. 
Before this PR we did not patch methods and had no icache invalidation; after this PR we always invalidate the icache. I will be checking SpecJVM, SpecJBB and other benchmarks (dacapo, renaissance). I might check whether the following approach has acceptable overhead: - In `nmethod::fix_oop_relocations` ICacheInvalidationContext icic(UseDeferredICacheInvalidation ? ICacheInvalidation::DEFERRED : ICacheInvalidation::IMMEDIATE); bool patching_code = false; while (iter.next()) { ... patching_code |= reloc->fix_oop_relocation(); ... patching_code |= reloc->fix_metadata_relocation(); } if (icic.mode() == ICacheInvalidation::DEFERRED && !patching_code) { icic.set_mode(ICacheInvalidation::NOT_NEEDED); } If it works, it will reduce the amount of changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586934914 From xpeng at openjdk.org Wed Dec 3 23:41:26 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 3 Dec 2025 23:41:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v15] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock; we have observed heavy heap lock contention on the memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change proposes an optimization for the memory allocation code path to reduce heap lock contention; along with the optimization, a better object-oriented design (OOD) is also applied to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for the mutator; inherits from ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator. 
> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, only a few lines of code to customize the allocator for the Collector. > * ShenandoahOldCollectorAllocator: allocator for collector allocation in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in the old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact for most cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve latency/performance: > > 1. Dacapo lusearch test on an EC2 host with 96 CPU cores: p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 256 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition - Port the fix of JDK-8372566 - Merge branch 'master' into cas-alloc-1 - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Remove junk code - ... and 246 more: https://git.openjdk.org/jdk/compare/8f8fda7c...f9f74ff0 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=14 Stats: 1637 lines in 25 files changed: 1283 ins; 242 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Wed Dec 3 23:51:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 Dec 2025 23:51:59 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 23:24:38 GMT, Rui Li wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 632: >> >>> 630: size_t get_usable_free_words(size_t free_bytes) const; >>> 631: >>> 632: void log_freeset_stats(ShenandoahFreeSetPartitionId partition_id, LogStream& ls); >> >> `log_freeset_stats` should probably be `private`. > > I thought it was private already? The `private` starts from [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp#L478). Or, if you expand this section a bit to line 636, another `public` starts after these declaration. :face-palm:, you're right. I misread the diff. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2586958311 From eastigeevich at openjdk.org Wed Dec 3 23:58:04 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 3 Dec 2025 23:58:04 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 15:42:38 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. >> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. >> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. 
The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. >> >> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2) >> >> - Baseline >> >> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1... > > Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Fix linux-cross-compile build aarch64 > - Merge branch 'master' into JDK-8370947 > - Remove trailing whitespaces > - Add support of deferred icache invalidation to other GCs and JIT > - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence > - Add jtreg test > - Fix linux-cross-compile aarch64 build > - Fix regressions for Java methods without field accesses > - Fix code style > - Correct ifdef; Add dsb after ic > - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f src/hotspot/os_cpu/linux_aarch64/icache_linux_aarch64.hpp line 114: > 112: _code = nullptr; > 113: _size = 0; > 114: _mode = ICacheInvalidation::NOT_NEEDED; This should be inside IF. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2586966933 From wkemper at openjdk.org Thu Dec 4 00:41:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 00:41:00 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> Message-ID: On Wed, 3 Dec 2025 23:30:21 GMT, Xiaolong Peng wrote: >> Following up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566): we should avoid using ShenandoahAllocRequest.type() directly if possible; in most cases we don't need to use the alloc type directly, and the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() is directly used; there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did a small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [ ] GHA > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8373056 > - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata > - Revert log change > - Remove unnecessary use of ShenandoahAllocRequest.type() Looks good. 
Left a minor nit about a now stale comment. src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 198: > 196: void > 197: ShenandoahOldGeneration::configure_plab_for_current_thread(const ShenandoahAllocRequest &req) { > 198: // Note: Even when a mutator is performing a promotion outside a LAB, we use a 'shared_gc' request. Is this comment vestigial now? This method doesn't handle shared allocations anymore. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3537426868 PR Review Comment: https://git.openjdk.org/jdk/pull/28649#discussion_r2587024810 From dlong at openjdk.org Thu Dec 4 00:46:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 00:46:59 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR does changes similar to those done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation, as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put a guard on the Shenandoah GC specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes How about leaving make_clone_type_Type() in barrierSetC2.cpp? I don't see a need to move it into runtime.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3609423313 From ysr at openjdk.org Thu Dec 4 01:22:57 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 4 Dec 2025 01:22:57 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 22:08:11 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix up comment and remove additional assert from previous commit One more fix to the comment. LGTM otherwise. Thanks for the cleanups. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 372: > 370: // and then too only during promotion/evacuation phases. Thus there is no danger > 371: // of races between reading from and writing to the object start array, > 372: // or of asking partially initialized objects their size (in the loop below). Remove reference to "in the loop below". ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3537496302 PR Review Comment: https://git.openjdk.org/jdk/pull/28642#discussion_r2587092135 From xpeng at openjdk.org Thu Dec 4 01:23:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:23:55 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: > Following up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566): we should avoid using ShenandoahAllocRequest.type() directly if possible; in most cases we don't need to use the alloc type directly, and the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() is directly used; there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did a small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove outdated comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28649/files - new: https://git.openjdk.org/jdk/pull/28649/files/59087c8e..57305932 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28649&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28649/head:pull/28649 
PR: https://git.openjdk.org/jdk/pull/28649 From xpeng at openjdk.org Thu Dec 4 01:23:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:23:59 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v2] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> <1wB8K5uAm9h-sVDOlHskuhpH_kNuJIcxhBTHrkfDck0=.07cd6bd4-5cc5-4ec6-afbb-bb4b9cfa1cde@github.com> Message-ID: On Thu, 4 Dec 2025 00:36:56 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8373056 >> - Remove direct use of alloc type from ShenandoahHeapRegion::adjust_alloc_metadata >> - Revert log change >> - Remove unnecessary use of ShenandoahAllocRequest.type() > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 198: > >> 196: void >> 197: ShenandoahOldGeneration::configure_plab_for_current_thread(const ShenandoahAllocRequest &req) { >> 198: // Note: Even when a mutator is performing a promotion outside a LAB, we use a 'shared_gc' request. > > Is this comment vestigial now? This method doesn't handle shared allocations anymore. Yeah, it is outdated, I will remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28649#discussion_r2587092735 From ysr at openjdk.org Thu Dec 4 01:27:58 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 4 Dec 2025 01:27:58 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: <3dmdwyrhOIf8HobQN8f_s_OuhA1DI-cMTuJ7jD-oCUU=.ca7fa0a7-a41e-46c4-8d86-435f02db1c9b@github.com> On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit LGTM ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3537506166 From xpeng at openjdk.org Thu Dec 4 01:29:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 01:29:55 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v2] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 22:08:11 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix up comment and remove additional assert from previous commit LGTM, thanks for looking into this. ------------- Marked as reviewed by xpeng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3537509599 From duke at openjdk.org Thu Dec 4 01:43:36 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 01:43:36 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v2] In-Reply-To: References: Message-ID: > For the detailed math and repro, see https://bugs.openjdk.org/browse/JDK-8372543. > > Currently in Shenandoah, when deciding whether to trigger a gc, the available size is calculated as: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` reduces to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means that for the same soft max, the larger Xmx is, the less free size the app has and the more gc it gets, which does not make sense, especially when the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when the soft max heap size was set way lower than Xmx. 
> > > Suggested fix: when deciding whether to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < ShenandoahMinFreeThreshold * soft_max) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; Rui Li has updated the pull request incrementally with two additional commits since the last revision: - Rename soft_available. 
Change Generation soft avail impl - log format fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28622/files - new: https://git.openjdk.org/jdk/pull/28622/files/b23e9ff1..103ce8f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=00-01 Stats: 35 lines in 11 files changed: 2 ins; 7 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From kdnilsen at openjdk.org Thu Dec 4 01:50:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 01:50:04 GMT Subject: RFR: 8373048: Genshen: Remove dead code from Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 20:55:09 GMT, Xiaolong Peng wrote: > Trivial PR to remove dead code from Shenandoah. I noticed some dead code in shenandoahFreeSet.cpp when I was working on https://github.com/openjdk/jdk/pull/26171, this PR is to clean up the dead code in shenandoahFreeSet.cpp and some other files, no functional change at all. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA Thanks for this cleanup ------------- PR Review: https://git.openjdk.org/jdk/pull/28647#pullrequestreview-3537557808 From duke at openjdk.org Thu Dec 4 02:19:32 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 02:19:32 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: > For the detailed math and repro, see https://bugs.openjdk.org/browse/JDK-8372543. 
> > Currently in Shenandoah, when deciding whether to trigger a gc, the available size is calculated as: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` reduces to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means that for the same soft max, the larger Xmx is, the less free size the app has and the more gc it gets, which does not make sense, especially when the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when the soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding whether to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < ShenandoahMinFreeThreshold * soft_max) // trigger gc > ``` > > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; Rui Li has updated the pull request incrementally with one additional commit since the last revision: Remove unused freeset 
includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28622/files - new: https://git.openjdk.org/jdk/pull/28622/files/103ce8f8..599cc2d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From aboldtch at openjdk.org Thu Dec 4 06:16:05 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 Dec 2025 06:16:05 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: <834LXpq7tgXkAdLSbu_J-OoTWWYhCxr40d-y80Z5z3M=.84badc91-3b8f-44db-800b-b48cd1dfc8d6@github.com> On Wed, 3 Dec 2025 16:00:05 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... 
and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 939: > >> 937: // Move all the code and relocations to the new blob: >> 938: relocate_code_to(&cb); >> 939: } > > Here and later, the preferred style is: > > Suggestion: > > // Move all the code and relocations to the new blob: > { > ICacheInvalidationContext icic(ICacheInvalidation::NOT_NEEDED); > relocate_code_to(&cb); > } Go ahead and use @shipilev's suggested change. Following the code style of the surrounding code is usually my preference as well. _The style comments we had earlier all applied to the ZGC code, which has a certain style._ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2587712509 From duke at openjdk.org Thu Dec 4 06:17:59 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 4 Dec 2025 06:17:59 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR does changes similar to those done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation, as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put a guard on the Shenandoah GC specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes After moving make_clone_type_Type() into barrierSetC2.cpp, when I try to include the barrierSetC2.cpp file into runtime.cpp or type.cpp, it causes redefinitions of many functions. 
I get these errors:

    duplicate symbol 'BarrierSetC2::load_at_resolved(C2Access&, Type const*) const' in:
        /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/barrierSetC2.o
        /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/runtime.o
    duplicate symbol 'BarrierStubC2::entry()' in:
        /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/barrierSetC2.o
        /Users/harshitdhiman/jdk/build/macosx-aarch64-server-fastdebug/hotspot/variant-server/libjvm/objs/runtime.o

Can you suggest a way to solve this?

------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3610474295 From shade at openjdk.org Thu Dec 4 07:16:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Dec 2025 07:16:58 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit Marked as reviewed by shade (Reviewer).
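Returning to the duplicate-symbol question in the TypeFunc thread above: `#include`-ing a `.cpp` file compiles every non-inline definition in it into both translation units, which is exactly what produces those linker errors. A minimal sketch of the usual fix, with invented stand-in names (not the real HotSpot symbols):

```cpp
#include <cassert>
#include <string>

// If barrierSetC2.cpp were #included from both runtime.cpp and type.cpp,
// every non-inline definition in it would be emitted into both object
// files, and the link would fail with "duplicate symbol" errors like the
// ones quoted above. The standard remedies, shown with stand-ins:

// In a shared header: declaration only, so every TU sees one declaration...
std::string clone_type_name();

// ...and in exactly one .cpp: the single definition the linker resolves.
std::string clone_type_name() { return "clone_barrier_Type"; }

// Small helpers can instead be marked `inline`, which exempts them from
// the one-definition rule across translation units.
inline int barrier_helper_id() { return 42; }
```

Keeping only declarations in the shared header (or marking small helpers `inline`) lets several `.cpp` files use the functions without duplicating their definitions.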
------------- PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3538487931 From wkemper at openjdk.org Thu Dec 4 15:08:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 15:08:52 GMT Subject: RFR: Merge openjdk/jdk21u:master Message-ID: Merges tag jdk-21.0.10+5 ------------- Commit messages: - Merge - 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization - 8327980: Convert javax/swing/JToggleButton/4128979/bug4128979.java applet test to main - 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp - 8368982: Test sun/security/tools/jarsigner/EC.java completed and timed out - 8313770: jdk/internal/platform/docker/TestSystemMetrics.java fails on Ubuntu - 8368960: Adjust java UL logging in the build - 8369563: Gtest dll_address_to_function_and_library_name has issues with stripped pdb files - 8343340: Swapping checking do not work for MetricsMemoryTester failcount - 8369032: Add test to ensure serialized ICC_Profile stores only necessary optional data - ... and 1 more: https://git.openjdk.org/shenandoah-jdk21u/compare/2f897401...3c0530fd The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/230/files Stats: 1413 lines in 32 files changed: 678 ins; 680 del; 55 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/230.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/230/head:pull/230 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/230 From btaylor at openjdk.org Thu Dec 4 16:06:13 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 16:06:13 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: References: Message-ID: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. > > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: Update another outdated comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28642/files - new: https://git.openjdk.org/jdk/pull/28642/files/eec662f6..830c8348 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28642&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28642/head:pull/28642 PR: https://git.openjdk.org/jdk/pull/28642 From wkemper at openjdk.org Thu Dec 4 16:44:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:44:54 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> References: 
<7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> Message-ID: On Thu, 4 Dec 2025 16:06:13 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. >> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Update another outdated comment Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28642#pullrequestreview-3540960610 From wkemper at openjdk.org Thu Dec 4 16:46:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:46:54 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. 
>> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [ ] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3540973973 From wkemper at openjdk.org Thu Dec 4 16:55:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 16:55:22 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit Marked as reviewed by wkemper (Reviewer). 
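The direction of the ShenandoahAllocRequest cleanup above can be sketched with a toy class — the names here are invented for illustration and are not the real HotSpot API:

```cpp
#include <cassert>

// Hypothetical sketch: callers ask intent-revealing predicates instead of
// switching on the raw allocation-type enum, so the enum itself can stay
// a private implementation detail of the request class.
class AllocRequest {
public:
  enum class Type { shared, shared_gc, lab, plab };

  explicit AllocRequest(Type t) : _type(t) {}

  // mutator allocations are regular shared-object or TLAB requests
  bool is_mutator_alloc() const { return _type == Type::shared || _type == Type::lab; }
  // everything else is performed on behalf of the GC
  bool is_gc_alloc() const { return !is_mutator_alloc(); }
  // LAB requests cover both mutator TLABs and GC PLABs
  bool is_lab_alloc() const { return _type == Type::lab || _type == Type::plab; }

private:
  Type _type;
};
```

With predicates like these at call sites, only the class itself needs updating when a new allocation type is added.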
------------- PR Review: https://git.openjdk.org/jdk/pull/28648#pullrequestreview-3541015046 From duke at openjdk.org Thu Dec 4 18:37:47 2025 From: duke at openjdk.org (duke) Date: Thu, 4 Dec 2025 18:37:47 GMT Subject: RFR: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots [v2] In-Reply-To: References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 22:07:15 GMT, Ben Taylor wrote: >> The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. >> >> The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Fix misleading comment in previous commit @benty-amzn Your change (at version a1a9bf11e271967973fd4d759ce13b1137434be1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28648#issuecomment-3613782426 From duke at openjdk.org Thu Dec 4 18:51:17 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 18:51:17 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:44:37 GMT, William Kemper wrote: >> Rui Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused freeset includes > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 258: > >> 256: size_t min_threshold = min_free_threshold(); >> 257: if (available < min_threshold) { >> 258: log_trigger("Free (Soft mutator free) (%zu%s) is below minimum threshold (%zu%s)", > > Changing this will break some log parsers, do we really need this? Talked offline. `Free` is overloaded in logs. Sometimes it means soft free, sometimes it means total free. 
Make it as `Free (Soft)` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590194712 From dlong at openjdk.org Thu Dec 4 20:14:32 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 20:14:32 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: <7OlfD2Jc5Vu7a8x_QmCuDONR_u7AjPQJwqeTJLkAzR0=.723e9b17-44c2-4982-a54c-5b0bc07f3f81@github.com> On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes Do you have a branch that reproduces the problem, so I can take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3614148857 From wkemper at openjdk.org Thu Dec 4 20:33:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 20:33:07 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 02:19:32 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is:
>>
>> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used
>> soft_tail = Xmx - soft_max
>> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc
>>
>> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx.
>>
>> Suggested fix: when deciding when to trigger gc, use logic similar to below:
>>
>> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100;
>> available = mutator_soft_capacity - used;
>> if (available < mutator_soft_capacity) // trigger gc
>>
>> -------
>> This change also improved gc logging:
>>
>> Before:
>>
>> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K)
>> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B
>>
>> After:
>>
>> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K)
>> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: 122, Reserved: 102M, Max free available in a single region: 1024K;
> Rui Li has updated the pull request incrementally with one additional commit since the
last revision:
>
> Remove unused freeset includes

src/hotspot/share/gc/shenandoah/shenandoahGlobalGeneration.cpp line 81:

> 79: }
> 80:
> 81: size_t ShenandoahGlobalGeneration::soft_available_exclude_evac_reserve() const {

Two questions:
* How is this override different from the default implementation now?
* Should we not also take the minimum of this value and `free_set()->available()` as we do elsewhere?

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590478334 From wkemper at openjdk.org Thu Dec 4 20:42:27 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 Dec 2025 20:42:27 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification Message-ID: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.
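The hazard described here is the classic lost-wakeup pattern. A minimal C++ sketch of why the flag must be tested under the same lock used for notification — this is illustrative only, not the HotSpot code, which uses its own Monitor abstraction:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// If the control thread tested the flag *outside* the lock and then
// blocked, a notification arriving in between would be missed. Holding
// the lock across test-and-wait closes that window.
std::mutex ctrl_lock;
std::condition_variable ctrl_cv;
bool alloc_failure_pending = false;

void report_allocation_failure() {
  std::lock_guard<std::mutex> g(ctrl_lock);  // set flag and notify under the lock
  alloc_failure_pending = true;
  ctrl_cv.notify_one();
}

bool control_thread_wait() {
  std::unique_lock<std::mutex> g(ctrl_lock);
  // the predicate is re-tested under the same lock, so the wakeup
  // cannot be lost regardless of which thread runs first
  ctrl_cv.wait(g, [] { return alloc_failure_pending; });
  alloc_failure_pending = false;
  return true;
}
```

Expanding the scope of the lock, as this change does, is the standard way to make the "check, then wait" sequence atomic with respect to the notifying thread.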
------------- Commit messages: - Expand scope of control lock so that it can't miss cancellation notifications Changes: https://git.openjdk.org/jdk/pull/28665/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373100 Stats: 8 lines in 1 file changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28665.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28665/head:pull/28665 PR: https://git.openjdk.org/jdk/pull/28665 From eastigeevich at openjdk.org Thu Dec 4 21:16:26 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 4 Dec 2025 21:16:26 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: <7IU85M4fl7Hk58g_oBLgw5g9QEHDhmSCLN6K5IhH9YQ=.46291089-efb6-432f-ac17-3480200c494a@github.com> On Wed, 3 Dec 2025 15:54:24 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > src/hotspot/share/asm/codeBuffer.cpp line 371:
>
>> 369:                !((oop_Relocation*)reloc)->oop_is_immediate()) {
>> 370:       _has_non_immediate_oops = true;
>> 371:     }
>
> Honestly, this looks fragile? We can go into nmethods patching for some other reason, not for patching oops.
>
> Also, we still might need to go and patch immediate oops?
I see this:
>
> // Instruct loadConP of x86_64.ad places oops in code that are not also
> // listed in the oop section.
> static bool mustIterateImmediateOopsInCode() { return true; }
>
> Is there a substantial loss is doing icache invalidation without checking for the existence of interesting oops? Do you have an idea how many methods this filters?

@shipilev Moving `ICacheInvalidationContext icic` to `nmethod::fix_oop_relocations` works. The fragile code is no longer needed.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2590596345 From duke at openjdk.org Thu Dec 4 21:24:04 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 4 Dec 2025 21:24:04 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 20:30:27 GMT, William Kemper wrote: >> Rui Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused freeset includes > > src/hotspot/share/gc/shenandoah/shenandoahGlobalGeneration.cpp line 81: > >> 79: } >> 80: >> 81: size_t ShenandoahGlobalGeneration::soft_available_exclude_evac_reserve() const { > > Two questions: > * How is this override different from the default implementation now? > * Should we not also take the minimum of this value and `free_set()->available()` as we do elsewhere?

- Good call. No functional differences except for a safety assert: `assert(max_capacity() >= soft_max)`, which isn't that necessary since the app wouldn't start if this wasn't true: [code](https://github.com/openjdk/jdk/blob/8e653d394e45180e16714124ed6584f912eb5cba/src/hotspot/share/gc/shared/jvmFlagConstraintsGC.cpp#L277). Will remove the override.
- I don't think it's needed for global. `free_set()->available()` (space reserved for mutators) could be smaller than `mutator_soft_max` when there's an old gen taking space.
If there's no old generation, `mutator_soft_max` should always be less than or equal to `free_set()->available()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2590621429 From btaylor at openjdk.org Thu Dec 4 21:40:11 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 21:40:11 GMT Subject: Integrated: 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots In-Reply-To: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> References: <0gQ6rQRdUyqFv7h48VYct_R6TSHQsauMiPpJeUEsc8E=.fb99f821-9249-49aa-a9c4-c257050c2208@github.com> Message-ID: On Wed, 3 Dec 2025 21:33:50 GMT, Ben Taylor wrote: > The call to arm is redundant, and can be replaced with an assert to ensure the precondition remains true. > > The same set of tier1 tests pass before and after this change with a fastdebug and Shenandoah GC. This pull request has now been integrated. Changeset: 5ec5a6ea Author: Ben Taylor Committer: William Kemper URL: https://git.openjdk.org/jdk/commit/5ec5a6ea6c8e887b4e21f81e382f57129bffbab8 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod 8373054: Shenandoah: Remove unnecessary BarrierSetNMethod::arm in shenandoahCodeRoots Reviewed-by: wkemper, ysr, shade ------------- PR: https://git.openjdk.org/jdk/pull/28648 From duke at openjdk.org Thu Dec 4 21:50:52 2025 From: duke at openjdk.org (duke) Date: Thu, 4 Dec 2025 21:50:52 GMT Subject: RFR: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered [v3] In-Reply-To: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> References: <7uBhjxIgI5nWimIcak1Id641QwQNKYhvXzo9_EQvFx8=.fa06f712-0cae-47bc-90d9-c650ae7ad86c@github.com> Message-ID: On Thu, 4 Dec 2025 16:06:13 GMT, Ben Taylor wrote: >> The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash.
>> >> A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 >> >> This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build > > Ben Taylor has updated the pull request incrementally with one additional commit since the last revision: > > Update another outdated comment @benty-amzn Your change (at version 830c83480540d57b147fe26f6ea6742b4788c5e2) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28642#issuecomment-3614461395 From btaylor at openjdk.org Thu Dec 4 22:15:07 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Thu, 4 Dec 2025 22:15:07 GMT Subject: Integrated: 8373039: Remove Incorrect Asserts in shenandoahScanRemembered In-Reply-To: References: Message-ID: <_arO7HyaZNuiCeydQ9IKZuIZfi1z6sNH--IIbk49Mcc=.68fa2fa2-e0dc-40bd-934f-49ae0b4b39ec@github.com> On Wed, 3 Dec 2025 17:16:02 GMT, Ben Taylor wrote: > The `Klass->is_valid` asserts in this file do not hold the required `ClassLoaderDataGraph_lock` and can cause a crash. > > A similar issue was seen in https://bugs.openjdk.org/browse/JDK-8372566 > > This change passes all tests in `TEST=hotspot_gc_shenandoah` with a fastdebug build This pull request has now been integrated. Changeset: c8b30da7 Author: Ben Taylor Committer: Y. 
Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/c8b30da7ef48edb3d43e07d2c1b8622d8123c3a9 Stats: 13 lines in 1 file changed: 0 ins; 10 del; 3 mod 8373039: Remove Incorrect Asserts in shenandoahScanRemembered Reviewed-by: wkemper, ysr, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/28642 From kdnilsen at openjdk.org Thu Dec 4 22:23:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 22:23:41 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Thu, 4 Dec 2025 20:35:42 GMT, William Kemper wrote: > In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. Thanks. ------------- Marked as reviewed by kdnilsen (Committer). 
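As background for the shenandoahScanRemembered fix integrated above: a validity check that walks shared metadata is only safe while the guarding lock is held, which is why asserting `Klass->is_valid` without `ClassLoaderDataGraph_lock` can crash. A hypothetical sketch of that discipline, with invented names rather than the HotSpot code:

```cpp
#include <cassert>
#include <mutex>

// Sketch: the validity check itself documents and asserts its locking
// precondition, so it cannot be called from a path that races with
// concurrent modification (e.g. class unloading) of the shared graph.
std::mutex graph_lock;
bool graph_lock_held = false;  // stand-in for a real "locked by self" query

bool is_valid_node(int id) {
  assert(graph_lock_held && "caller must hold the graph lock");
  return id != 0;              // placeholder validity rule
}

bool checked_is_valid(int id) {
  std::lock_guard<std::mutex> g(graph_lock);  // take the lock first...
  graph_lock_held = true;
  bool ok = is_valid_node(id);                // ...then it is safe to check
  graph_lock_held = false;
  return ok;
}
```

When the lock cannot be taken at the assertion site, dropping the debug check (as this fix does) is safer than asserting over data that may be mutating underneath.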
PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3542353932 From kdnilsen at openjdk.org Thu Dec 4 22:31:14 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 Dec 2025 22:31:14 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: <9NmDxH8twYZVjup5VIUUI3Aw-dfOINY8eyPTNnFEZNA=.af38ae63-c607-46f4-979f-632eb7a44817@github.com> On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Thanks. Nice improvements to code. ------------- Marked as reviewed by kdnilsen (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28649#pullrequestreview-3542376515 From xpeng at openjdk.org Thu Dec 4 23:59:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 23:59:14 GMT Subject: RFR: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() [v3] In-Reply-To: References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Thu, 4 Dec 2025 01:23:55 GMT, Xiaolong Peng wrote: >> Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. >> >> In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: >> * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) >> >> Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comments Thanks all for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28649#issuecomment-3614774632 From xpeng at openjdk.org Thu Dec 4 23:59:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 4 Dec 2025 23:59:16 GMT Subject: Integrated: 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() In-Reply-To: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> References: <7nPmLquaKl_2EEk2JfsH1ForsyITXxmPAe8UxbazO9E=.e617c2c9-4a9f-44d5-ac08-33903e6deab9@github.com> Message-ID: On Wed, 3 Dec 2025 23:02:17 GMT, Xiaolong Peng wrote: > Follow up on the feedback/comments on PR https://github.com/openjdk/jdk/pull/28521 for bug [JDK-8372566](https://bugs.openjdk.org/browse/JDK-8372566), we should avoid using ShenandoahAllocRequest.type() directly if possible, in most of cases we don't need to directly use alloc type, the inline member methods provided by ShenandoahAllocRequest should be sufficient. > > In the PR, I have removed most of the places where ShenandoahAllocRequest.type() directly used, there will be only one place left after the change: > * ShenandoahFreeSet::allocate (This one will be removed with PR https://github.com/openjdk/jdk/pull/26171) > > Also did small code rearrangement for ShenandoahOldGeneration::configure_plab_for_current_thread > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. 
Changeset: 15f25389 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/15f25389435288881644f7aeab48fd2eae410999 Stats: 80 lines in 6 files changed: 12 ins; 21 del; 47 mod 8373056: Shenandoah: Remove unnecessary use of ShenandoahAllocRequest.type() Reviewed-by: wkemper, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/28649 From xpeng at openjdk.org Fri Dec 5 00:27:20 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 00:27:20 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region Message-ID: Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2

It is caused by the behavior change in the following code:

Original:

    if (ShenandoahSATBBarrier) {
      T* array = dst;
      HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
      ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
      if (is_old_marking) {
        // Generational, old marking
        assert(_heap->mode()->is_generational(), "Invariant");
        if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
          arraycopy_work(array, count);
        }
      } else if (_heap->mode()->is_generational()) {
        // Generational, young marking
        if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
          arraycopy_work(array, count);
        }
      } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) {
        // Non-generational, marking
        arraycopy_work(array, count);
      }
    }

New:

    if (ShenandoahSATBBarrier) {
      if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) {
        arraycopy_work(dst, count);
      }
    }

With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS, arraycopy_work won't be applied anymore, so we may miss some pointers in SATB in such cases.
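The before/after distinction in this RFR can be condensed into a single predicate — a simplified sketch with invented types, not the HotSpot code:

```cpp
#include <cassert>

// Simplified model of the SATB arraycopy decision described above. The key
// point: during young marking in generational mode, an array in an *old*
// region must be processed regardless of whether it sits above that
// region's TAMS, which the TAMS-only "New" check misses.
struct Region { bool is_old; };

bool needs_arraycopy_work(bool old_marking, bool generational,
                          const Region& r, bool below_tams) {
  if (old_marking) {
    return r.is_old && below_tams;   // old marking: old regions below TAMS
  }
  if (generational) {
    return r.is_old || below_tams;   // young marking: old regions unconditionally
  }
  return below_tams;                 // non-generational: TAMS check only
}
```

The case that regressed is exactly `needs_arraycopy_work(false, true, old_region, /*below_tams=*/false)`, which the original code handled and the TAMS-only check does not.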
### Test
- [x] hotspot_gc_shenandoah
- [ ] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix
- [ ] GHA

------------- Commit messages:
- Reorder the code
- Assert only when the obj been pointed to is in young
- Add assert to check card table to sure card table is correct
- Merge branch 'openjdk:master' into JDK-8372498
- arraycopy_work should be done unconditionally if the array is in an old region

Changes: https://git.openjdk.org/jdk/pull/28669/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373116 Stats: 16 lines in 1 file changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From xpeng at openjdk.org Fri Dec 5 02:03:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 02:03:34 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2
>
> It is caused by the behavior change from follow code:
>
> Original:
>
> if (ShenandoahSATBBarrier) {
>   T* array = dst;
>   HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
>   ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
>   if (is_old_marking) {
>     // Generational, old marking
>     assert(_heap->mode()->is_generational(), "Invariant");
>     if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>       arraycopy_work(array, count);
>     }
>   } else if (_heap->mode()->is_generational()) {
>     // Generational, young marking
>     if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>       arraycopy_work(array, count);
>     }
>   } else if
(array_addr < _heap->marking_context()->top_at_mark_start(r)) {
>     // Non-generational, marking
>     arraycopy_work(array, count);
>   }
> }
>
> New:
>
> if (ShenandoahSATBBarrier) {
>   if
As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes I have reproduced it [here](https://github.com/Harshit470250/jdk/pull/2). ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3615381956 From xpeng at openjdk.org Fri Dec 5 07:24:35 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 07:24:35 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v3] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if 
(!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove the asset code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/5b951e6d..85acca0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=01-02 Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From syan at openjdk.org Fri Dec 5 07:26:56 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 5 Dec 2025 07:26:56 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2] In-Reply-To: References: Message-ID: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com> On Fri, 5 Dec 2025 02:03:34 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = 
_heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > uncomment the new added assert After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed. 
[848.log](https://github.com/user-attachments/files/23955591/848.log)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615615694

From xpeng at openjdk.org Fri Dec 5 07:42:56 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 5 Dec 2025 07:42:56 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2]
In-Reply-To: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com>
References: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com>
Message-ID: 

On Fri, 5 Dec 2025 07:24:13 GMT, SendaoYan wrote:

> After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed.
>
> [848.log](https://github.com/user-attachments/files/23955591/848.log)

Thank you Sendao for the test and verification!

The OOM should be an unrelated issue. I'm not sure if there is an open bug for it or not; please create a new one if there isn't.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615655399

From xpeng at openjdk.org Fri Dec 5 07:42:57 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 5 Dec 2025 07:42:57 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v2]
In-Reply-To: 
References: <3ib3dmwah2lDF7aNTiGY-4lav-SXfYPXAdu2AKSvtGM=.b5cdd6b8-3b40-41bc-8ce4-46e934928756@github.com>
Message-ID: 

On Fri, 5 Dec 2025 07:39:53 GMT, Xiaolong Peng wrote:

> > After apply the proposed patch, the jvm crash do not observed by run the test 1000 times. But there is one "java.lang.OutOfMemoryError: Java heap space" test fails observed.
> >
> > [848.log](https://github.com/user-attachments/files/23955591/848.log)
>
> Thank you Sendao for the test and verification!
> > The OOM should be a unrelated issue, I'm not if there is open bug for it or not, please create a new one if there isn't. Found it, https://bugs.openjdk.org/browse/JDK-8298781 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3615657332 From roland at openjdk.org Fri Dec 5 13:52:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:52:12 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v9] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. 
> > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/93b8b0c5..cab44429 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07-08 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. 
When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. 
The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/cab44429..4a877c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08-09 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:09 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> On Tue, 2 Dec 2025 17:32:09 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 105: > >> 103: // All the possible combinations of floating/narrowing with example use cases: >> 104: >> 105: // Use case example: Range Check CastII > > I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. Range check `CastII` were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out of range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. 
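[Editorial aside] The source-level pattern behind this exchange can be illustrated with a small, hedged Java sketch. The class and method names below are invented for illustration, and the comments paraphrase the C2 behavior Roland describes; they are not HotSpot code.

```java
// Hypothetical example: on 64-bit, the address of a[i] is computed with the
// int index widened to long (ConvI2L). The range check CastII narrows the
// index type so the widening conversion cannot observe an out-of-range value.
// As noted above, the control dependency of the access on its range check is
// carried by the memory node's control input, not by the CastII itself.
final class RangeCheckSketch {
    static int elementAt(int[] a, int i) {
        // Conceptually, the JIT-compiled form is:
        //   range check: i u< a.length   (throws AIOOBE on failure)
        //   idx = (long) i               (ConvI2L, input narrowed by CastII)
        //   load from base + idx * 4 + array header offset
        return a[i];
    }
}
```

At the Java level this is just an ordinary bounds-checked access; the sketch only exists to name where ConvI2L and the range check sit in the compiled shape.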
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592801401 From roland at openjdk.org Fri Dec 5 14:05:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:10 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 17:41:37 GMT, Quan Anh Mai wrote: >> Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. > > Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` and `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. > > It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! I added a mention of `depends_only_on_test`. Is that good enough? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592784214 From roland at openjdk.org Fri Dec 5 14:52:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:52:51 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> Message-ID: <42lOFbyCuQt4xj-pK-ME6ScceXqTnGOY0HrWnJMK56k=.87b29936-511f-4ba4-a429-e8b9faed83a2@github.com> On Sun, 30 Nov 2025 08:03:32 GMT, Zihao Lin wrote: >> I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change. > > Sure, it's better to separate to another change. I am not familiar this part, please pin me if you have better solution. Thanks! I filed https://bugs.openjdk.org/browse/JDK-8373143 for this but I keep finding new issues. So this one will take some time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2592955645 From xpeng at openjdk.org Fri Dec 5 16:25:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 16:25:22 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v4] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add asserts back, the elem_ptr must be dirty either in read or write table ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/85acca0c..53316bd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=02-03 Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From pwinchester at palantir.com Fri Dec 5 16:32:36 2025 From: pwinchester at palantir.com (Parker Winchester) Date: Fri, 5 Dec 2025 16:32:36 +0000 Subject: Reference leak in old gen in Generational Shenandoah Message-ID: We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. Triggering GC's manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects, rather than a true "leak." We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - these both use types of references (eg WeakReferences) that need at least one additional GC cycle to be removed by the GC. 
I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory.

I believe the issue is that a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collections. And when it gets encountered in the old gen, should_discover() is returning false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live:

[23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD)
[23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8

My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer.

Summary of my repro

Each iteration it:
* Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it.
* It stores the WeakReference in a static list (this part appears to be necessary to the repro)
* It then allocates a lot of garbage (80GB in an 8GB heap) to force the object and the WeakReference to be promoted to the old gen
* It then iterates over the static list and removes any WeakReferences with null referents
* It then takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference
* The loop then continues, allowing the object and its WeakReference to go out of scope.
* Every 20 iterations it runs several System.gc() calls to prove that the counts return to 0 (System.gc() triggers a "global" GC which is different than an old gen GC).
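[Editorial aside] The pruning step from the summary above can be exercised in isolation with a hedged sketch. The class and method names here are invented, and `Reference.clear()` stands in for the GC clearing an unreachable referent, so the snippet is deterministic without forcing a collection:

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class WeakRefPruneSketch {
    // Drop references whose referent has been cleared, mirroring the repro's
    // WEAK_REFS.removeIf(w -> w.get() == null) step; returns how many
    // references still hold a live referent.
    static int pruneCleared(List<WeakReference<Object>> refs) {
        refs.removeIf(w -> w.get() == null);
        return refs.size();
    }

    public static void main(String[] args) {
        List<WeakReference<Object>> refs = new ArrayList<>();

        Object retained = new Object();                  // strongly reachable referent
        refs.add(new WeakReference<>(retained));

        WeakReference<Object> dead = new WeakReference<>(new Object());
        dead.clear();                                    // simulate the GC clearing it
        refs.add(dead);

        System.out.println(pruneCleared(refs));          // 1: only the live ref survives

        // Keep 'retained' reachable past the prune, so the JIT cannot
        // treat it as dead early.
        Reference.reachabilityFence(retained);
    }
}
```

In the actual bug report the cleared references are never pruned because the GC never clears them in the first place; this sketch only shows the bookkeeping the repro uses to measure that.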
The count will go up each iteration until the System.gc():

Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1
Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3
Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4
Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5
Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6
Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7
Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8
Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9
Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10
Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11
Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12
Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13
Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14
Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15
Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16
Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17
Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18
Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19
Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20
Forcing GCs...
Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2

Expected behavior: Each iteration should see only 1, at most 2, of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred

Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak

I have only tested with Corretto on Ubuntu & OSX

openjdk 25.0.1 2025-10-21 LTS
OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing)

I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1.

I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace)

Iteration 1: Native Memory = 1 KB
[20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3
[20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 2: Native Memory = 2 KB
[30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4
[30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 3: Native Memory = 3 KB
[54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5
[54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1
[54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
Iteration 4: Native Memory = 4 KB
[93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6
[93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration it adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live)

Run the repro with:

java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro

These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage)

-XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0

This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook)

-XX:ShenandoahGuaranteedOldGCInterval=1000

Note I played around with the heap size and the allocation rate and found 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue.

Source code for GenShenWeakRefLeakRepro.java

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah.
 */
public class GenShenWeakRefLeakRepro {
    // Keep WeakReferences alive in a static list (will be in old gen)
    private static final List<WeakReference<MyLeakedObject>> WEAK_REFS = new ArrayList<>();
    private static final long[] COUNTS = new long[2];

    static class MyLeakedObject {
        private final int value;

        MyLeakedObject(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        // allocate garbage to promote WEAK_REFS to old gen
        for (int i = 0; i < 800; i++) {
            byte[] garbage = new byte[100 * 1024 * 1024];
            garbage[i % garbage.length] = (byte) i;
        }

        for (int iteration = 0; iteration < 100; iteration++) {
            // Create object and weak reference
            MyLeakedObject obj = new MyLeakedObject(iteration);
            WeakReference<MyLeakedObject> wr = new WeakReference<>(obj);

            // Store in static list (so WeakRef survives and gets promoted)
            WEAK_REFS.add(wr);

            // Allocate garbage to promote both WeakRef and referent to old gen
            for (int i = 0; i < 800; i++) {
                byte[] garbage = new byte[100 * 1024 * 1024];
                garbage[i % garbage.length] = (byte) i;
            }

            // Remove cleared WeakRefs (referent was collected)
            WEAK_REFS.removeIf(w -> w.get() == null);

            // Count objects
            getObjectCounts();

            // What remains are WeakRefs with live referents
            long aliveCount = WEAK_REFS.size();

            System.out.println("Iteration " + (iteration + 1)
                    + ": MyLeakedObject=" + COUNTS[0]
                    + ", WeakReference=" + COUNTS[1]
                    + ", WeakRefs with live referent=" + aliveCount);

            // Periodically force GCs
            if ((iteration + 1) % 20 == 0) {
                System.out.println("Forcing GCs...");
                for (int i = 0; i < 4; i++) {
                    System.gc();
                    Thread.sleep(3000);
                }
                getObjectCounts();
                System.out.println("After GC: MyLeakedObject=" + COUNTS[0]
                        + ", WeakRefs with live referent=" + aliveCount);
            }
        }
    }

    private static void getObjectCounts() {
        COUNTS[0] = 0;
        COUNTS[1] = 0;
        try {
            Process p = new ProcessBuilder(
                    "jcmd",
                    String.valueOf(ProcessHandle.current().pid()),
                    "GC.class_histogram",
                    "-all")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(),
StandardCharsets.UTF_8))) { String line; while ((line = r.readLine()) != null) { String[] parts = line.trim().split("\\s+"); if (parts.length >= 4) { if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) { COUNTS[0] = Long.parseLong(parts[1]); } else if (line.contains("java.lang.ref.WeakReference ")) { COUNTS[1] = Long.parseLong(parts[1]); } } } } } catch (Exception e) { System.err.println("Histogram failed: " + e.getMessage()); } } } Thanks, Parker Winchester -------------- next part -------------- An HTML attachment was scrubbed... URL: From eastigeevich at openjdk.org Fri Dec 5 17:52:20 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 5 Dec 2025 17:52:20 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v14] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." 
>
> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions.
>
> Changes include:
>
> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization.
> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address.
> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures.
> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk.
>
> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2)
>
> - Baseline
>
> $ taskset -c 0-3 java -Xbootclasspath/a:./wb.jar -XX:+UnlockDiagnosticVMOptions -XX:-NeoverseN1Errata1542419 -XX:+UseZGC -XX:ZYoungGCThreads=1 -XX:ZOldGC...
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Implement nested ICacheInvalidationContext

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28328/files
  - new: https://git.openjdk.org/jdk/pull/28328/files/4b04496f..b9380fd8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=12-13

Stats: 402 lines in 27 files changed: 162 ins; 167 del; 73 mod
Patch: https://git.openjdk.org/jdk/pull/28328.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328

PR: https://git.openjdk.org/jdk/pull/28328

From xpeng at openjdk.org Fri Dec 5 18:19:39 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 5 Dec 2025 18:19:39 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5]
In-Reply-To: 
References: 
Message-ID: 

> Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2
>
> It is caused by the behavior change from the following code:
>
> Original:
>
>     if (ShenandoahSATBBarrier) {
>       T* array = dst;
>       HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
>       ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
>       if (is_old_marking) {
>         // Generational, old marking
>         assert(_heap->mode()->is_generational(), "Invariant");
>         if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>           arraycopy_work(array, count);
>         }
>       } else if (_heap->mode()->is_generational()) {
>         // Generational, young marking
>         if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>           arraycopy_work(array, count);
>         }
>       } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) {
>         // Non-generational, marking
>         arraycopy_work(array, count);
>       }
>     }
>
> New:
>
>     if (ShenandoahSATBBarrier) {
>       if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) {
>         arraycopy_work(dst, count);
>       }
>     }
>
> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such a case.
>
> ### Test
> - [x] hotspot_gc_shenandoah
> - [x] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix
> - [x] GHA

Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  Add include header shenandoahOldGeneration.hpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28669/files
  - new: https://git.openjdk.org/jdk/pull/28669/files/53316bd3..49ea3c93

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=03-04

Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/28669.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669

PR: https://git.openjdk.org/jdk/pull/28669

From kemperw at amazon.com Fri Dec 5 18:23:01 2025
From: kemperw at amazon.com (Kemper, William)
Date: Fri, 5 Dec 2025 18:23:01 +0000
Subject: Reference leak in old gen in Generational Shenandoah
In-Reply-To: 
References: 
Message-ID: <526637d32a674ba3b83d024abaf25d29@amazon.com>

Hi Parker - thank you for reporting this and writing a reproducer. I'll take a look and keep you apprised.

________________________________
From: shenandoah-dev on behalf of Parker Winchester
Sent: Friday, December 5, 2025 8:32:36 AM
To: shenandoah-dev at openjdk.org
Subject: [EXTERNAL] Reference leak in old gen in Generational Shenandoah
We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. Triggering GCs manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects, rather than a true "leak."

We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - these both use types of references (e.g. WeakReferences) that need at least one additional GC cycle to be removed by the GC. I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory.

I believe the issue is that a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collections. And when it gets encountered in the old gen, should_discover() is returning false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live:

[23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD)
[23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8

My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer.

Summary of my repro

Each iteration it:
* Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it.
* It stores the WeakReference in a static list (this part appears to be necessary to the repro)
* It then allocates a lot of garbage (80GB in an 8GB heap size) to force the object and the WeakReference to be promoted to the old gen
* It then iterates over the static list and removes any WeakReferences with null referents
* It then takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference
* The loop then continues, allowing the object and its WeakReference to go out of scope.
* Every 20 iterations it runs several System.gc() calls to prove that the counts return to 0 (System.gc() triggers a "global" GC, which is different than an old gen GC).

The count will go up each iteration until the System.gc():

Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1
Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3
Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4
Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5
Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6
Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7
Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8
Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9
Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10
Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11
Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12
Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13
Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14
Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15
Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16
Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17
Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18
Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19
Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20
Forcing GCs...
Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2

Expected behavior: Each iteration should see only 1, at most 2, of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred

Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak

I have only tested with Corretto on Ubuntu & OSX

openjdk 25.0.1 2025-10-21 LTS
OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing)

I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1.
I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace):

Iteration 1: Native Memory = 1 KB
[20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3
[20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

Iteration 2: Native Memory = 2 KB
[30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4
[30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

Iteration 3: Native Memory = 3 KB
[54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5
[54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1
[54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

Iteration 4: Native Memory = 4 KB
[93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 6
[93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0
[93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0

It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live).

Run the repro with:

java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro

These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage):

-XX:ShenandoahIgnoreGarbageThreshold=0
-XX:ShenandoahOldGarbageThreshold=0
-XX:ShenandoahGarbageThreshold=0

This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook):

-XX:ShenandoahGuaranteedOldGCInterval=1000

Note: I played around with the heap size and the allocation rate and found an 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue.

Source code for GenShenWeakRefLeakRepro.java:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah.
 */
public class GenShenWeakRefLeakRepro {
    // Keep WeakReferences alive in a static list (will be in old gen)
    private static final List<WeakReference<MyLeakedObject>> WEAK_REFS = new ArrayList<>();
    private static final long[] COUNTS = new long[2];

    static class MyLeakedObject {
        private final int value;
        MyLeakedObject(int value) { this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        // allocate garbage to promote WEAK_REFS to old gen
        for (int i = 0; i < 800; i++) {
            byte[] garbage = new byte[100 * 1024 * 1024];
            garbage[i % garbage.length] = (byte) i;
        }
        for (int iteration = 0; iteration < 100; iteration++) {
            // Create object and weak reference
            MyLeakedObject obj = new MyLeakedObject(iteration);
            WeakReference<MyLeakedObject> wr = new WeakReference<>(obj);
            // Store in static list (so WeakRef survives and gets promoted)
            WEAK_REFS.add(wr);
            // Allocate garbage to promote both WeakRef and referent to old gen
            for (int i = 0; i < 800; i++) {
                byte[] garbage = new byte[100 * 1024 * 1024];
                garbage[i % garbage.length] = (byte) i;
            }
            // Remove cleared WeakRefs (referent was collected)
            WEAK_REFS.removeIf(w -> w.get() == null);
            // Count objects
            getObjectCounts();
            // What remains are WeakRefs with live referents
            long aliveCount = WEAK_REFS.size();
            System.out.println("Iteration " + (iteration + 1)
                    + ": MyLeakedObject=" + COUNTS[0]
                    + ", WeakReference=" + COUNTS[1]
                    + ", WeakRefs with live referent=" + aliveCount);
            // Periodically force GCs
            if ((iteration + 1) % 20 == 0) {
                System.out.println("Forcing GCs...");
                for (int i = 0; i < 4; i++) {
                    System.gc();
                    Thread.sleep(3000);
                }
                getObjectCounts();
                System.out.println("After GC: MyLeakedObject=" + COUNTS[0]
                        + ", WeakRefs with live referent=" + aliveCount);
            }
        }
    }

    private static void getObjectCounts() {
        COUNTS[0] = 0;
        COUNTS[1] = 0;
        try {
            Process p = new ProcessBuilder(
                    "jcmd", String.valueOf(ProcessHandle.current().pid()),
                    "GC.class_histogram", "-all")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length >= 4) {
                        if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) {
                            COUNTS[0] = Long.parseLong(parts[1]);
                        } else if (line.contains("java.lang.ref.WeakReference ")) {
                            COUNTS[1] = Long.parseLong(parts[1]);
                        }
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("Histogram failed: " + e.getMessage());
        }
    }
}

Thanks,
Parker Winchester

From btaylor at openjdk.org Fri Dec 5 18:50:41 2025
From: btaylor at openjdk.org (Ben Taylor)
Date: Fri, 5 Dec 2025 18:50:41 GMT
Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics
Message-ID: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com>

The `STATIC_ASSERT` below this typedef appears to be out of date.

The barriers check the thread-local copy of the gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is of type `char`, so the size requirement described by the assert is maintained even after this change.

Change passes all tier1 tests locally when run with Shenandoah GC.
-------------

Commit messages:
 - 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics

Changes: https://git.openjdk.org/jdk/pull/28681/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28681&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8352914
Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/28681.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28681/head:pull/28681

PR: https://git.openjdk.org/jdk/pull/28681

From wkemper at openjdk.org Fri Dec 5 18:53:37 2025
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 5 Dec 2025 18:53:37 GMT
Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2]
In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com>
References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com>
Message-ID: 

> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.

William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Set requested gc cause under a lock when allocation fails

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28665/files
  - new: https://git.openjdk.org/jdk/pull/28665/files/89af1701..1081f21e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=00-01

Stats: 27 lines in 2 files changed: 2 ins; 8 del; 17 mod
Patch: https://git.openjdk.org/jdk/pull/28665.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28665/head:pull/28665

PR: https://git.openjdk.org/jdk/pull/28665

From wkemper at openjdk.org Fri Dec 5 18:53:37 2025
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 5 Dec 2025 18:53:37 GMT
Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2]
In-Reply-To: 
References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com>
Message-ID: 

On Fri, 5 Dec 2025 18:50:08 GMT, William Kemper wrote:

>> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Set requested gc cause under a lock when allocation fails

src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145:

> 143:   // Notifies the control thread, but does not update the requested cause or generation.
> 144:   // The overloaded variant should be used when the _control_lock is already held.
> 145:   void notify_cancellation(GCCause::Cause cause);

These methods were the root cause here.
`ShenandoahHeap::_canceled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had:

1. Control thread finishes cycle and sees no cancellation is requested (no lock used).
2. Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`.
3. Control thread takes `_control_lock`, checks `_requested_gc_cause`, sees `_no_gc` (because `notify_cancellation` didn't change it), and now waits forever.

The fix here is to replace `notify_cancellation` with `notify_control_thread`, which serializes updates to `_requested_gc_cause` under `_control_lock`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2593632599

From wkemper at openjdk.org Fri Dec 5 19:04:56 2025
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 5 Dec 2025 19:04:56 GMT
Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics
In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com>
References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com>
Message-ID: 

On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote:

> The `STATIC_ASSERT` below this typedef appears to be out of date.
>
> The barriers check the thread-local copy of the gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is of type `char`, so the size requirement described by the assert is maintained even after this change.
>
> Change passes all tier1 tests locally when run with Shenandoah GC.

Marked as reviewed by wkemper (Reviewer).

This looks good to me, but would appreciate another reviewer.
-------------

PR Review: https://git.openjdk.org/jdk/pull/28681#pullrequestreview-3546051530
PR Comment: https://git.openjdk.org/jdk/pull/28681#issuecomment-3618172121

From kdnilsen at openjdk.org Fri Dec 5 19:21:56 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Fri, 5 Dec 2025 19:21:56 GMT
Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2]
In-Reply-To: 
References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com>
Message-ID: 

On Fri, 5 Dec 2025 18:53:37 GMT, William Kemper wrote:

>> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Set requested gc cause under a lock when allocation fails

Thanks for diligent testing and analysis. Subtle code here.

-------------

Marked as reviewed by kdnilsen (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3546110509

From kdnilsen at openjdk.org Fri Dec 5 19:36:56 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Fri, 5 Dec 2025 19:36:56 GMT
Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 4 Dec 2025 02:19:32 GMT, Rui Li wrote:

>> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543.
>>
>> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is:
>>
>>     available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used
>>     soft_tail = Xmx - soft_max
>>     if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc
>>
>> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when soft max heap size was set way lower than Xmx.
>>
>> Suggested fix: when deciding when to trigger gc, use logic similar to below:
>>
>>     mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100;
>>     available = mutator_soft_capacity - used;
>>     if (available < mutator_soft_capacity) // trigger gc
>>
>> -------
>> This change also improved gc logging:
>>
>> Before:
>>
>>     [6.831s][info][gc      ] Trigger: Free (52230K) is below minimum threshold (52428K)
>>     [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B
>>
>> After:
>>
>>     [8.358s][info][gc      ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K)
>>     [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: 122, Reserved: 102M, Max free available in a single region: 1024K;
>
> Rui Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unused freeset includes

Changes requested by kdnilsen (Committer).

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 940:

> 938:
> 939: size_t ShenandoahGeneration::soft_available_exclude_evac_reserve() const {
> 940:   size_t result = available(ShenandoahHeap::heap()->soft_max_capacity() * (100.0 - ShenandoahEvacReserve) / 100);

I'm a little uncomfortable with this approach. It's mostly a question of how we name it. The evac reserve is not always this value. In particular, we may shrink the young evac reserves after we have selected the cset. Also of concern is that if someone invokes this function on old_generation(), it looks like they'll get a bogus (not meaningful) value.

I think I'd be more comfortable with renaming this to something like "mutator_available_when_gc_is_idle()". If we keep it virtual, then OldGeneration should override with "assert(false, "Not relevant to old generation")".

-------------

PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3546162874
PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2593766590

From wkemper at openjdk.org Fri Dec 5 20:02:57 2025
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 5 Dec 2025 20:02:57 GMT
Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5]
In-Reply-To: 
References: 
Message-ID: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com>

On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote:

>> Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2
>>
>> It is caused by the behavior change from the following code:
>>
>> Original:
>>
>>     if (ShenandoahSATBBarrier) {
>>       T* array = dst;
>>       HeapWord* array_addr = reinterpret_cast<HeapWord*>(array);
>>       ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr);
>>       if (is_old_marking) {
>>         // Generational, old marking
>>         assert(_heap->mode()->is_generational(), "Invariant");
>>         if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>>           arraycopy_work(array, count);
>>         }
>>       } else if (_heap->mode()->is_generational()) {
>>         // Generational, young marking
>>         if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) {
>>           arraycopy_work(array, count);
>>         }
>>       } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) {
>>         // Non-generational, marking
>>         arraycopy_work(array, count);
>>       }
>>     }
>>
>> New:
>>
>>     if (ShenandoahSATBBarrier) {
>>       if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) {
>>         arraycopy_work(dst, count);
>>       }
>>     }
>>
>> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such a case.
>>
>> ### Test
>> - [x] hotspot_gc_shenandoah
>> - [x] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix
>> - [x] GHA
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add include header shenandoahOldGeneration.hpp

The issue, as I understand it, is that mutators are racing with the concurrent remembered set scan. If a mutator changes a pointer covered by a dirty card, it could prevent the remembered set scan from tracing the original object that was reachable at the beginning of marking. Since we may not be marking old, we cannot rely on the TAMS for objects in old regions and must unconditionally enqueue all of the overwritten pointers in the old array. Should we only do this when young marking is in progress? Perhaps we should have a version of `arraycopy_work` that only enqueues young pointers here?
------------- PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3546247628 From xpeng at openjdk.org Fri Dec 5 23:01:57 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 23:01:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com> References: <0_7ZOhkCLi17a3aMtxAoV_6hfr9FzZPyto3uOeBqODw=.95f213af-ec8b-4ca3-82a0-c0c95e30ad6d@github.com> Message-ID: On Fri, 5 Dec 2025 20:00:04 GMT, William Kemper wrote: > The issue, as I understand it, is that mutators are racing with the concurrent remembered set scan. If a mutator changes a pointer covered by a dirty card, it could prevent the remembered set scan from tracing the original object that was reachable at the beginning of marking. Since we may not be marking old, we cannot rely on the TAMS for objects in old regions and must unconditionally enqueue all of the overwritten pointers in the old array. Should we only do this when young marking is in progress? Perhaps we should have a version of `arraycopy_work` that only enqueues young pointers here? I don't think it is related the any racing on remembered set, I got some GC logs from which I think we may know how it actually happens. [15.653s][info][gc,start ] GC(188) Pause Full ... [15.763s][info][gc ] GC(188) Pause Full 913M->707M(1024M) 109.213ms [15.767s][info][gc,ergo ] GC(189) Start GC cycle (Young) ... 
[15.802s][info][gc ] GC(189) Concurrent reset after collect (Young) 1.160ms [15.802s][info][gc,ergo ] GC(189) At end of Interrupted Concurrent Young GC: Young generation used: 874M, used regions: 874M, humongous waste: 7066K, soft capacity: 1024M, max capacity: 1022M, available: 99071K [15.802s][info][gc,ergo ] GC(189) At end of Interrupted Concurrent Young GC: Old generation used: 1273K, used regions: 1536K, humongous waste: 0B, soft capacity: 1024M, max capacity: 1536K, available: 262K [15.803s][info][gc,metaspace ] GC(189) Metaspace: 759K(960K)->759K(960K) NonClass: 721K(832K)->721K(832K) Class: 38K(128K)->38K(128K) [15.803s][info][gc ] Trigger (Young): Handle Allocation Failure [15.803s][info][gc,start ] GC(190) Pause Full [15.803s][info][gc,task ] GC(190) Using 8 of 8 workers for full gc [15.803s][info][gc,phases,start] GC(190) Phase 1: Mark live objects [15.806s][info][gc,ref ] GC(190) Clearing All SoftReferences References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. Any comparative performance numbers? ? ------------- Marked as reviewed by ysr (Reviewer). 
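For context on the size constraint being discussed, the property the assert protects can be sketched as follows (an illustrative model, not the actual HotSpot declarations; the flag names mirror Shenandoah's gc-state bits, but the exact set is an assumption):

```cpp
#include <cstdint>

// Illustrative byte-sized gc-state bitmask. The barrier fast path loads
// one byte from thread-local storage, so the whole flag set must keep
// fitting in a single char -- the property the STATIC_ASSERT guards.
enum GCStateFlag : uint8_t {
    HAS_FORWARDED = 1 << 0,
    MARKING       = 1 << 1,
    EVACUATION    = 1 << 2,
    UPDATE_REFS   = 1 << 3,
    WEAK_ROOTS    = 1 << 4,
};

static_assert(sizeof(GCStateFlag) == 1, "gc state must fit in one byte");

inline bool is_state_set(uint8_t gc_state, GCStateFlag flag) {
    return (gc_state & flag) != 0;
}
```

Since the thread-local copy in `ShenandoahThreadLocalData::_gc_state` is a `char`, the one-byte requirement holds even after the change, as noted above.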
PR Review: https://git.openjdk.org/jdk/pull/28681#pullrequestreview-3546716696 From wkemper at openjdk.org Fri Dec 5 23:23:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 Dec 2025 23:23:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. 
>> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add include header shenandoahOldGeneration.hpp At step 2, we have an element in the old array pointing to young, correct? Why is it not represented in the remembered set at the beginning of young mark? If it is because the old -> young pointer was created _after_ init mark, then the young pointer was either reachable when mark started, or it was created after mark started. Either way, the young pointer should have been found without this SATB modification. Unless, it was in the remembered set, but it didn't get scanned because a mutator modified it before it was scanned. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3618939846 From xpeng at openjdk.org Fri Dec 5 23:34:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 5 Dec 2025 23:34:56 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: <8yfYsnqygCX37e1fTQOGMs-MRDjVrgmDX-pp799MDfk=.35732036-5d3a-4cc6-ad4e-872e099b6ebf@github.com> On Fri, 5 Dec 2025 23:19:54 GMT, William Kemper wrote: > At step 2, we have an element in the old array pointing to young, correct? Why is it not represented in the remembered set at the beginning of young mark? If it is because the old -> young pointer was created _after_ init mark, then the young pointer was either reachable when mark started, or it was created after mark started. Either way, the young pointer should have been found without this SATB modification. Unless, it was in the remembered set, but it didn't get scanned because a mutator modified it before it was scanned. An array copy involves two array objects: src and dst.
The dst array is an old array; the src may not be old, it could be young. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3618958253 From kemperw at amazon.com Sat Dec 6 00:51:28 2025 From: kemperw at amazon.com (Kemper, William) Date: Sat, 6 Dec 2025 00:51:28 +0000 Subject: Reference leak in old gen in Generational Shenandoah In-Reply-To: References: Message-ID: I created https://bugs.openjdk.org/browse/JDK-8373203 to track progress. ________________________________ From: shenandoah-dev on behalf of Parker Winchester Sent: Friday, December 5, 2025 8:32:36 AM To: shenandoah-dev at openjdk.org Subject: [EXTERNAL] Reference leak in old gen in Generational Shenandoah We just upgraded to JDK25 and are trying out Generational Shenandoah, coming from ZGC. We noticed native memory (in the "other" category) due to direct byte buffers steadily increasing and not getting freed - despite these DirectByteBuffer objects becoming unreachable and the GC clearly running frequently. One service of ours hit 2GB of native memory used after 24 hours, ultimately causing our service to be OOMKilled. Triggering GCs manually by taking a (live) heap histogram clears the native memory, so this seems to be a failure of the GC to find and clean up certain objects, rather than a true "leak." We tracked this down to issues with Undertow's DefaultByteBufferPool, which uses Finalizers and WeakHashMaps - these both use types of references (e.g. WeakReferences) that need at least one additional GC cycle to be removed by the GC. I plan to submit a change to Undertow's code to reduce its reliance on these, but it's possible this issue impacts other code, so I produced a minimal repro of it that doesn't use native memory.
I believe the issue is that a Reference in the old generation will sometimes fail to be discovered by the GC. A reference in the old gen will not be encountered by any young gen collection, and when it is encountered in the old gen, should_discover() returns false, so there's no way for it to ever be enqueued. I think this is due to the references being wrongly considered strongly live:
[23.999s][trace][gc,ref ] GC(259) Encountered Reference: 0x000000030000b6e8 (Weak, OLD)
[23.999s][trace][gc,ref ] GC(259) Reference strongly live: 0x000000030000b6e8
My minimal repro uses weak references, but I also noticed the issue with phantom references due to DirectByteBuffer.
Summary of my repro. Each iteration it:
* Allocates a simple object (MyLeakedObject - only necessary so it has a class name in the heap histogram) as well as a WeakReference to it.
* It stores the WeakReference in a static list (this part appears to be necessary to the repro)
* It then allocates a lot of garbage (80GB in a 8GB heap size) to force the object and the WeakReference to be promoted to the old gen
* It then iterates over the static list and removes any WeakReferences with null referents
* It then takes a heap histogram (not live, so we don't trigger GC), and prints the counts of MyLeakedObject and WeakReference
* The loop then continues, allowing the object and its WeakReference to go out of scope.
* Every 20 iterations it runs several system.gc() calls to prove that the counts return to 0 (system.gc() triggers a "global" GC which is different than an old gen GC).
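The failure mode described here - references that are encountered but never discovered - comes down to a single predicate. A minimal model (a hypothetical simplification, not the actual should_discover() implementation):

```cpp
#include <cassert>

// A Reference is only discovered (and can only later be enqueued) if
// its referent is not already strongly marked. If marking wrongly
// treats an unreachable referent as strongly live, discovery is
// skipped on every old-gen cycle and the Reference is never cleared.
bool should_discover(bool referent_strongly_marked) {
    return !referent_strongly_marked;
}

// One old-gen cycle: returns true when the Reference stays stuck,
// matching the "Encountered but not Discovered" counts in the logs.
bool stays_undiscovered(bool referent_strongly_marked) {
    return !should_discover(referent_strongly_marked);
}
```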
The count will go up each iteration until the system.gc():
Iteration 1: MyLeakedObject=1, WeakReference=5, WeakRefs with live referent=1
Iteration 2: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2
Iteration 3: MyLeakedObject=3, WeakReference=7, WeakRefs with live referent=3
Iteration 4: MyLeakedObject=4, WeakReference=8, WeakRefs with live referent=4
Iteration 5: MyLeakedObject=5, WeakReference=9, WeakRefs with live referent=5
Iteration 6: MyLeakedObject=6, WeakReference=10, WeakRefs with live referent=6
Iteration 7: MyLeakedObject=7, WeakReference=11, WeakRefs with live referent=7
Iteration 8: MyLeakedObject=8, WeakReference=12, WeakRefs with live referent=8
Iteration 9: MyLeakedObject=9, WeakReference=13, WeakRefs with live referent=9
Iteration 10: MyLeakedObject=10, WeakReference=14, WeakRefs with live referent=10
Iteration 11: MyLeakedObject=11, WeakReference=15, WeakRefs with live referent=11
Iteration 12: MyLeakedObject=12, WeakReference=16, WeakRefs with live referent=12
Iteration 13: MyLeakedObject=13, WeakReference=17, WeakRefs with live referent=13
Iteration 14: MyLeakedObject=14, WeakReference=18, WeakRefs with live referent=14
Iteration 15: MyLeakedObject=15, WeakReference=19, WeakRefs with live referent=15
Iteration 16: MyLeakedObject=16, WeakReference=20, WeakRefs with live referent=16
Iteration 17: MyLeakedObject=17, WeakReference=21, WeakRefs with live referent=17
Iteration 18: MyLeakedObject=18, WeakReference=22, WeakRefs with live referent=18
Iteration 19: MyLeakedObject=19, WeakReference=23, WeakRefs with live referent=19
Iteration 20: MyLeakedObject=20, WeakReference=24, WeakRefs with live referent=20
Forcing GCs...
Iteration 21: MyLeakedObject=2, WeakReference=6, WeakRefs with live referent=2 Expected behavior: Each iteration should see only 1 at most 2 of MyLeakedObject, since they are no longer in scope and sufficient GC activity (young + old gen GCs) has occurred Actual behavior: Each iteration adds an additional MyLeakedObject and its WeakReference, leading to a leak I have only tested with Corretto on Ubuntu & OSX openjdk 25.0.1 2025-10-21 LTS OpenJDK Runtime Environment Corretto-25.0.1.8.1 (build 25.0.1+8-LTS) OpenJDK 64-Bit Server VM Corretto-25.0.1.8.1 (build 25.0.1+8-LTS, mixed mode, sharing) I've tried with non-generational shenandoah (mode=satb) and the issue does not occur. It also does not occur for ZGC or G1. I had a version of the repro that used DirectByteBuffers which yielded these results, strictly looking at reference processing in old gen GCs (running with -Xlog:gc*=info,gc+ref=trace) Iteration 1: Native Memory = 1 KB [20.423s][info ][gc,ref ] GC(46) Encountered references: Soft: 66, Weak: 183, Final: 0, Phantom: 3 [20.423s][info ][gc,ref ] GC(46) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [20.423s][info ][gc,ref ] GC(46) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 2: Native Memory = 2 KB [30.687s][info ][gc,ref ] GC(52) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 4 [30.688s][info ][gc,ref ] GC(52) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [30.688s][info ][gc,ref ] GC(52) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 3: Native Memory = 3 KB [54.496s][info ][gc,ref ] GC(70) Encountered references: Soft: 66, Weak: 187, Final: 0, Phantom: 5 [54.496s][info ][gc,ref ] GC(70) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 1 [54.496s][info ][gc,ref ] GC(70) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 Iteration 4: Native Memory = 4 KB [93.706s][info ][gc,ref ] GC(91) Encountered references: Soft: 66, Weak: 187, Final: 0, 
Phantom: 6 [93.706s][info ][gc,ref ] GC(91) Discovered references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [93.706s][info ][gc,ref ] GC(91) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 It's a little easier to see with DirectByteBuffer's Phantom references (there are 100+ unrelated WeakReferences, I believe these are used internally). Each iteration it adds another Phantom reference which is encountered, but fails to be discovered (due to being considered strongly live) Run the repro with: java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 -XX:ShenandoahGuaranteedOldGCInterval=1000 -XX:+AlwaysPreTouch -Xmx8g -Xms8g GenShenWeakRefLeakRepro These flags help prove that the references are guaranteed to be encountered during each old gen GC cycle (otherwise they might be skipped over if the region has very little garbage) -XX:ShenandoahIgnoreGarbageThreshold=0 -XX:ShenandoahOldGarbageThreshold=0 -XX:ShenandoahGarbageThreshold=0 This flag guarantees that references in old gen regions get processed every 1 second (each iteration takes about 2 seconds on my M1 macbook) -XX:ShenandoahGuaranteedOldGCInterval=1000 Note I played around with the heap size and the allocation rate and found 8GB heap & 80GB allocated to be the most reliable way to reproduce the issue. Source code for GenShenWeakRefLeakRepro.java import java.io.BufferedReader; import java.io.InputStreamReader; import java.lang.ref.WeakReference; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; /** * Tests if WeakReferences with old-gen referents leak in Generational Shenandoah. 
*/
public class GenShenWeakRefLeakRepro {
    // Keep WeakReferences alive in a static list (will be in old gen)
    private static final List<WeakReference<MyLeakedObject>> WEAK_REFS = new ArrayList<>();
    private static final long[] COUNTS = new long[2];

    static class MyLeakedObject {
        private final int value;
        MyLeakedObject(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        // allocate garbage to promote WEAK_REFS to old gen
        for (int i = 0; i < 800; i++) {
            byte[] garbage = new byte[100 * 1024 * 1024];
            garbage[i % garbage.length] = (byte) i;
        }
        for (int iteration = 0; iteration < 100; iteration++) {
            // Create object and weak reference
            MyLeakedObject obj = new MyLeakedObject(iteration);
            WeakReference<MyLeakedObject> wr = new WeakReference<>(obj);
            // Store in static list (so WeakRef survives and gets promoted)
            WEAK_REFS.add(wr);
            // Allocate garbage to promote both WeakRef and referent to old gen
            for (int i = 0; i < 800; i++) {
                byte[] garbage = new byte[100 * 1024 * 1024];
                garbage[i % garbage.length] = (byte) i;
            }
            // Remove cleared WeakRefs (referent was collected)
            WEAK_REFS.removeIf(w -> w.get() == null);
            // Count objects
            getObjectCounts();
            // What remains are WeakRefs with live referents
            long aliveCount = WEAK_REFS.size();
            System.out.println("Iteration " + (iteration + 1) + ": MyLeakedObject=" + COUNTS[0]
                    + ", WeakReference=" + COUNTS[1] + ", WeakRefs with live referent=" + aliveCount);
            // Periodically force GCs
            if ((iteration + 1) % 20 == 0) {
                System.out.println("Forcing GCs...");
                for (int i = 0; i < 4; i++) {
                    System.gc();
                    Thread.sleep(3000);
                }
                getObjectCounts();
                System.out.println("After GC: MyLeakedObject=" + COUNTS[0]
                        + ", WeakRefs with live referent=" + aliveCount);
            }
        }
    }

    private static void getObjectCounts() {
        COUNTS[0] = 0;
        COUNTS[1] = 0;
        try {
            Process p = new ProcessBuilder(
                    "jcmd", String.valueOf(ProcessHandle.current().pid()),
                    "GC.class_histogram", "-all")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(),
StandardCharsets.UTF_8))) { String line; while ((line = r.readLine()) != null) { String[] parts = line.trim().split("\\s+"); if (parts.length >= 4) { if (line.contains("GenShenWeakRefLeakRepro$MyLeakedObject")) { COUNTS[0] = Long.parseLong(parts[1]); } else if (line.contains("java.lang.ref.WeakReference ")) { COUNTS[1] = Long.parseLong(parts[1]); } } } } } catch (Exception e) { System.err.println("Histogram failed: " + e.getMessage()); } } } Thanks, Parker Winchester -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkemper at openjdk.org Sat Dec 6 00:52:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 6 Dec 2025 00:52:01 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v5] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 18:19:39 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if 
(!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add include header shenandoahOldGeneration.hpp We talked offline. The assertion must be weakened to account for dirty write cards because the young pointer could be put in the old array _after_ init mark. We cannot expect the read card to be dirty in this case. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3546805335 From zlin at openjdk.org Sat Dec 6 12:07:04 2025 From: zlin at openjdk.org (Zihao Lin) Date: Sat, 6 Dec 2025 12:07:04 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v15] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into 8344116 - Merge branch 'master' into 8344116 - remove adr_type from graphKit - Fix test failed - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - ... 
and 8 more: https://git.openjdk.org/jdk/compare/b0f59f60...c526f021 ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=14 Stats: 316 lines in 22 files changed: 47 ins; 89 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From qamai at openjdk.org Sun Dec 7 12:12:20 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 12:12:20 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v3] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Node::adr_type` is ambiguous. For some, it refers to the memory the node consumes, while for the others, it refer to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: > > - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, the `Phi` collect the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. > - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. > - For nodes such as `StrInflatedCopyNode`, as it consumes more than it produces, during scheduling, we need to compute anti-dependencies. This is not the case, so I fixed it by making it kill all the memory it consumes. > - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, this is really suspicious. I didn't fix it, as it seems to not result in any symptom at the moment. 
> > In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into adrtype - store_to_memory does not emit MemBars - Disambiguate Node::adr_type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28570/files - new: https://git.openjdk.org/jdk/pull/28570/files/b39029a3..ec31fb75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28570&range=01-02 Stats: 29305 lines in 803 files changed: 17601 ins; 8334 del; 3370 mod Patch: https://git.openjdk.org/jdk/pull/28570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28570/head:pull/28570 PR: https://git.openjdk.org/jdk/pull/28570 From kdnilsen at openjdk.org Sun Dec 7 17:54:24 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 7 Dec 2025 17:54:24 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics Message-ID: When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. 
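The two-sided growth trigger described in this PR can be sketched in isolation (assumed semantics; the constants mirror the stated defaults, but the function and parameter names are illustrative, not the actual heuristic code):

```cpp
#include <cassert>
#include <cstddef>

// Proposed defaults from the PR description.
const double MinOldGenGrowthPercent = 50.0;               // small old gen
const double MinOldGenGrowthRemainingHeapPercent = 25.0;  // large old gen

// Trigger an old-gen cycle based on growth since the previous old mark.
bool should_trigger_old_gc(size_t old_used, size_t old_live_at_last_mark,
                           size_t heap_size, bool old_is_small_fraction) {
    size_t growth = old_used > old_live_at_last_mark
                        ? old_used - old_live_at_last_mark : 0;
    if (old_is_small_fraction) {
        // Small old gen: trigger when old grew by more than 50% of the
        // live data in old at the time of the previous old-gen mark.
        return growth * 100.0 > old_live_at_last_mark * MinOldGenGrowthPercent;
    }
    // Large old gen: trigger when old consumed more than 25% of the
    // memory that was not live in old at the last old marking.
    size_t remaining = heap_size - old_live_at_last_mark;
    return growth * 100.0 > remaining * MinOldGenGrowthRemainingHeapPercent;
}
```

The second clause scales the threshold with the memory still available to old, so a large old generation triggers well before it can double.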
------------- Commit messages: - make old evac ratio adaptive - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - Adjust test for new defaults - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - Change secondary old trigger to be percent of young-gen heap size - add trigger for percent of heap growth Changes: https://git.openjdk.org/jdk/pull/28561/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373225 Stats: 92 lines in 9 files changed: 74 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From kdnilsen at openjdk.org Sun Dec 7 17:54:24 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 7 Dec 2025 17:54:24 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen wrote: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. > > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. The benefits of this PR are demonstrated on an Extremem workload. Comparisons with master are highlighted in this spreadsheet (inline image not preserved in the archive).
Highlights:
1. Far fewer old GCs, with slight increase in young GCs (74.45% improvement)
2. Since old GCs are much more costly than young GCs, 4.5% improvement in CPU utilization.
3.
Latencies improved across all percentiles (from small increase of 0.3% at p50 to significant increase of 51.2% at p99.999) The workload is configured as follows: ~/github/jdk.11-17-2025/build/linux-x86_64-server-release/images/jdk/bin/java \ -XX:+UnlockExperimentalVMOptions \ -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms8g -Xmx8g \ -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \ -XX:ShenandoahMinFreeThreshold=5 \ -XX:ShenandoahFullGCThreshold=1024 \ -Xlog:"gc*=info,ergo" \ -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \ -XX:+UnlockDiagnosticVMOptions \ -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \ -dInitializationDelay=45s \ -dDictionarySize=3000000 \ -dNumCustomers=300000 \ -dNumProducts=60000 \ -dCustomerThreads=750 \ -dCustomerPeriod=1600ms \ -dCustomerThinkTime=300ms \ -dKeywordSearchCount=4 \ -dServerThreads=5 \ -dServerPeriod=1s \ -dProductNameLength=10 \ -dBrowsingHistoryQueueCount=5 \ -dSalesTransactionQueueCount=5 \ -dProductDescriptionLength=32 \ -dProductReplacementPeriod=10s \ -dProductReplacementCount=10000 \ -dCustomerReplacementPeriod=5s \ -dCustomerReplacementCount=1000 \ -dBrowsingExpiration=1m \ -dPhasedUpdates=true \ -dPhasedUpdateInterval=30s \ -dSimulationDuration=25m \ -dResponseTimeMeasurements=100000 \ >$t.genshen.reproducer.baseline-8g.out 2>$t.genshen.reproducer.baseline-8g.err & job_pid=$! 
max_rss_kb=0 for s in {1..99} do sleep 15 rss_kb=$(ps -o rss= -p $job_pid) if (( $rss_kb > $max_rss_kb )) then max_rss_kb=$rss_kb fi done rss_mb=$((max_rss_kb / 1024)) cpu_percent=$(ps -o cputime -o etime -p $job_pid) wait $job_pid echo "RSS: $rss_mb MB" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err echo "$cpu_percent" >>$t.genshen.reproducer.baseline-8g.out 2>>$t.genshen.reproducer.share-collector-reserves.err gzip $t.genshen.reproducer.baseline-8g.out $t.genshen.reproducer.baseline-8g.err Note that this PR causes us to operate closer to the edge of the operating envelope. In more aggressively provisioned configurations (same workload in smaller heap, for example), we see some regression in latencies compared to tip. This results because of increased numbers of degenerated GCs which result from starvation of mixed evacuations. This PR causes us to do fewer old GCs, but each old GC is expected to work more efficiently. We expect these regressions to be mitigated by other PRs that are currently under development and review, including: 1. Sharing of collector reserves between young and old 2. Accelerated triggers 3. Surging of GC workers 4. 
Adaptive old-evac ratio ------------- PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622610260 PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3622625901 From xpeng at openjdk.org Sun Dec 7 20:34:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sun, 7 Dec 2025 20:34:13 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be done unconditionally by arraycopy_marking if the array is in an old region [v6] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from follow code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS, arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: enqueue objects stored in old array at ShenandoahSATBBarrier when concurrent young marking is in progress ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/49ea3c93..c649cf2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From qamai at openjdk.org Mon Dec 8 07:41:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 Dec 2025 07:41:01 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> References: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> Message-ID: On Fri, 5 Dec 2025 14:02:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 105: >> >>> 103: // All the possible combinations of floating/narrowing with example use cases: >>> 104: >>> 105: // Use case example: Range Check CastII >> >> I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. 
> > Range check `CastII`s were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out-of-range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). > So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. > What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. Got it, sorry I misunderstood! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2597364668 From qamai at openjdk.org Mon Dec 8 07:48:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 Dec 2025 07:48:05 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost.
>> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by qamai (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3550550450 From roland at openjdk.org Mon Dec 8 14:52:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Dec 2025 14:52:35 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Mon, 14 Apr 2025 11:50:27 GMT, Quan Anh Mai wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not.
We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? @merykitty @eme64 @chhagedorn thanks for the reviews. Does testing need to be run on this before I integrate?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3627292197 From epeter at openjdk.org Mon Dec 8 15:50:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 15:50:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Mon, 8 Dec 2025 14:49:41 GMT, Roland Westrelin wrote: >> If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? > > @merykitty @eme64 @chhagedorn thanks for the reviews > Does testing need to be run on this before I integrate? @rwestrel I'll run some testing now ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3627616227 From wkemper at openjdk.org Mon Dec 8 15:58:02 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 15:58:02 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v6] In-Reply-To: References: Message-ID: On Sun, 7 Dec 2025 20:34:13 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); 
>> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region, but the array is above TAMS (Old GC may not be started, TAMS of old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > enqueue objects stored in old array at ShenandoahSATBBarrier when concurrent young marking is in progress src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 451: > 449: if (ShenandoahSATBBarrier) { > 450: if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst)) || > 451: (_heap->is_concurrent_young_mark_in_progress() && _heap->heap_region_containing(dst)->is_old())) { We could also check if Shenandoah is running in generational mode. Even in non-generational mode, we set `YOUNG_MARKING` in gc state.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2599168582 From wkemper at openjdk.org Mon Dec 8 16:44:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 16:44:57 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen wrote: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. > > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 212: > 210: void slide_pinned_regions_to_front(); > 211: bool all_candidates_are_pinned(); > 212: void adjust_old_garbage_threshold(); A brief general comment about the algorithm here or in the implementation would be welcome. As I read it, we are lowering the region's garbage threshold as the occupancy in the old generation increases. Lowering the garbage threshold will increase the number of old regions selected for a mixed collection. src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 252: > 250: (_live_bytes_after_last_mark + ((ShenandoahHeap::heap()->soft_max_capacity() - _live_bytes_after_last_mark) > 251: * ShenandoahMinOldGenGrowthRemainingHeapPercent / 100.0)); > 252: size_t result = MIN2(threshold_by_relative_growth, threshold_by_growth_into_percent_remaining); Are we comparing bytes to a percentage here? Not sure I understand the role of `FRACTIONAL_DENOMINATOR`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28561#pullrequestreview-3552965737 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2599302961 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2599331553 From xpeng at openjdk.org Mon Dec 8 18:05:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 18:05:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v6] In-Reply-To: References: Message-ID: <7VQhaVsTX2vUrVDFT8DLdVAFZjhhbentEUo3Fms4MMY=.fb2be599-6ab5-4e5c-bd8c-10840ac1f5c5@github.com> On Mon, 8 Dec 2025 15:55:41 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> enqueue objects stored in old array at ShenandoahSATBBarrier when concurrent young marking is in progress > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 451: > >> 449: if (ShenandoahSATBBarrier) { >> 450: if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst)) || >> 451: (_heap->is_concurrent_young_mark_in_progress() && _heap->heap_region_containing(dst)->is_old())) { > > We could also check if Shenandoah is running in generational mode. Even in non-generational mode, we set `YOUNG_MARKING` in gc state. When is_old returns true, it implies Shenandoah is in generational mode, so logically it is not necessary to test whether it is in generational mode, but we may want to test the mode for performance considerations.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2599591835 From wkemper at openjdk.org Mon Dec 8 18:08:49 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 18:08:49 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v12] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > - Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > - Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > - Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > - Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > - Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > -
Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > - Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > - Worst regression: xalan (+89.4%) William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 79 commits: - Fix comments, add back an assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Accommodate behavior of global heuristic - Restore missing update for inplace promotion padding - Remove reference to adaptive tuning flag - Remove commented out assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - Remove bad asserts - Don't include tenurable bytes for current cycle in the next cycle Also remove vestigial promotion potential calculation - ...
and 69 more: https://git.openjdk.org/jdk/compare/811591c5...e3f22960 ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=11 Stats: 398 lines in 11 files changed: 158 ins; 173 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From xpeng at openjdk.org Mon Dec 8 18:09:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 18:09:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: > Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from the following code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region,
but the array is above TAMS (Old GC may not be started, TAMS of old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Also test is_generational ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/c649cf2b..225b999d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From wkemper at openjdk.org Mon Dec 8 18:29:02 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 18:29:02 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 18:09:00 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from the following code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >>
arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region, but the array is above TAMS (Old GC may not be started, TAMS of old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Also test is_generational Marked as reviewed by wkemper (Reviewer).
The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread, meanwhile one more unit test is added to verify the behavior of `expend_promoted` ### Test - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest - [ ] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah ------------- Commit messages: - Remove tests and revert the changes to configure_plab_for_current_thread - Rework on configure_plab_for_current_thread and the unit test to fix test failures Changes: https://git.openjdk.org/jdk/pull/28706/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28706&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373272 Stats: 26 lines in 1 file changed: 0 ins; 23 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28706.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28706/head:pull/28706 PR: https://git.openjdk.org/jdk/pull/28706 From wkemper at openjdk.org Mon Dec 8 19:59:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 19:59:09 GMT Subject: RFR: 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 19:32:16 GMT, Xiaolong Peng wrote: > ShenandoahOldGeneration::configure_plab_for_current_thread has been updated to only handle plab req, which is a behavior change, but ShenandoahOldGenerationTest was not updated to match the behavior change, causing the test to fail. 
> > The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread, meanwhile one more unit test is added to verify the behavior of `expend_promoted` > > ### Test > - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest > - [ ] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 196: > 194: } > 195: > 196: void I would revert this back to asserting its never called for a shared allocation and delete the test (but keep the new `test_expend_promoted_should_increase_expended`). ------------- PR Review: https://git.openjdk.org/jdk/pull/28706#pullrequestreview-3553772130 PR Review Comment: https://git.openjdk.org/jdk/pull/28706#discussion_r2599902076 From xpeng at openjdk.org Mon Dec 8 19:59:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 19:59:10 GMT Subject: RFR: 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 19:40:10 GMT, William Kemper wrote: >> ShenandoahOldGeneration::configure_plab_for_current_thread has been updated to only handle plab req, which is a behavior change, but ShenandoahOldGenerationTest was not updated to match the behavior change, causing the test to fail. 
>> >> The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread, meanwhile one more unit test is added to verify the behavior of `expend_promoted` >> >> ### Test >> - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest >> - [ ] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 196: > >> 194: } >> 195: >> 196: void > > I would revert this back to asserting its never called for a shared allocation and delete the test (but keep the new `test_expend_promoted_should_increase_expended`). Thanks, I also prefer this approach. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28706#discussion_r2599931175 From kdnilsen at openjdk.org Mon Dec 8 20:56:00 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Dec 2025 20:56:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 18:09:00 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < 
_heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS(Old GC may not be started, TAMS of old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Also test is_generational src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 450: > 448: assert(_heap->is_concurrent_mark_in_progress(), "only during marking"); > 449: if (ShenandoahSATBBarrier) { > 450: if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst)) || Predicates: A: heap->is_concurrent_young_mark_in_progress() B: heap->is_concurrent_old_mark_in_progress() C: heap->heap_region_containining(dst)->is_old() D: !heap->marking_context()->allocated_after_mark_start(dst) I think the conditions under which we need to call arraycopy_work() are: (A && C) || (A && D) || (B && C && D) which could be written: (A && (C || D)) || (B && C && D) As written, I think we are also calling arraycopy_work() under certain unnecessary conditions, such as: (B && C) src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 464: > 462: assert(!_heap->is_in_young(obj) || > 463: card_scan->is_card_dirty(elem_heap_word_ptr) 
|| > 464: card_scan->is_write_card_dirty(elem_heap_word_ptr), I believe there is a very slight risk of assertion failure here, which might be so rare that you could just mention the possibility in the "error message" or in a comment associated with this code. The race is that some other thread could overwrite an an old-gen array element with pointer to young during young marking, and we might see this interesting young pointer before that other thread has had a chance to mark the associated card dirty. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600079828 PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600102116 From kdnilsen at openjdk.org Mon Dec 8 20:56:01 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Dec 2025 20:56:01 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 20:45:36 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Also test is_generational > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 450: > >> 448: assert(_heap->is_concurrent_mark_in_progress(), "only during marking"); >> 449: if (ShenandoahSATBBarrier) { >> 450: if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst)) || > > Predicates: > A: heap->is_concurrent_young_mark_in_progress() > B: heap->is_concurrent_old_mark_in_progress() > C: heap->heap_region_containining(dst)->is_old() > D: !heap->marking_context()->allocated_after_mark_start(dst) > > I think the conditions under which we need to call arraycopy_work() are: > (A && C) || (A && D) || (B && C && D) > which could be written: > (A && (C || D)) || (B && C && D) > > As written, I think we are also calling arraycopy_work() under certain unnecessary conditions, such as: > (B && C) 
Wondering if the test for is_generational() could be captured in a template parameter. Each invocation that I find with grep already knows whether is_generational. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600093807 From xpeng at openjdk.org Mon Dec 8 21:14:12 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 21:14:12 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> On Mon, 8 Dec 2025 20:49:30 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 450: >> >>> 448: assert(_heap->is_concurrent_mark_in_progress(), "only during marking"); >>> 449: if (ShenandoahSATBBarrier) { >>> 450: if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst)) || >> >> Predicates: >> A: heap->is_concurrent_young_mark_in_progress() >> B: heap->is_concurrent_old_mark_in_progress() >> C: heap->heap_region_containining(dst)->is_old() >> D: !heap->marking_context()->allocated_after_mark_start(dst) >> >> I think the conditions under which we need to call arraycopy_work() are: >> (A && C) || (A && D) || (B && C && D) >> which could be written: >> (A && (C || D)) || (B && C && D) >> >> As written, I think we are also calling arraycopy_work() under certain unnecessary conditions, such as: >> (B && C) > > Wondering if the test for is_generational() could be captured in a template parameter. Each invocation that I find with grep already knows whether is_generational. Thanks to make it so clear, I did similar evaluation and knew we will call arraycopy_work when (B && C) is true, but it shouldn't cause crash from OldGC, we will be more conservative in such case resulting in maybe more live objects. I'll try to make the tests here more accurate in the PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600146192 From xpeng at openjdk.org Mon Dec 8 21:30:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 21:30:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 20:52:54 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Also test is_generational > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 464: > >> 462: assert(!_heap->is_in_young(obj) || >> 463: card_scan->is_card_dirty(elem_heap_word_ptr) || >> 464: card_scan->is_write_card_dirty(elem_heap_word_ptr), > > I believe there is a very slight risk of assertion failure here, which might be so rare that you could just mention the possibility in the "error message" or in a comment associated with this code. > > The race is that some other thread could overwrite an old-gen array element with a pointer to young during young marking, and we might see this interesting young pointer before that other thread has had a chance to mark the associated card dirty. Yes, it is still possible to fail because during arraycopy there could be mutators updating objects in the array or also doing arraycopy; they will update the write card at the same time. With the better understanding of ShenandoahSATBBarrier and ShenandoahCardBarrier after the discussions, I don't think it makes sense to add these asserts. We are verifying the behavior of ShenandoahCardBarrier in the ShenandoahSATBBarrier code, which is weird and will confuse people when they read the code in the future.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600191941 From dlong at openjdk.org Mon Dec 8 21:31:05 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 8 Dec 2025 21:31:05 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 08:37:02 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: > > - add guard to the include > - add load_reference_barrier_Type > - add clone_barrier_Type > - add write_barrier_pre_Type > - revert shenandoah changes Why are you trying to #include a .cpp file? Just let the linker handle it. You didn't need that for shenandoahBarrierSetC2.cpp, so what makes barrierSetC2.cpp special? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3629077478 From xpeng at openjdk.org Mon Dec 8 21:46:57 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 21:46:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> References: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> Message-ID: On Mon, 8 Dec 2025 21:09:31 GMT, Xiaolong Peng wrote: >> Wondering if the test for is_generational() could be captured in a template parameter. 
Each invocation that I find with grep already knows whether is_generational. > > Thanks for making it so clear. I did a similar evaluation and knew we would call arraycopy_work when (B && C) is true, but it shouldn't cause a crash in old GC; we will be more conservative in such cases, resulting in maybe more live objects. > > I'll try to make the tests here more accurate in the PR. arraycopy_work will be called when (B && C) is true only if A is also true; that is expected and what we want. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600242973 From wkemper at openjdk.org Mon Dec 8 21:46:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 21:46:58 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> Message-ID: On Mon, 8 Dec 2025 21:43:26 GMT, Xiaolong Peng wrote: >> Thanks for making it so clear. I did a similar evaluation and knew we would call arraycopy_work when (B && C) is true, but it shouldn't cause a crash in old GC; we will be more conservative in such cases, resulting in maybe more live objects. >> >> I'll try to make the tests here more accurate in the PR. > > arraycopy_work will be called when (B && C) is true only if A is also true; that is expected and what we want. Condition `D` should be sufficient when we are marking old. That is, I don't believe we need to check `B` or `C` when we are marking old.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600245936 From xpeng at openjdk.org Mon Dec 8 21:51:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 21:51:00 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> Message-ID: On Mon, 8 Dec 2025 21:44:24 GMT, William Kemper wrote: >> arraycopy_work will be called when (B && C) is true only if A is also true; that is expected and what we want. > > Condition `D` should be sufficient when we are marking old. That is, I don't believe we need to check `B` or `C` when we are marking old. There is one case; I think we may not want to make the test here overly complicated, so I didn't add it: (!A && B && C && !D) It could happen after final-mark in a bootstrap young GC, when the young marking is done and old marking is in progress.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600258109 From xpeng at openjdk.org Mon Dec 8 22:38:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 22:38:13 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v8] In-Reply-To: References: Message-ID: <3Z0T1qrYELdfc_aZ-nN64JnCUG4pKB6I5TKZa_aAeKQ=.464424cb-b086-45b6-86d7-2116db07e06b@github.com> > Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from the following code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS (old GC may not be started, so the TAMS of the old region is not captured), arraycopy_work won't be applied anymore, so we may miss some pointers in SATB in such cases during concurrent young marking.
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Remove header - Remove card assert, pass IS_GENERATIONAL as template parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/225b999d..fe272ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=06-07 Stats: 28 lines in 2 files changed: 4 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From xpeng at openjdk.org Mon Dec 8 22:38:15 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 22:38:15 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> Message-ID: On Mon, 8 Dec 2025 21:48:10 GMT, Xiaolong Peng wrote: >> Condition `D` should be sufficient when we are marking old. That is, I don't believe we need to check `B` or `C` when we are marking old. > > There is one case; I think we may not want to make the test here overly complicated, so I didn't add it: > (!A && B && C && !D) > > It could happen after final-mark in a bootstrap young GC, when the young marking is done and old marking is in progress. I have updated the PR; now is_generational is passed to the method as a template parameter.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600363483 From xpeng at openjdk.org Mon Dec 8 22:38:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 22:38:16 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 21:25:46 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 464: >> >>> 462: assert(!_heap->is_in_young(obj) || >>> 463: card_scan->is_card_dirty(elem_heap_word_ptr) || >>> 464: card_scan->is_write_card_dirty(elem_heap_word_ptr), >> >> I believe there is a very slight risk of assertion failure here, which might be so rare that you could just mention the possibility in the "error message" or in a comment associated with this code. >> >> The race is that some other thread could overwrite an old-gen array element with a pointer to young during young marking, and we might see this interesting young pointer before that other thread has had a chance to mark the associated card dirty. > > Yes, it is still possible to fail because during arraycopy there could be mutators updating objects in the array or doing arraycopy; they will update the write card at the same time. > > With the better understanding of ShenandoahSATBBarrier and ShenandoahCardBarrier after the discussions, I don't think it makes sense to add these asserts. We are verifying the behavior of ShenandoahCardBarrier in the ShenandoahSATBBarrier code, which is weird and will confuse people when they read the code in the future. I have removed the asserts, given that it is not possible to guarantee that the assert will always pass; also, I think we should not verify the behavior of ShenandoahCardBarrier in the ShenandoahSATBBarrier code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600367406 From xpeng at openjdk.org Mon Dec 8 22:47:45 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Dec 2025 22:47:45 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v9] In-Reply-To: References: Message-ID: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> > Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from the following code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region but above TAMS (old GC may not be started, so the TAMS of the old region is not captured), arraycopy_work won't be applied anymore, so we may miss some pointers in SATB in such cases during concurrent young marking.
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and ensure it won't crash with the fix > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix indent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28669/files - new: https://git.openjdk.org/jdk/pull/28669/files/fe272ab8..9e186a85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28669&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28669/head:pull/28669 PR: https://git.openjdk.org/jdk/pull/28669 From wkemper at openjdk.org Mon Dec 8 22:59:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 22:59:56 GMT Subject: RFR: 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 19:32:16 GMT, Xiaolong Peng wrote: > ShenandoahOldGeneration::configure_plab_for_current_thread has been updated to only handle plab req, which is a behavior change, but ShenandoahOldGenerationTest was not updated to match the behavior change, causing the test to fail. > > The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread; meanwhile, one more unit test is added to verify the behavior of `expend_promoted` > > ### Test > - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest > - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah Thank you! ------------- Marked as reviewed by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28706#pullrequestreview-3554431253 From wkemper at openjdk.org Mon Dec 8 23:01:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Dec 2025 23:01:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v9] In-Reply-To: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> References: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> Message-ID: On Mon, 8 Dec 2025 22:47:45 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from follow code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new STAB barrier code for arraycopy_marking, if is it young GC and the array is in old region, but array is above TAMS(Old GC may not be started, TAMS of old 
region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such case during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and sure it won't crash with the fix >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3554437116 From btaylor at openjdk.org Mon Dec 8 23:29:58 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Mon, 8 Dec 2025 23:29:58 GMT Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. I will check some performance numbers before and after this patch. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28681#issuecomment-3629462825 From duke at openjdk.org Mon Dec 8 23:29:59 2025 From: duke at openjdk.org (duke) Date: Mon, 8 Dec 2025 23:29:59 GMT Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: <6LAY0z9qrgg4ejX6fH_yO1s2eXRTxdQpffa7-mrER84=.f35776d1-6b7a-4863-977c-f59825c12f0a@github.com> On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. @benty-amzn Your change (at version 90923ab3b090ae4021bcb4bf47076f6124cd2491) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28681#issuecomment-3629465464 From duke at openjdk.org Tue Dec 9 00:03:58 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 9 Dec 2025 00:03:58 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 02:19:32 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> Tests: >> - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. >> - GHA passed. 
>> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused freeset includes src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 255: > 253: size_t min_threshold = min_free_threshold(); > 254: if (available < min_threshold) { > 255: log_trigger("Free (Soft) (%zu%s) is below minimum threshold (%zu%s)", Forgot to use log format macros `PROPERFMT` & `PROPERFMTARGS` here. Will update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2600564531 From kdnilsen at openjdk.org Tue Dec 9 00:06:56 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 00:06:56 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 16:41:04 GMT, William Kemper wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. 
>> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 252: > >> 250: (_live_bytes_after_last_mark + ((ShenandoahHeap::heap()->soft_max_capacity() - _live_bytes_after_last_mark) >> 251: * ShenandoahMinOldGenGrowthRemainingHeapPercent / 100.0)); >> 252: size_t result = MIN2(threshold_by_relative_growth, threshold_by_growth_into_percent_remaining); > > Are we comparing bytes to a percentage here? Not sure I understand the role of `FRACTIONAL_DENOMINATOR`. I'll take this opportunity to simplify that code. We're using a percent. Next version of code should be more clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2600571373 From btaylor at openjdk.org Tue Dec 9 00:20:20 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Tue, 9 Dec 2025 00:20:20 GMT Subject: Integrated: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date. > > The barriers check thread local copy of gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is type `char`, so the size requirement described by the assert is maintained even after this change. > > Change passes all tier1 tests locally when run with Shenandoah GC. This pull request has now been integrated. Changeset: b86b2cbc Author: Ben Taylor Committer: Y. 
Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/b86b2cbc7d9dd57aeaf64f70f248a120ae3cb751 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28681 From xpeng at openjdk.org Tue Dec 9 01:20:08 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 01:20:08 GMT Subject: RFR: 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 In-Reply-To: References: Message-ID: <-h78N1QQaMBV1JcYat9SZfxw0zsA4mC_0RQrLWlK2ic=.3e3eaf57-deea-4561-922d-0856418a2730@github.com> On Mon, 8 Dec 2025 22:57:33 GMT, William Kemper wrote: >> ShenandoahOldGeneration::configure_plab_for_current_thread has been updated to only handle plab req, which is a behavior change, but ShenandoahOldGenerationTest was not updated to match the behavior change, causing the test to fail. >> >> The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread, meanwhile one more unit test is added to verify the behavior of `expend_promoted` >> >> ### Test >> - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest >> - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah > > Thank you! Thank you @earthling-amzn! I'll integrate it now considering that these tests are blocking our CI workflows.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28706#issuecomment-3629760071 From xpeng at openjdk.org Tue Dec 9 01:20:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 01:20:09 GMT Subject: Integrated: 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 19:32:16 GMT, Xiaolong Peng wrote: > ShenandoahOldGeneration::configure_plab_for_current_thread has been updated to only handle plab req, which is a behavior change, but ShenandoahOldGenerationTest was not updated to match the behavior change, causing the test to fail. > > The two failing unit tests of ShenandoahOldGenerationTest have been removed in this PR since the behavior being verified with them is no longer in configure_plab_for_current_thread, meanwhile one more unit test is added to verify the behavior of `expend_promoted` > > ### Test > - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=gtest > - [x] CONF=macosx-aarch64-server-fastdebug make test TEST=hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 3ea82b9f Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/3ea82b9ff90aebc1a169fdd967c44408dc4a4f51 Stats: 26 lines in 1 file changed: 0 ins; 23 del; 3 mod 8373272: Genshen: ShenandoahOldGenerationTest fails after JDK-8373056 Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28706 From kdnilsen at openjdk.org Tue Dec 9 01:23:30 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 01:23:30 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v2] In-Reply-To: References: Message-ID: <7UZLMmJ563BYtqd7cxmV3NIMc9LhzQFym6qLGBkXqFc=.ee43602e-8a4c-4bb3-acc4-267ee0a909ec@github.com> > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. 
> > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. Kelvin Nilsen has updated the pull request incrementally with three additional commits since the last revision: - Add comment to describe behavior of adjust_old_garbage_threshold() - Simplify representation of growth percentages - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/3c9a86ee..d5e4072d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=00-01 Stats: 39 lines in 4 files changed: 14 ins; 4 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From kdnilsen at openjdk.org Tue Dec 9 01:23:31 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 01:23:31 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v2] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 16:32:53 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add comment to describe behavior of adjust_old_garbage_threshold() >> - Simplify representation of growth percentages >> - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 212: > >> 210: void slide_pinned_regions_to_front(); >> 211: bool all_candidates_are_pinned(); >> 212: void adjust_old_garbage_threshold(); > > A brief general comment about the algorithm here or in the implementation would be welcome. 
As I read it, we are lowering the region's garbage threshold as the occupancy in the old generation increases. Lowering the garbage threshold will increase the number of old regions selected for a mixed collection. Thanks. I've added this description. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2600711259 From kdnilsen at openjdk.org Tue Dec 9 01:27:03 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 01:27:03 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v7] In-Reply-To: References: <4V8h9n8guTinNiCNYtecwcEKZb4y8zz6Qnwidpc4lC4=.149b8dd2-e5d5-45ad-9f93-12be677a2072@github.com> Message-ID: On Mon, 8 Dec 2025 22:32:49 GMT, Xiaolong Peng wrote: >> There is one case, I think we may not want to make the test here overly complicated so I didn't add it: >> (!A && B && C && !D) >> >> It could happen after final-mark in bootstrap young GC, the young marking has done, old marking is in progress. > > I have updated PR, now is_generational is passed to the method as template parameter. When marking OLD and not marking YOUNG, there is no need to enforce SATB on array_copy if the destination resides in young. That is because such an array would essentially reside above the old-generation marking TAMS. I understand that we may choose to be a bit less aggressive in how much we optimize for improved code clarity. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28669#discussion_r2600717666 From kdnilsen at openjdk.org Tue Dec 9 01:29:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 01:29:57 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v9] In-Reply-To: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> References: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> Message-ID: On Mon, 8 Dec 2025 22:47:45 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from the following code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region, but the array is above TAMS (Old GC may not be started,
the TAMS of the old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix >> - [x] GHA > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28669#pullrequestreview-3554842845 From kdnilsen at openjdk.org Tue Dec 9 03:12:37 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 03:12:37 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v3] In-Reply-To: References: Message-ID: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at the time of the previous old-gen mark. > > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers-gh - Add comment to describe behavior of adjust_old_garbage_threshold() - Simplify representation of growth percentages - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent - make old evac ratio adaptive - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - Adjust test for new defaults - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers - ... and 2 more: https://git.openjdk.org/jdk/compare/edebc413...c7c22974 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/d5e4072d..c7c22974 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=01-02 Stats: 12179 lines in 61 files changed: 8415 ins; 3463 del; 301 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From xpeng at openjdk.org Tue Dec 9 03:31:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 03:31:09 GMT Subject: RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking [v9] In-Reply-To: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> References: <4xcQSCGc6pweN2V4QQqX02_e06yCoXEEDPC5fH50DUE=.b2e0d913-af28-4b38-9bb4-652fbbdab614@github.com> Message-ID: On Mon, 8 Dec 2025 22:47:45 GMT, Xiaolong Peng wrote: >> Chasing the root cause of JDK-8372498, I have narrowed down root cause to the commit 
https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 >> >> It is caused by the behavior change from the following code: >> >> Original: >> >> if (ShenandoahSATBBarrier) { >> T* array = dst; >> HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); >> ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); >> if (is_old_marking) { >> // Generational, old marking >> assert(_heap->mode()->is_generational(), "Invariant"); >> if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (_heap->mode()->is_generational()) { >> // Generational, young marking >> if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { >> arraycopy_work(array, count); >> } >> } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { >> // Non-generational, marking >> arraycopy_work(array, count); >> } >> } >> >> New: >> >> if (ShenandoahSATBBarrier) { >> if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { >> arraycopy_work(dst, count); >> } >> } >> >> >> >> With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region, but the array is above TAMS (Old GC may not be started, so the TAMS of the old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix >> - [x] GHA > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent Thank you all for the reviews and valuable feedback!
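For readers following the thread, the generational logic in the quoted "Original" snippet reduces to a small predicate. The sketch below uses hypothetical boolean parameters standing in for the region and TAMS checks (the real code queries `ShenandoahHeapRegion::is_old()` and compares the array address against `top_at_mark_start(r)`); it is an illustration of the condition being restored, not the literal patch:

```cpp
#include <cassert>

// Sketch of the generational SATB arraycopy condition discussed above.
// Flag names are hypothetical stand-ins for the actual region/TAMS tests.
static bool needs_arraycopy_satb(bool is_generational, bool is_old_marking,
                                 bool dst_in_old_region, bool dst_below_tams) {
  if (is_old_marking) {
    // Old marking: only arrays in old regions below the old-gen TAMS matter.
    return dst_in_old_region && dst_below_tams;
  }
  if (is_generational) {
    // Young marking in generational mode: an old region may carry a stale
    // TAMS (old GC not started), so every array in old must be processed.
    return dst_in_old_region || dst_below_tams;
  }
  // Non-generational marking: the plain TAMS check suffices.
  return dst_below_tams;
}
```

The bug described in this thread corresponds to the `dst_in_old_region || dst_below_tams` arm being collapsed into a single TAMS check, which drops old-region arrays above a stale TAMS during young marking.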
------------- PR Comment: https://git.openjdk.org/jdk/pull/28669#issuecomment-3630074748 From xpeng at openjdk.org Tue Dec 9 03:31:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 03:31:10 GMT Subject: Integrated: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 22:14:50 GMT, Xiaolong Peng wrote: > Chasing the root cause of JDK-8372498, I have narrowed down the root cause to the commit https://github.com/openjdk/jdk/commit/f8cf9ca69cfef286c80559bfe1d147b6303d10d2 > > It is caused by the behavior change from the following code: > > Original: > > if (ShenandoahSATBBarrier) { > T* array = dst; > HeapWord* array_addr = reinterpret_cast<HeapWord*>(array); > ShenandoahHeapRegion* r = _heap->heap_region_containing(array_addr); > if (is_old_marking) { > // Generational, old marking > assert(_heap->mode()->is_generational(), "Invariant"); > if (r->is_old() && (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (_heap->mode()->is_generational()) { > // Generational, young marking > if (r->is_old() || (array_addr < _heap->marking_context()->top_at_mark_start(r))) { > arraycopy_work(array, count); > } > } else if (array_addr < _heap->marking_context()->top_at_mark_start(r)) { > // Non-generational, marking > arraycopy_work(array, count); > } > } > > New: > > if (ShenandoahSATBBarrier) { > if (!_heap->marking_context()->allocated_after_mark_start(reinterpret_cast<HeapWord*>(dst))) { > arraycopy_work(dst, count); > } > } > > > > With the new SATB barrier code for arraycopy_marking, if it is a young GC and the array is in an old region, but the array is above TAMS (Old GC may not be started, so the TAMS of the old region is not captured), arraycopy_work won't be applied anymore, so we may have missed some pointers in SATB in such cases during concurrent young marking.
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] repeat gc/TestAllocHumongousFragment.java#generational and make sure it won't crash with the fix > - [x] GHA This pull request has now been integrated. Changeset: c9ab330b Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3 Stats: 10 lines in 2 files changed: 5 ins; 0 del; 5 mod 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking 8372498: [genshen] gc/TestAllocHumongousFragment.java#generational causes intermittent SIGSEGV crashes Reviewed-by: wkemper, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/28669 From xpeng at openjdk.org Tue Dec 9 07:45:26 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 07:45:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v16] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock; we have observed heavy heap lock contention on the memory allocation path in performance analysis of some services in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention; along with the optimization, a better OO design is also applied to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for the mutator; inherits from ShenandoahAllocator, overriding only the methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for the Collector.
> * ShenandoahOldCollectorAllocator: allocator for collector allocation in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now, and the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact for most cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 260 commits: - Fix build failure after merge - Expend promoted from ShenandoahOldCollectorAllocator - Merge branch 'master' into cas-alloc-1 - Address PR comments - Merge branch 'openjdk:master' into cas-alloc-1 - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition - ... and 250 more: https://git.openjdk.org/jdk/compare/5f083aba...06366b4b ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=15 Stats: 1642 lines in 25 files changed: 1295 ins; 235 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From epeter at openjdk.org Tue Dec 9 07:57:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 07:57:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> References: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> Message-ID: On Tue, 2 Dec 2025 13:52:04 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > Thanks for the update, it looks good to me! If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork. @chhagedorn I see that an internal IR test is failing - one that you added a while back. Could you have a look at what may have gone wrong? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3630859059 From chagedorn at openjdk.org Tue Dec 9 14:10:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 14:10:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed.
>> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so that nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review I had a look and it seems that the internal test is relying on a `CastII` node to be removed after loop opts, when we widen `CastII` nodes, to trigger an ideal optimization. That is no longer the case with this patch because we keep the `CastII` node in the graph. The fix would be to improve the ideal optimization to look through cast nodes. However, this feels out of scope, especially since this PR is a bug fix for JDK 26. I therefore propose to fix the internal test before integrating this PR and then follow up with an RFE to fix the ideal optimization. I can take care of this and let you know once this is done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632449257 From roland at openjdk.org Tue Dec 9 15:03:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Dec 2025 15:03:07 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:07:22 GMT, Christian Hagedorn wrote: > I therefore propose to fix the internal test before integrating this PR and then follow up with an RFE to fix the ideal optimization. I can take care of this and let you know once this is done. That sounds good to me. Should I take care of the ideal transformation? Let me know when the internal test is fixed so I can proceed with the integration.
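As an aside for readers of this thread: the four `Cast` flavors enumerated in the quoted PR description form a 2x2 matrix. The following encoding is purely illustrative (it is not C2's actual `ConstraintCastNode` API, which represents these constraints differently); it only summarizes the taxonomy being discussed:

```cpp
#include <cassert>

// Hypothetical 2x2 summary of the Cast-node variants discussed above.
// "Floating" corresponds to depends_only_on_test() returning true.
enum class Placement { Floating, Pinned };
enum class Removal   { RemovableWhenWide,  // dropped once it no longer narrows its input
                       KeptAlways };       // kept to preserve the dependency

struct CastKind {
  Placement placement;
  Removal   removal;
};

// The three pre-existing variants, plus the 4th combination the fix adds:
// a floating cast that survives even after widening makes its type no
// narrower than its input's.
constexpr CastKind floating_removable{Placement::Floating, Removal::RemovableWhenWide};
constexpr CastKind pinned_removable  {Placement::Pinned,   Removal::RemovableWhenWide};
constexpr CastKind pinned_kept       {Placement::Pinned,   Removal::KeptAlways};
constexpr CastKind floating_kept     {Placement::Floating, Removal::KeptAlways};
```

The bug in this thread arises precisely when a cast that must stay `KeptAlways` (to pin a dependent memory access) is treated as `RemovableWhenWide` after its type is widened.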
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632713131 From chagedorn at openjdk.org Tue Dec 9 15:27:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 15:27:11 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so that nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input.
>> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks Roland! I'll let you know and file a follow-up RFE and assign it to you. I will dump all the relevant information in there with a test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632843345 From btaylor at openjdk.org Tue Dec 9 17:50:25 2025 From: btaylor at openjdk.org (Ben Taylor) Date: Tue, 9 Dec 2025 17:50:25 GMT Subject: RFR: 8352914: Shenandoah: Change definition of ShenandoahSharedValue to int32_t to leverage platform atomics In-Reply-To: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> References: <32HM2TBQGO0hbc42x3mah4v-JKwYZo7YiVNjrmc1r5M=.949fb4f6-5882-4c30-b9b6-e0adc7deca79@github.com> Message-ID: On Fri, 5 Dec 2025 18:44:08 GMT, Ben Taylor wrote: > The `STATIC_ASSERT` below this typedef appears to be out of date.
> > The barriers check the thread-local copy of the gc state, which is stored in `ShenandoahThreadLocalData::_gc_state` and is of type `char`, so the size requirement described by the assert is maintained even after this change. > > The change passes all tier1 tests locally when run with Shenandoah GC. In SpecJBB 2015, I observed the following performance changes. These comparisons are only between single runs, so some variance should be expected. On aarch64: - Critical jops increased from 57513 to 57779 (+266 / +0.46%) - Max jops increased from 62164 to 63015 (+851 / +1.36%) On x86_64: - Critical jops increased from 86241 to 92047 (+5806 / +6.73%) - Max jops increased from 88479 to 95613 (+7134 / +8.06%) I also ran dacapo and specjvm, but no other results in any of the 3 benchmarks showed significant changes. Most other results were within 0.5% or less. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28681#issuecomment-3633443138 From chagedorn at openjdk.org Tue Dec 9 17:59:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 17:59:20 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost.
>> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so that nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review The internal test is fixed and sanity testing passed - you can move forward with integrating this PR :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3633535520 From wkemper at openjdk.org Tue Dec 9 18:15:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Dec 2025 18:15:08 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v13] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmark results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > - Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > - Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > -
Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > - Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > - Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > - Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > - Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > - Worst regression: xalan (+89.4%) William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 80 commits: - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Fix comments, add back an assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Accommodate behavior of global heuristic - Restore missing update for inplace promotion padding - Remove reference to adaptive tuning flag - Remove commented out assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - Remove bad asserts - ...
and 70 more: https://git.openjdk.org/jdk/compare/b99be505...0869a46a ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=12 Stats: 398 lines in 11 files changed: 158 ins; 173 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From xpeng at openjdk.org Tue Dec 9 18:22:46 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 18:22:46 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock; we have observed heavy heap lock contention on the memory allocation path in performance analysis of some services in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention; along with the optimization, a better OO design is also applied to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for the mutator; inherits from ShenandoahAllocator, overriding only the methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for the Collector.
> * ShenandoahOldCollectorAllocator: allocator for collector allocation in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now, and the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact for most cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2...
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Some comments updates as suggested in PR review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/06366b4b..5e660216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=15-16 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Tue Dec 9 18:48:35 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 18:48:35 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 2 Dec 2025 22:04:06 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 255 commits: >> >> - Add missing header for ShenandoahFreeSetPartitionId >> - Declare ShenandoahFreeSetPartitionId as enum instead of enum class >> - Fix a typo >> - Remove unnecessary `enum class ShenandoahFreeSetPartitionId : uint8_t` in shenandoahAllocator.php >> - Make ShenandoahAllocator as template class to make compiled code more efficient for each alloc partition >> - Port the fix of JDK-8372566 >> - Merge branch 'master' into cas-alloc-1 >> - Merge remote-tracking branch 'origin/master' into cas-alloc-1 >> - Remove junk code >> - Remove unnecessary change and tidy up >> - ... 
and 245 more: https://git.openjdk.org/jdk/compare/79e99bb0...7980c039 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 155: > >> 153: size_t min_free_words = req.is_lab_alloc() ? req.min_size() : req.size(); >> 154: ShenandoahHeapRegion* r = _free_set->find_heap_region_for_allocation(ALLOC_PARTITION, min_free_words, req.is_lab_alloc(), in_new_region); >> 155: // The region returned by find_heap_region_for_allocation must have sufficient free space for the allocation it if it is not nullptr > > comment has an extra "it" fixed. > src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 69: > >> 67: >> 68: // Attempt to allocate in shared alloc regions, the allocation attempt is done with atomic operation w/o >> 69: // holding heap lock. > > I would rewrite comment: > // Attempt to allocate in a shared alloc region using atomic operation without holding the heap lock. > // Returns nullptr and overwrites regions_ready_for_refresh with the number of shared alloc regions that are ready > // to be retired if it is unable to satisfy the allocation request from the existing shared alloc regions. Thanks, I have updated the comments as you suggested.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2603910894 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2603909318 From kdnilsen at openjdk.org Tue Dec 9 19:13:34 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 19:13:34 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v3] In-Reply-To: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> References: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> Message-ID: On Tue, 9 Dec 2025 03:12:37 GMT, Kelvin Nilsen wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. >> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers-gh > - Add comment to describe behavior of adjust_old_garbage_threshold() > - Simplify representation of growth percentages > - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - make old evac ratio adaptive > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - Adjust test for new defaults > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - ... and 2 more: https://git.openjdk.org/jdk/compare/92ea7c4d...c7c22974 I reran the performance tests following the changes suggested by reviewers, with similar results: [image] ------------- PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3633865135 From xpeng at openjdk.org Tue Dec 9 19:13:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 19:13:44 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <30wgBRAA7e-TzwsWTFagwcJdAxRcOeFD6-WJS-ashhA=.05995833-528a-4dec-820f-0307bf769520@github.com> Message-ID: On Wed, 3 Dec 2025 01:06:29 GMT, Xiaolong Peng wrote: >> I suppose we could use conservative values for a first implementation, as long as we file a "low priority" ticket to come back and revisit for improved efficiency at a later time. > > We don't really know what needs to be recomputed until the allocation finishes; we can make it less conservative, but then we need more code branches here because the template methods require explicit template parameters.
> > I'll create a ticket to follow up on this, given that I also want to see if we can defer the recomputation to the read side; if we can do that, we don't even need the ShenandoahHeapAccountingUpdater here. I have created a bug for the improvement of the accounting update: https://bugs.openjdk.org/browse/JDK-8373371 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2604001291 From kdnilsen at openjdk.org Tue Dec 9 19:19:06 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 19:19:06 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v3] In-Reply-To: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> References: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> Message-ID: On Tue, 9 Dec 2025 03:12:37 GMT, Kelvin Nilsen wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. >> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers-gh > - Add comment to describe behavior of adjust_old_garbage_threshold() > - Simplify representation of growth percentages > - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - make old evac ratio adaptive > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - Adjust test for new defaults > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - ... and 2 more: https://git.openjdk.org/jdk/compare/107e9e49...c7c22974 The outlier in trial 4 results from a late trigger, after 4.529s of GC idle time. This late trigger caused us to experience an allocation failure during marking, resulting in a stop-the-world degenerated pause of 188.556 ms. Other development efforts will mitigate issues with late triggering. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28561#issuecomment-3633883006 From xpeng at openjdk.org Tue Dec 9 21:06:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Dec 2025 21:06:29 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: References: Message-ID: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> On Tue, 9 Dec 2025 18:22:46 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: >> >> * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. >> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. >> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. >> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` >> >> I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: >> >> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. 
>> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== >> ===== DaCapo tail ... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Some comments updates as suggested in PR review src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 41: > 39: _alloc_region_count(alloc_region_count), _free_set(free_set), _alloc_partition_name(ShenandoahRegionPartitions::partition_name(ALLOC_PARTITION)) { > 40: if (alloc_region_count > 0) { > 41: _alloc_regions = PaddedArray::create_unfreeable(alloc_region_count); Rethinking the PaddedArray used here, we may not really need it. The allocator has multiple shared alloc regions for CAS, and only refreshes them when all of them run out of usable memory, so _alloc_regions won't be frequently updated; the PaddedArray here may even have a slightly negative performance impact.
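For context on the PaddedArray trade-off discussed above: padding each array element to a full cache line prevents false sharing when adjacent slots are written concurrently by different threads, at the cost of a larger cache footprint — so it only pays off for frequently-written data. A minimal sketch of the idea (hypothetical type, not HotSpot's `PaddedArray`):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE = 64;

// alignas rounds both the alignment and sizeof up to CACHE_LINE, so
// adjacent array elements never share a cache line and a write to one
// slot cannot invalidate its neighbor's line.
struct alignas(CACHE_LINE) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == CACHE_LINE,
              "each element occupies exactly one cache line");
```

If the array is rarely written — as argued above for `_alloc_regions` — the padding provides no false-sharing benefit and only inflates the footprint.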
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2604362492 From duke at openjdk.org Tue Dec 9 22:48:44 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 9 Dec 2025 22:48:44 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 19:34:12 GMT, Kelvin Nilsen wrote: >> Rui Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused freeset includes > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 940: > >> 938: >> 939: size_t ShenandoahGeneration::soft_available_exclude_evac_reserve() const { >> 940: size_t result = available(ShenandoahHeap::heap()->soft_max_capacity() * (100.0 - ShenandoahEvacReserve) / 100); > > I'm a little uncomfortable with this approach. It's mostly a question of how we name it. The evac reserve is not always this value. In particular, we may shrink the young evac reserves after we have selected the cset. Also of concern is that if someone invokes this function on old_generation(), it looks like they'll get a bogus (not meaningful) value. > > I think I'd be more comfortable with naming this to something like "mutator_available_when_gc_is_idle()". If we keep it virtual, then OldGeneration should override with "assert(false, "Not relevant to old generation") Talked offline. 
Rename this to `soft_mutator_available` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2604612374 From kdnilsen at openjdk.org Tue Dec 9 22:55:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 22:55:48 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v3] In-Reply-To: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> References: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> Message-ID: On Tue, 9 Dec 2025 03:12:37 GMT, Kelvin Nilsen wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. >> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers-gh > - Add comment to describe behavior of adjust_old_garbage_threshold() > - Simplify representation of growth percentages > - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - make old evac ratio adaptive > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - Adjust test for new defaults > - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers > - ... and 2 more: https://git.openjdk.org/jdk/compare/08fdfe16...c7c22974 src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 249: > 247: _live_bytes_after_last_mark + (_live_bytes_after_last_mark * _growth_percent_before_compaction) / 100; > 248: size_t threshold_by_growth_into_percent_remaining = (size_t) > 249: (_live_bytes_after_last_mark + ((ShenandoahHeap::heap()->soft_max_capacity() - _live_bytes_after_last_mark) I need to protect against underflow here. It might be that _live_bytes_after_last_mark is greater than soft_max_capacity(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2604625802 From duke at openjdk.org Tue Dec 9 23:16:35 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 9 Dec 2025 23:16:35 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v4] In-Reply-To: References: Message-ID: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
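The underflow Kelvin flags above is the usual unsigned-subtraction hazard: with `size_t`, `soft_max_capacity() - _live_bytes_after_last_mark` wraps to a huge value whenever live bytes exceed the soft max. The standard guard is a saturating subtraction; a sketch with a hypothetical helper name (the PR's actual fix may be written differently):

```cpp
#include <cstddef>

// size_t is unsigned, so `a - b` wraps to a huge value when b > a.
// Saturating subtraction clamps the result at zero instead.
static std::size_t saturating_sub(std::size_t a, std::size_t b) {
    return a > b ? a - b : 0;
}

// Usage in the spirit of the threshold computation above:
//   size_t not_live = saturating_sub(soft_max_capacity, live_bytes_after_last_mark);
```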
> > Currently in shenandoah, when deciding whether to trigger gc, the available size is calculated as: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max / 100) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when the soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding when to trigger gc, use logic similar to the following: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < ShenandoahMinFreeThreshold * soft_max / 100) // trigger gc > > Tests: > - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without the fix, tip had ~2910 gcs in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 in 20 sec. > - GHA passed.
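To make the arithmetic above concrete, here is the old and the suggested trigger condition evaluated with the bug report's configuration (`-Xmx31g`, `SoftMaxHeapSize=512m`) and assumed defaults of 5% for `ShenandoahEvacReserve` and 10% for `ShenandoahMinFreeThreshold`. Signed 64-bit math is used so the old formula's negative intermediate is visible instead of wrapping as it would with `size_t`:

```cpp
#include <cstdint>

constexpr int64_t M = 1024 * 1024;
constexpr int64_t G = 1024 * M;

constexpr int64_t xmx      = 31 * G;
constexpr int64_t soft_max = 512 * M;
constexpr int64_t used     = 64 * M;   // nearly idle application
constexpr int64_t evac_pct = 5;        // assumed ShenandoahEvacReserve default
constexpr int64_t min_pct  = 10;       // assumed ShenandoahMinFreeThreshold default

// Old logic: availability is derived from Xmx, then the soft tail is subtracted.
constexpr int64_t avail_old = xmx * (100 - evac_pct) / 100 - used;
constexpr int64_t soft_tail = xmx - soft_max;
constexpr bool trigger_old  = (avail_old - soft_tail) < min_pct * soft_max / 100;

// Suggested fix: availability is measured against the soft capacity itself.
constexpr int64_t mutator_soft_capacity = soft_max * (100 - evac_pct) / 100;
constexpr int64_t avail_new = mutator_soft_capacity - used;
constexpr bool trigger_new  = avail_new < min_pct * soft_max / 100;
```

With these numbers, `avail_old - soft_tail` is negative (about -1.1 GB), so the old condition always fires even though the mostly idle app has ~420 MB free under the soft cap, while the fixed condition does not trigger.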
> > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; Rui Li has updated the pull request incrementally with one additional commit since the last revision: log and naming fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28622/files - new: https://git.openjdk.org/jdk/pull/28622/files/599cc2d7..8dea5164 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28622&range=02-03 Stats: 33 lines in 11 files changed: 0 ins; 14 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28622.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28622/head:pull/28622 PR: https://git.openjdk.org/jdk/pull/28622 From kdnilsen at openjdk.org Tue Dec 9 23:21:39 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Dec 2025 23:21:39 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v4] In-Reply-To: References: Message-ID: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. 
> > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Protect against underflow when computing old growth trigger threshold ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/c7c22974..79a21ee6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=02-03 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From xpeng at openjdk.org Wed Dec 10 01:00:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 10 Dec 2025 01:00:07 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v18] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. 
> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, only a few lines of code are needed to customize the allocator for the Collector. > * ShenandoahOldCollectorAllocator: allocator for collector allocation in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now: the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on an EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us.
The pull request now contains 262 commits: - Merge branch 'master' into cas-alloc-1 - Some comments updates as suggested in PR review - Fix build failure after merge - Expend promoted from ShenandoahOldCollectorAllocator - Merge branch 'master' into cas-alloc-1 - Address PR comments - Merge branch 'openjdk:master' into cas-alloc-1 - Add missing header for ShenandoahFreeSetPartitionId - Declare ShenandoahFreeSetPartitionId as enum instead of enum class - Fix a typo - ... and 252 more: https://git.openjdk.org/jdk/compare/020e3f95...c8e98bce ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=17 Stats: 1643 lines in 25 files changed: 1296 ins; 235 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From roland at openjdk.org Wed Dec 10 08:08:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 Dec 2025 08:08:28 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> References: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> Message-ID: On Tue, 9 Dec 2025 17:56:32 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > The internal test is fixed and sanity testing passed - you can move forward with integrating this PR :-) @chhagedorn @eme64 @merykitty thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3635860312 From chagedorn at openjdk.org Wed Dec 10 08:43:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 08:43:36 GMT Subject: RFR: 
8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one, what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input, or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input.
>> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3561363531 From roland at openjdk.org Wed Dec 10 08:48:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 Dec 2025 08:48:41 GMT Subject: Integrated: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <37GHpeFd6FKbfVuMmdhz9-YPcEQcC_fYBRjlLzrkRHg=.f865988e-1ca9-4731-921c-b73029c484cd@github.com> On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. 
> > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input, or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. This pull request has now been integrated.
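The two orthogonal `Cast` properties Roland describes — floating vs. pinned, and removable vs. non-removable once the cast no longer narrows its input's type — form a 2x2 matrix of which C2 previously had only three cells. A schematic sketch of that matrix (illustrative only; C2's real node hierarchy encodes this differently):

```cpp
// Two orthogonal constraints on a Cast node, per the description above.
enum class CastPinning  { Floating, Pinned };          // may it move past its control test?
enum class CastStrength { Removable, NonRemovable };   // may it vanish when it no longer narrows?

struct CastKind {
    CastPinning  pinning;
    CastStrength strength;
};

// The three pre-existing variants:
constexpr CastKind floating_removable   {CastPinning::Floating, CastStrength::Removable};
constexpr CastKind pinned_removable     {CastPinning::Pinned,   CastStrength::Removable};
constexpr CastKind pinned_non_removable {CastPinning::Pinned,   CastStrength::NonRemovable};

// The missing fourth combination the fix introduces:
constexpr CastKind floating_non_removable{CastPinning::Floating, CastStrength::NonRemovable};
```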
Changeset: 00068a80 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9 Stats: 367 lines in 13 files changed: 266 ins; 27 del; 74 mod 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs Reviewed-by: chagedorn, qamai, galder, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24575 From epeter at openjdk.org Wed Dec 10 13:21:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 13:21:27 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v3] In-Reply-To: References: Message-ID: On Sun, 7 Dec 2025 12:12:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> Currently, `Node::adr_type` is ambiguous. For some nodes, it refers to the memory the node consumes, while for others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation. These additions uncover some issues: >> >> - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, where the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though. >> - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. >> - For nodes such as `StrInflatedCopyNode`, as it consumes more than it produces, during scheduling, we need to compute anti-dependencies. This was not being done, so I fixed it by making it kill all the memory it consumes. >> - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`, which is really suspicious. I didn't fix it, as it seems to not result in any symptom at the moment.
>>
>> In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Merge branch 'master' into adrtype
> - store_to_memory does not emit MemBars
> - Disambiguate Node::adr_type

@merykitty I can't promise that I'll review the PR yet. But I have an initial question: Since some cases were wrong, does that mean we can find reproducers for those cases? Because it may be worth backporting some cases, but we would first need to know if there are cases that actually are currently wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28570#issuecomment-3637055561 From eastigeevich at openjdk.org Wed Dec 10 15:09:30 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 10 Dec 2025 15:09:30 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v15] In-Reply-To: References: Message-ID:
> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1.
>
> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
> - Disable coherent icache.
> - Trap IC IVAU instructions.
> - Execute:
>   - `tlbi vae3is, xzr`
>   - `dsb sy`
>
> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests:
>
> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround."
>
> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions.
>
> Changes include:
>
> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization.
> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address.
> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures.
> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk.
>
> Benchmarking results: Neoverse-N1 r3p1 (Graviton 2), 5 000 nmethods, 1 parallel thread, 1 concurrent thread, 3 forks, ms/op (lower better)
>
> - ZGC
>
> | Benchmark | accessedFieldCount | Baseline | Error (Baseline) | Fix | ...
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Fix tier1 failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/b9380fd8..85691beb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=13-14 Stats: 69 lines in 10 files changed: 27 ins; 31 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From ysr at openjdk.org Wed Dec 10 16:51:05 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 10 Dec 2025 16:51:05 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v5] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 23:31:41 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. > > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > hardening of comments > > remove unintended files Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3563535274 From shade at openjdk.org Wed Dec 10 16:54:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Dec 2025 16:54:52 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v5] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 23:31:41 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. > > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > hardening of comments > > remove unintended files I am good with comments as they are. Thanks. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3563550852 From ysr at openjdk.org Wed Dec 10 16:54:54 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 10 Dec 2025 16:54:54 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v3] In-Reply-To: References: <-0iMsHeZnk_Ld_6D9zCBNFVcXi9rIq9S0NmmYEgqb0I=.ffb1591a-83a7-47be-86ff-a5646b51e3e1@github.com> Message-ID: <-92CnLR8zJxFhPPoimnzjae5ox6WSUh9YV1f4d5m4OI=.7a0eedf8-3ef6-4e4b-9d69-989d3e8014d2@github.com> On Tue, 18 Nov 2025 23:06:47 GMT, Nityanand Rai wrote:
>> Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Apply suggestions from code review
>>
>> Co-authored-by: Aleksey Shipilëv
>
> Here are GenShen results over 100+ sample collections
>
> +1.90% philosophers/philosophers_duration p=0.00142 (Wilcoxon)
> Control: 2.620s (+/-173.82ms) 120
> Test: 2.670s (+/-205.24ms) 120
>
> +1.68% finagle-chirper/finagle-chirper_duration p=0.00343 (Wilcoxon)
> Control: 2.802s (+/-130.38ms) 360
> Test: 2.850s (+/-131.25ms) 360
>
> +1.04% rx-scrabble/rx-scrabble_duration p=0.00000 (Wilcoxon)
> Control: 148.308ms (+/- 0.88ms) 320
> Test: 149.855ms (+/- 0.60ms) 320
>
> -2.48% scrabble/scrabble_duration p=0.00000 (Wilcoxon)
> Control: 169.599ms (+/- 2.90ms) 200
> Test: 165.399ms (+/- 3.40ms) 200
>
> -2.24% scala-kmeans/scala-kmeans_duration p=0.00000 (Wilcoxon)
> Control: 479.973ms (+/- 1.92ms) 200
> Test: 469.219ms (+/- 1.98ms) 200

@nityarai08 :
> Description states:
>
> > Exclude young-young, old-old and honor UseCondCardMark in dirty card marking.
>
> I don't see any change for the latter. I believe `UseCondCardMark` is already enabled with GenShen, and the barrier code respects it.
>
> Also `Exclude young-young, old-old` skips the part of the change that also skips `young->old`. I'd reword the synopsis to:
>
> > Skips card marks for stores in young generation objects, old -> old, and null stores.
> > which I think is what the code does. > Once description/synopsis is corrected, this PR looks good to go. Approved in anticipation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28204#issuecomment-3638042985 From wkemper at openjdk.org Wed Dec 10 17:07:18 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Dec 2025 17:07:18 GMT Subject: RFR: Merge openjdk/jdk21u:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.10+5 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/230/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/230/files/3c0530fd..3c0530fd Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=230&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=230&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/230.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/230/head:pull/230 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/230 From wkemper at openjdk.org Wed Dec 10 17:07:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Dec 2025 17:07:20 GMT Subject: Integrated: Merge openjdk/jdk21u:master In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 14:26:46 GMT, William Kemper wrote: > Merges tag jdk-21.0.10+5 This pull request has now been integrated. 
Changeset: 40994da8 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/40994da8b0f0a97d8fa44e67a52965d8c43f8053 Stats: 1413 lines in 32 files changed: 678 ins; 680 del; 55 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/230 From xpeng at openjdk.org Wed Dec 10 17:07:37 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 10 Dec 2025 17:07:37 GMT Subject: [jdk26] RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking Message-ID: Hi all, This pull request contains a clean backport of commit [c9ab330b](https://github.com/openjdk/jdk/commit/c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Xiaolong Peng on 9 Dec 2025 and was reviewed by William Kemper and Kelvin Nilsen. It is a necessary bug fix to address the crash caused by a recent change to the arraycopy barrier. Thanks! ------------- Commit messages: - Backport c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3 Changes: https://git.openjdk.org/jdk/pull/28751/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28751&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373116 Stats: 10 lines in 2 files changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28751.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28751/head:pull/28751 PR: https://git.openjdk.org/jdk/pull/28751
[openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaolong Peng on 9 Dec 2025 and was reviewed by William Kemper and Kelvin Nilsen. > > It is a necessary bug fix to address the crash caused by a recent change to the arraycopy barrier. > > Thanks! Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28751#pullrequestreview-3563623666 From wkemper at openjdk.org Wed Dec 10 17:37:16 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Dec 2025 17:37:16 GMT Subject: [jdk26] RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 16:59:06 GMT, Xiaolong Peng wrote: > Hi all, > > This pull request contains a clean backport of commit [c9ab330b](https://github.com/openjdk/jdk/commit/c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaolong Peng on 9 Dec 2025 and was reviewed by William Kemper and Kelvin Nilsen. > > It is a necessary bug fix to address the crash caused by a recent change to the arraycopy barrier. > > Thanks! Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28751#pullrequestreview-3563722464 From xpeng at openjdk.org Wed Dec 10 17:37:17 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 10 Dec 2025 17:37:17 GMT Subject: [jdk26] RFR: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 16:59:06 GMT, Xiaolong Peng wrote: > Hi all, > > This pull request contains a clean backport of commit [c9ab330b](https://github.com/openjdk/jdk/commit/c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Xiaolong Peng on 9 Dec 2025 and was reviewed by William Kemper and Kelvin Nilsen. > > It is a necessary bug fix to address the crash caused by a recent change to the arraycopy barrier. > > Thanks! Thank you all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28751#issuecomment-3638204799 From xpeng at openjdk.org Wed Dec 10 17:37:17 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 10 Dec 2025 17:37:17 GMT Subject: [jdk26] Integrated: 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking In-Reply-To: References: Message-ID: <3ZHDolYBNGlDXNjr2Eo1DUWTLTsFUh-zndK53Bu75wg=.1e9ceb99-7347-4fc9-8a99-bf18f1a4c70c@github.com> On Wed, 10 Dec 2025 16:59:06 GMT, Xiaolong Peng wrote: > Hi all, > > This pull request contains a clean backport of commit [c9ab330b](https://github.com/openjdk/jdk/commit/c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaolong Peng on 9 Dec 2025 and was reviewed by William Kemper and Kelvin Nilsen. > > It is a necessary bug fix to address the crash caused by a recent change to the arraycopy barrier. > > Thanks! This pull request has now been integrated.
Changeset: 15b5789f Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/15b5789f554cb7b2467a6a0efb4e4cd129ee609b Stats: 10 lines in 2 files changed: 5 ins; 0 del; 5 mod 8373116: Genshen: arraycopy_work should be always done for arrays in old gen during young concurrent marking 8372498: [genshen] gc/TestAllocHumongousFragment.java#generational causes intermittent SIGSEGV crashes Reviewed-by: shade, wkemper Backport-of: c9ab330b7bdd3cc2410ffdb336a63aa0ac7256a3 ------------- PR: https://git.openjdk.org/jdk/pull/28751 From qamai at openjdk.org Wed Dec 10 17:40:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 10 Dec 2025 17:40:43 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:18:48 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into adrtype >> - store_to_memory does not emit MemBars >> - Disambiguate Node::adr_type > > @merykitty I can't promise that I'll review the PR yet. But I have an initial question: > Since some cases were wrong, does that mean we can find reproducers for those cases? Because it may be worth backporting some cases, but we would first need to know if there are cases that actually are currently wrong. @eme64 Thanks a lot for taking a look. Normally, those intrinsics are not exposed bare, so it is hard to mis-schedule them.
But I can craft one; at least on my machine, it fails with only `--add-opens=java.base/java.lang=ALL-UNNAMED`:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    public class TestAntiDependency {
        static final MethodHandle COMPRESS_HANDLE;

        static {
            try {
                var lookup = MethodHandles.privateLookupIn(String.class, MethodHandles.lookup());
                Class<?> stringUtf16Class = lookup.findClass("java.lang.StringUTF16");
                COMPRESS_HANDLE = lookup.findStatic(stringUtf16Class, "compress",
                        MethodType.methodType(int.class, char[].class, int.class, byte[].class, int.class, int.class));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public static void main(String[] args) throws Throwable {
            for (int i = 0; i < 50000; i++) {
                if (test() != 0) {
                    throw new AssertionError();
                }
            }
        }

        static int test() throws Throwable {
            char[] src = new char[4];
            byte[] dst = new byte[4];
            int l = (int) COMPRESS_HANDLE.invokeExact(src, 0, dst, 0, 4);
            src[0] = 1;
            return dst[0];
        }
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/28570#issuecomment-3638228258 From qamai at openjdk.org Wed Dec 10 17:45:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 10 Dec 2025 17:45:10 GMT Subject: RFR: 8372779: C2: Disambiguate Node::adr_type for the IR graph [v3] In-Reply-To: References: Message-ID: <-Uq0PhdsB_FeQ6DCNRes_tvSzy0uJhKTD73OL26r2-4=.47ae2131-ad40-40b2-bddb-d306a386cffb@github.com> On Sun, 7 Dec 2025 12:12:20 GMT, Quan Anh Mai wrote:
>> Hi,
>>
>> Currently, `Node::adr_type` is ambiguous. For some, it refers to the memory the node consumes, while for the others, it refers to the memory the node produces. This PR removes that ambiguity by introducing `Node::in_adr_type` and `Node::out_adr_type` that refer to those properties, respectively. It also introduces a local verification of the memory graph during compilation.
These additions uncover some issues:
>>
>> - Sometimes, the memory is wired incorrectly, such as in `LibraryCall::extend_setCurrentThread`, where the `Phi` collects the `StoreNode`s instead of the whole memory state. I think these issues do not result in crashes or miscompilation, though.
>> - `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too.
>> - For nodes such as `StrInflatedCopyNode`, which consume more than they produce, we need to compute anti-dependencies during scheduling. This was not being done, so I fixed it by making such nodes kill all the memory they consume.
>> - `GraphKit::set_output_for_allocation` uses a raw `ProjNode` as the base for a `MergeMem`; this is really suspicious. I didn't fix it, as it seems to not result in any symptom at the moment.
>>
>> In the end, the execution of the compiler is strictly more restricted than before, and there is less room for ambiguity.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Merge branch 'master' into adrtype
> - store_to_memory does not emit MemBars
> - Disambiguate Node::adr_type

For the other issues (`AryEqNode` advertises incorrect `adr_type` and `LibraryCall::extend_setCurrentThread` incorrectly wires the memory nodes), there is no immediate issue apart from an incorrect-looking graph, so I have not come up with a real failure.
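The anti-dependency hazard can be reduced to a toy model. Everything below is a hypothetical sketch in plain Java, not C2 code: a "node" declares the memory slices it reads and writes, and a later store must be kept below any earlier node that reads the slice it writes. If the check only consults the slice a node produces (its output `adr_type`), the store to a merely-consumed slice looks safe to hoist — the same shape as the reproducer with `StringUTF16::compress`.

```java
// Toy model of memory-slice anti-dependencies (hypothetical names, not C2 code).
import java.util.Set;

public class MemSliceModel {
    public record Op(String name, Set<String> reads, Set<String> writes) {}

    // Correct check: a store to a slice the earlier node reads is an anti-dependency.
    public static boolean needsAntiDep(Op earlier, Op laterStore) {
        for (String slice : laterStore.writes()) {
            if (earlier.reads().contains(slice)) return true;
        }
        return false;
    }

    // Buggy check: consults only the slices the earlier node produces, so a store
    // to a slice the node merely consumes looks safe to hoist above it.
    public static boolean buggyNeedsAntiDep(Op earlier, Op laterStore) {
        for (String slice : laterStore.writes()) {
            if (earlier.writes().contains(slice)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Like StringUTF16::compress: reads char[], writes byte[].
        Op copy = new Op("compress", Set.of("char[]"), Set.of("byte[]"));
        // Like the reproducer's src[0] = 1: writes char[].
        Op store = new Op("store src[0]", Set.of(), Set.of("char[]"));
        System.out.println(needsAntiDep(copy, store));      // true: store must stay below the copy
        System.out.println(buggyNeedsAntiDep(copy, store)); // false: the hazard is missed
    }
}
```

Under the buggy check the store can be scheduled above the copy, which is why the reproducer can observe a nonzero `dst[0]`.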
------------- PR Comment: https://git.openjdk.org/jdk/pull/28570#issuecomment-3638245320 From shade at openjdk.org Wed Dec 10 19:19:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Dec 2025 19:19:22 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses Message-ID: Shenandoah and G1 use CardTable for most of their infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing the card table base from assembler and compilers runs the risk of accidentally encoding the wrong card table base in generated code. Most of the current code avoids this trouble by carefully implementing the GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, which reads the card table base for the G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. Shenandoah added a few asserts to catch these errors:

SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");)

...but G1 would also benefit from a similar safety mechanism. This PR strengthens the code to prevent future accidents.
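The hazard can be illustrated with a deliberately simplified model — hypothetical names, plain Java instead of HotSpot C++/assembler. A barrier that captured the table base once, as if constant-folded into generated code, keeps dirtying the retired table after the collector flips, while a barrier that rereads the current base on every store marks the active table.

```java
// Simplified model of the stale card-table-base hazard: two tables, a flip,
// and two barrier variants. "current" stands in for the per-thread (TLS) base.
public class CardTableFlip {
    static byte[] tableA = new byte[8];
    static byte[] tableB = new byte[8];
    static byte[] current = tableA;

    // Collector retires one table and activates the other.
    static void flip() {
        current = (current == tableA) ? tableB : tableA;
    }

    // A barrier that wrongly baked the base in as a compile-time constant.
    static void staleBarrier(byte[] capturedBase, int card) {
        capturedBase[card] = 1;
    }

    // A barrier that reloads the current base on every store.
    static void freshBarrier(int card) {
        current[card] = 1;
    }

    // Returns what a collector scanning the *active* table sees for cards 3 and 5.
    static String demo() {
        tableA = new byte[8];
        tableB = new byte[8];
        current = tableA;
        byte[] captured = current;   // base "encoded" into generated code
        flip();                      // tables flip; tableB is now active
        staleBarrier(captured, 3);   // mark lands in the retired tableA
        freshBarrier(5);             // mark lands in the active tableB
        return current[3] + "," + current[5];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "0,1": the stale mark for card 3 is invisible
    }
}
```

This is why the flipping collectors keep the base in thread-local storage and why baking it into generated code, as the JVMCI path did, is unsafe.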
Additional testing:
- [x] Linux x86_64 server fastdebug, `hotspot_gc`
- [x] Linux x86_64 server fastdebug, `tier1` with Serial, Parallel, G1, Shenandoah, Z
- [x] Linux AArch64 server fastdebug, `tier1` with Serial, Parallel, G1, Shenandoah, Z
- [x] GHA, cross-compilation only
------------- Commit messages:
- Another build fix
- Fix Minimal builds
- Shenandoah non-generational can have nullptr card table
- Also simplify CTBS builder
- CI should also mention "const"
- Fix JVMCI by answering proper things
- Merge branch 'master' into JDK-8373266-cardtable-asserts
- More fixes
Changes: https://git.openjdk.org/jdk/pull/28703/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28703&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373266 Stats: 99 lines in 15 files changed: 35 ins; 25 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/28703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28703/head:pull/28703 PR: https://git.openjdk.org/jdk/pull/28703 From kdnilsen at openjdk.org Wed Dec 10 21:29:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 Dec 2025 21:29:57 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v3] In-Reply-To: References: <9gzZlrEzwroH2F-yEC6uzYIPsoKEcTBs_6UZDNFvvCg=.e0e0697f-e537-41fb-9f7f-740acbab4ad1@github.com> Message-ID: On Tue, 9 Dec 2025 22:53:03 GMT, Kelvin Nilsen wrote:
>> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision:
>>
>> - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers-gh
>> - Add comment to describe behavior of adjust_old_garbage_threshold()
>> - Simplify representation of growth percentages
>> - Change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent
>> - make old evac ratio adaptive
>> - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
>> - change default value of ShenandoahMinOldGenGrowthRemainingHeapPercent
>> - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
>> - Adjust test for new defaults
>> - Merge remote-tracking branch 'jdk/master' into more-adaptive-old-triggers
>> - ... and 2 more: https://git.openjdk.org/jdk/compare/b2386fa4...c7c22974
>
> src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 249:
>
>> 247: _live_bytes_after_last_mark + (_live_bytes_after_last_mark * _growth_percent_before_compaction) / 100;
>> 248: size_t threshold_by_growth_into_percent_remaining = (size_t)
>> 249: (_live_bytes_after_last_mark + ((ShenandoahHeap::heap()->soft_max_capacity() - _live_bytes_after_last_mark)
>
> I need to protect against underflow here. It might be that _live_bytes_after_last_mark is greater than soft_max_capacity().

Done and tested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608266583 From wkemper at openjdk.org Wed Dec 10 21:35:14 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Dec 2025 21:35:14 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 23:21:39 GMT, Kelvin Nilsen wrote:
>> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark.
>>
>> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> Protect against underflow when computing old growth trigger threshold

src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 290:

> 288: State _state;
> 289:
> 290: static const size_t FRACTIONAL_DENOMINATOR = 65536;

Is anything still using this constant? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608285011 From kdnilsen at openjdk.org Wed Dec 10 21:50:29 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 Dec 2025 21:50:29 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID:
> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark.
>
> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old.
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

Remove FRACTIONAL_DENOMINATOR constant

------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/79a21ee6..41a83389 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From kdnilsen at openjdk.org Wed Dec 10 21:50:31 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 Dec 2025 21:50:31 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v4] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:32:58 GMT, William Kemper wrote:
>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Protect against underflow when computing old growth trigger threshold
>
> src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 290:
>
>> 288: State _state;
>> 289:
>> 290: static const size_t FRACTIONAL_DENOMINATOR = 65536;
>
> Is anything still using this constant?

Thanks for the catch. Removed now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608316470 From wkemper at openjdk.org Wed Dec 10 21:55:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Dec 2025 21:55:31 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:50:29 GMT, Kelvin Nilsen wrote:
>> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark.
>>
>> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 25%, of the memory not live in old at the last marking of old.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove FRACTIONAL_DENOMINATOR constant

Thank you for the comments and the simplification. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28561#pullrequestreview-3564640549 From xpeng at openjdk.org Thu Dec 11 00:00:37 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 00:00:37 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Fri, 5 Dec 2025 18:47:56 GMT, William Kemper wrote:
>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Set requested gc cause under a lock when allocation fails
>
> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145:
>
>> 143: // Notifies the control thread, but does not update the requested cause or generation.
>> 144: // The overloaded variant should be used when the _control_lock is already held.
>> 145: void notify_cancellation(GCCause::Cause cause);
>
> These methods were the root cause here. `ShenandoahHeap::_cancelled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had:
> 1. Control thread finishes cycle and sees no cancellation is requested (no lock used).
> 2. Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`.
> 3. Control thread takes `_control_lock` and checks `_requested_gc_cause` and sees `_no_gc` (because `notify_cancellation` didn't change it) and `waits` forever now.
>
> The fix here is to replace `notify_cancellation` with `notify_control_thread`, which serializes updates to `_requested_gc_cause` under `_control_lock`.

I was looking at the places where `ShenandoahHeap::clear_cancelled_gc` is called; I feel the problem is more likely in `op_final_update_refs`:

    void ShenandoahConcurrentGC::op_final_update_refs() {
      ShenandoahHeap* const heap = ShenandoahHeap::heap();
      ...
      // Clear cancelled GC, if set. On cancellation path, the block before would handle
      // everything.
      if (heap->cancelled_gc()) {
        heap->clear_cancelled_gc();
      }
      ...
    }

Let's say there is a concurrent GC running and, right before the final update refs safepoint, there is a mutator allocation failure:
1. The mutator tries to cancel the concurrent GC and notify the controller thread.
2. The mutator blocks itself at `_alloc_failure_waiters_lock`, claiming safepoint safe as well.
3. The concurrent GC enters the final update refs (VM operation).
4. In final update refs, the VMThread sees cancelled_gc and clears it.
5. The concurrent GC finishes, but cancelled_gc has been cleared, so it won't notify the mutator.
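The missed-notification shape of this bug can be modeled in a few lines. This is a hypothetical, single-threaded sketch in plain Java, not the HotSpot code: the cancellation flag is published atomically, but the control loop decides whether to wait based on a requested-cause field that the old notify path never updated.

```java
// Model of the notify paths: the old one leaves the requested cause untouched,
// so a control loop consulting that field would go back to waiting forever.
import java.util.concurrent.atomic.AtomicBoolean;

public class ControlThreadModel {
    enum Cause { NONE, ALLOC_FAILURE }

    final AtomicBoolean cancelled = new AtomicBoolean(false);
    final Object controlLock = new Object();
    Cause requestedCause = Cause.NONE;   // guarded by controlLock

    // Old path: cancels and notifies, but leaves requestedCause untouched.
    void notifyCancellation() {
        cancelled.set(true);
        synchronized (controlLock) { controlLock.notifyAll(); }
    }

    // Fixed path: records the cause under the same lock the control loop checks.
    void notifyControlThread(Cause cause) {
        cancelled.set(true);
        synchronized (controlLock) {
            requestedCause = cause;
            controlLock.notifyAll();
        }
    }

    // The control thread's check between cycles: with no requested cause,
    // it waits even though a cancellation is pending.
    boolean wouldWait() {
        synchronized (controlLock) { return requestedCause == Cause.NONE; }
    }

    public static void main(String[] args) {
        ControlThreadModel m = new ControlThreadModel();
        m.notifyCancellation();
        System.out.println(m.wouldWait()); // true: control thread would sleep despite the cancel
    }
}
```

The general rule it illustrates: any state that a waiter consults to decide whether to sleep must be updated under the same lock before the notify, or the wakeup can be lost.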
The fix seems to work in generational mode, but may not work in non-generational mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2608573677 From xpeng at openjdk.org Thu Dec 11 00:15:11 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 00:15:11 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Wed, 10 Dec 2025 23:35:45 GMT, Xiaolong Peng wrote:
>> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 145:
>>
>>> 143: // Notifies the control thread, but does not update the requested cause or generation.
>>> 144: // The overloaded variant should be used when the _control_lock is already held.
>>> 145: void notify_cancellation(GCCause::Cause cause);
>>
>> These methods were the root cause here. `ShenandoahHeap::_cancelled_gc` is read/written atomically, but `ShenandoahGenerationalControlThread::_requested_gc_cause` is read/written under a lock. These `notify_cancellation` methods did _not_ update `_requested_gc_cause` at all. So, in the failure I observed we had:
>> 1. Control thread finishes cycle and sees no cancellation is requested (no lock used).
>> 2. Mutator thread fails allocation, cancels GC (again, no lock used), and does _not_ change `_requested_gc_cause`.
>> 3. Control thread takes `_control_lock` and checks `_requested_gc_cause` and sees `_no_gc` (because `notify_cancellation` didn't change it) and `waits` forever now.
>>
>> The fix here is to replace `notify_cancellation` with `notify_control_thread`, which serializes updates to `_requested_gc_cause` under `_control_lock`.
> > I was looking at the places where `ShenandoahHeap::clear_cancelled_gc` is called, I feel the problem is more likely from op_final_update_refs: > > > void ShenandoahConcurrentGC::op_final_update_refs() { > ShenandoahHeap* const heap = ShenandoahHeap::heap(); > ... > ... > // Clear cancelled GC, if set. On cancellation path, the block before would handle > // everything. > if (heap->cancelled_gc()) { > heap->clear_cancelled_gc(); > } > ... > ... > } > > > Let's say there is concurrent GC running, right before the final update refs safepoint, there is mutator allocation failure: > 1. The mutator tries to cancel the the concurrent GC and notify controller thread. > 2. The mutator block itself at `_alloc_failure_waiters_lock`, claiming safepoint safe as well. > 3. concurrent GC enter the final update refs (VM operation) > 4. in final update refs, VMThread sees cancelled_gc and clear it. > 5. concurrent GC finishes, but cancelled_gc has been cleared so it won't notify the mutator. > > The fix seems to work in generational mode, but may not work in non-generational mode. While I was staring at the code ShenandoahController::handle_alloc_failure today, I found there is discrepancy between ShenandoahGenerationalControlThread and ShenandoahControlThread, I created a [bug](https://bugs.openjdk.org/browse/JDK-8373468) to unify the behavior, we could fix the issue in ShenandoahControlThread there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2608651279 From ysr at openjdk.org Thu Dec 11 00:58:31 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 11 Dec 2025 00:58:31 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:50:29 GMT, Kelvin Nilsen wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. >> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 35%, of the memory not live in old at the last marking of old. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove FRACTIONAL_DENOMINATOR constat I left a few cosmetic comments, but I have the feeling that the verbal description of the parameters isn't faithful to the actual implementation. Moreover, it appeared to me, quite subjectively, that the triggering criteria are sufficiently complex that capturing them accurately as tunables that users could usefully tune is a bit difficult without reference to how the adjustments themselves work. I think your changes are probably good and yield improvements in performance that you can demonstrate in practice. So I think these changes should go in. That said, I feel there may be an opportunity to slightly simplify the implementation or at least expose only those tunables that one can expect to be able to adjust easily. As it stands, I find the implementation to be such that most users will likely have a hard time using these tunables in any intelligent way. I'd like us to see if we can somehow convey the tuning aspect more clearly in the description of these parameters (while hopefully not compromising accuracy of the descriptions). 
src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 105: > 103: size_t _fragmentation_last_old_region; > 104: > 105: // adapted value of ShenandoahOldGarbageThreshold May be reword to: // a dynamic threshold of garbage for an old // region to be deemed eligible for evacuation. since `ShenandoahOldGarbageThreshold` is a constant parameter to the JVM. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 206: > 204: bool is_experimental() override; > 205: > 206: Although just an accessor, I'd document this API, perhaps using its intended usage as understood by its client: // Returns the current value of a dynamically // adjusted threshold percentage of garbage // above which an Old region should be deemed // eligible for evacuation. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 214: > 212: > 213: // The normal old_garbage_threshold is specified by ShenandoahOldGarbageThreshold command-line argument, with default > 214: // value 25, denoting that a region that has at least 25% garbage is eligible for compaction. With default values for compaction or evacuation? src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 509: > 507: > 508: // Preselect for inclusion into the collection set regions whose age is at or above tenure age which contain more than > 509: // the old garbage threshold amount of garbage. We identify these regions by setting the appropriate entry of Is amount in this sentence a percentage? If so, I'd say: `We only select regions whose garbage percentage exceeds a dynamically adjusted threshold.` Using "old garbage threshold amount" by itself can be confusing, since "amount" doesn't sound like a percentage which is what I believe we mean here. 
src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 119: > 117: _card_scan(nullptr), > 118: _state(WAITING_FOR_BOOTSTRAP), > 119: _growth_percent_before_compaction(INITIAL_PERCENT_GROWTH_BEFORE_COMPACTION) Use either "percent_growth" or "growth_percent" consistently in both names. src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 306: > 304: > 305: // How much growth in usage before we trigger old collection as a percent of soft_max_capacity > 306: size_t _growth_percent_before_compaction; I'd prefer we didn't use the term "compaction". Would replacing it with "collection", as in the comment, work? Or is there a specific reason you want to say "compaction" here? src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 307: > 305: // How much growth in usage before we trigger old collection as a percent of soft_max_capacity > 306: size_t _growth_percent_before_compaction; > 307: See comment on naming of percent quantities in .cpp above. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 65: > 63: "(Generational mode only) If the usage within old generation " \ > 64: "has grown by at least this percent of its live memory size " \ > 65: "at the start of the previous old-generation marking effort, " \ Did you intend to say "at the _end_ of the previous old-generation marking effort" above? src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 71: > 69: product(double, ShenandoahMinOldGenGrowthRemainingHeapPercent, \ > 70: 35, EXPERIMENTAL, \ > 71: "(Generational mode only) If the usage within old generation " \ I find this comment very confusing, and not amenable to use as a tuning device. Can this be converted into a knob that is history-independent/memory-less -- i.e. solely state-dependent? Otherwise, let's try and word this so that it's simpler to parse. This could include a multiple-sentence description. 
src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 90: > 88: "ShenandoahGenerationalDoNotIgnoreGrowthAfterYoungCycles " \ > 89: "consecutive cycles have been completed following the " \ > 90: "preceding old-gen collection.") \ Here again, like my remark below, we can effectively decouple the two options and simplify the verbage by merely saying: // Do not use Old generation growth as a triggering criterion // when usage is lower than this percentage of heap. I am not sure if "of heap" is correct, or if there is some other implicit percentage of the old generation capacity that one has in mind here. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 96: > 94: 100, EXPERIMENTAL, \ > 95: "(Generational mode only) Even if the usage of old generation " \ > 96: "is below ShenandoahIgnoreOldGrowthBelowPercentage, " \ The reference to `ShenandoahIgnoreOldGrowthBelowPercentage` (SIOGBP) seems to me to be spurious and confusing. I think this might be a simpler phrasing, without any reference to SIOGBP: \\ Trigger an Old collection if Old generation usage has grown, \\ and this many Young collections have happened, \\ since the last Old collection. ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28561#pullrequestreview-3564758554 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608434949 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608566349 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608443529 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608509748 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608643499 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608604393 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608648317 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608454990 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608498445 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608491539 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608477270 From ysr at openjdk.org Thu Dec 11 00:58:32 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 11 Dec 2025 00:58:32 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 22:28:01 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constat > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 105: > >> 103: size_t _fragmentation_last_old_region; >> 104: >> 105: // adapted value of ShenandoahOldGarbageThreshold > > May be reword to: > > // a dynamic threshold of garbage for an old > // region to be deemed eligible for evacuation. > > since `ShenandoahOldGarbageThreshold` is a constant parameter to the JVM. As I write this, I realize "Old region" may not be the right term here. 
It should be "an Old or otherwise tenurable region" because it seems regions that are Young but tenurable are filtered through this check, not just Old regions? > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 119: > >> 117: _card_scan(nullptr), >> 118: _state(WAITING_FOR_BOOTSTRAP), >> 119: _growth_percent_before_compaction(INITIAL_PERCENT_GROWTH_BEFORE_COMPACTION) > > Use either "percent_growth" or "growth_percent" consistently in both names. And make those two consistent with either "percent_live" or "live_percent" below. (These comments from me actually belong in the .hpp where these are defined.) > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 71: > >> 69: product(double, ShenandoahMinOldGenGrowthRemainingHeapPercent, \ >> 70: 35, EXPERIMENTAL, \ >> 71: "(Generational mode only) If the usage within old generation " \ > > I find this comment very confusing, and not amenable to use as a tuning device. Can thus be converted into a knob that is history-independent/memory-less -- i.e. solely state-dependent? > > Otherwise, let's try and word this so that it's simpler to parse. This could include a multiple sentence description. I might even combine the descriptions of the previous and this parameter, which work in tandem, to describe how to tune them. e.g. Old generation collection is triggered by determining when its usage has grown past a threshold since the end of the last Old generation collection, viz. 1. if the usage exceeds the amount considered live at the last old marking cycle plus ShenandoahMinOldGenGrowthPercent markup, or 2. if the current remaining headroom falls below ShenandoahMinOldGenGrowthRemainingHeapPercent of the complement of what was considered live at the last old marking cycle. I am not sure if the verbage in (2) above is correct. 
> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 90: > >> 88: "ShenandoahGenerationalDoNotIgnoreGrowthAfterYoungCycles " \ >> 89: "consecutive cycles have been completed following the " \ >> 90: "preceding old-gen collection.") \ > > Here again, like my remark below, we can effectively decouple the two options and simplify the verbage by merely saying: > > > // Do not use Old generation growth as a triggering criterion > // when usage is lower than this percentage of heap. > > > I am not sure if "of heap" is correct, or if there is some other implicit percentage of the old generation capacity that one has in mind here. As I read the code for the old heuristic growth trigger, I realize my rewording above is incorrect. I think the code in the triggering could be simplified a bit to allow a more crisp description of these parameters. Let me talk with you offline (face to face). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608586522 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608646531 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608735072 PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608743619 From ysr at openjdk.org Thu Dec 11 00:58:34 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 11 Dec 2025 00:58:34 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: <-Ma5FOOlP7qtDmaJtpnglct3_plcnFLYnVnUwWdLpTA=.c154aed7-fc00-4011-8d96-8652cc996234@github.com> On Thu, 11 Dec 2025 00:09:12 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 119: >> >>> 117: _card_scan(nullptr), >>> 118: _state(WAITING_FOR_BOOTSTRAP), >>> 119: _growth_percent_before_compaction(INITIAL_PERCENT_GROWTH_BEFORE_COMPACTION) >> >> Use either "percent_growth" or "growth_percent" consistently in both names. 
> > And make those two consistent with either "percent_live" or "live_percent" below. > > (These comments from me actually belong in the .hpp where these are defined.) I think "percent" at the end makes sense, i.e. "live_percent" and "growth_percent", so only the "initial growth percent" name needs to be adjusted for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2608654130 From xpeng at openjdk.org Thu Dec 11 08:06:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 08:06:21 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false Message-ID: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> There is a behavior discrepancy between Shenandoah's generational and non-generational modes when handling mutator allocation failure; as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational mode the `block` parameter may not always work. Looking further into ShenandoahGenerationalControlThread and ShenandoahControlThread, they also handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places. The change in this PR minimizes the gap to unify the behavior, and also fixes potentially missed allocation failure notifications in some rare cases for both control threads. 
### Test - [x] hotspot_gc_shenandoah - [ ] GHA (includes tier1 but no unit tests with gtest) - [x] gtest ------------- Commit messages: - Need to clear _gc_requested and _requested_gc_cause after checking - Align behaviour for non-generational mode - Fix - Add missing header file runtime/atomic.hpp - Use Atomic to store _requested_gc_cause to ensure atomicity - alloc_failure_pending set to true if _requested_gc_cause is alloc failure - Fix a build error - Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false Changes: https://git.openjdk.org/jdk/pull/28758/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373468 Stats: 51 lines in 6 files changed: 28 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/28758.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28758/head:pull/28758 PR: https://git.openjdk.org/jdk/pull/28758 From eastigeevich at openjdk.org Thu Dec 11 11:24:20 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 11 Dec 2025 11:24:20 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v16] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. 
> > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [ ] tier4 > > Benchmarking results: Neoverse-N1 r3p1 (Graviton 2), 5 000 nmethods... 
Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: - Add cache DIC IDC status to VM_Version - Fix tier3 failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/85691beb..44f43e4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=14-15 Stats: 115 lines in 7 files changed: 49 ins; 50 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Thu Dec 11 14:18:28 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 11 Dec 2025 14:18:28 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v17] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." 
> > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JD... 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Fix macos and windows aarch64 builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/44f43e4e..2448b2d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=15-16 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From xpeng at openjdk.org Thu Dec 11 14:57:27 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 14:57:27 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v2] In-Reply-To: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: <8Av2Uamk28eooDf886YczESyuaUniD538Rt5Dv3gMSg=.20c49530-70cb-4f21-8e3f-fd662dfb732b@github.com> > There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. > > Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well, the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [ ] GHA (includes tier1 but no unit tests with gtest) - Only one test failure caused by header file order, fixed. > - [x] gtest Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Reorder include statements in shenandoahGenerationalControlThread.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28758/files - new: https://git.openjdk.org/jdk/pull/28758/files/e60e71da..b3e18e9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28758.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28758/head:pull/28758 PR: https://git.openjdk.org/jdk/pull/28758 From kdnilsen at openjdk.org Thu Dec 11 15:49:27 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 15:49:27 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 22:37:58 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constat > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 65: > >> 63: "(Generational mode only) If the usage within old generation " \ >> 64: "has grown by at least this percent of its live memory size " \ >> 65: "at the start of the previous old-generation marking effort, " \ > > Did you intend to say "at the _end_ of the previous old-generation marking effort" above? Actually, since we are using SATB protocol for old marking, the live data is measured as of the start of GC. (We subtract out the allocations (promotions) that happen during concurrent GC.) 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611104184 From wkemper at openjdk.org Thu Dec 11 16:05:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 Dec 2025 16:05:51 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: <1Ry7ZarguJ0J8-wxU9GbmBxWv6ya78RydVQrsSDxn2Y=.8701f127-1e77-4b31-99e8-0df241c89bd8@github.com> On Thu, 11 Dec 2025 00:12:35 GMT, Xiaolong Peng wrote: >> I was looking at the places where `ShenandoahHeap::clear_cancelled_gc` is called, I feel the problem is more likely from op_final_update_refs: >> >> >> void ShenandoahConcurrentGC::op_final_update_refs() { >> ShenandoahHeap* const heap = ShenandoahHeap::heap(); >> ... >> ... >> // Clear cancelled GC, if set. On cancellation path, the block before would handle >> // everything. >> if (heap->cancelled_gc()) { >> heap->clear_cancelled_gc(); >> } >> ... >> ... >> } >> >> >> Let's say there is concurrent GC running, right before the final update refs safepoint, there is mutator allocation failure: >> 1. The mutator tries to cancel the the concurrent GC and notify controller thread. >> 2. The mutator block itself at `_alloc_failure_waiters_lock`, claiming safepoint safe as well. >> 3. concurrent GC enter the final update refs (VM operation) >> 4. in final update refs, VMThread sees cancelled_gc and clear it. >> 5. concurrent GC finishes, but cancelled_gc has been cleared so it won't notify the mutator. >> >> The fix seems to work in generational mode, but may not work in non-generational mode. 
> > While I was staring at the code ShenandoahController::handle_alloc_failure today, I found there is discrepancy between ShenandoahGenerationalControlThread and ShenandoahControlThread, I created a [bug](https://bugs.openjdk.org/browse/JDK-8373468) to unify the behavior, we could fix the issue in ShenandoahControlThread there. The scenario I described wasn't supposition, that is actually what happened in the debugger. The scenario you describe with `op_final_update_refs` would also be fixed by this PR. The `_requested_gc_cause` field should always be accessed under a lock. The code change here fixes an issue where an allocation failure might not set `_requested_gc_cause` at all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2611167200 From eastigeevich at openjdk.org Thu Dec 11 16:21:51 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 11 Dec 2025 16:21:51 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v18] In-Reply-To: References: Message-ID: <55OALV7rkAyerTV2RKIdh1J1qgZ1hUbHMtB-ND3DRZM=.b37afa75-d56e-4372-81fd-22bab5c2c533@github.com> > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. 
> > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JD... 
Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into JDK-8370947 - Fix macos and windows aarch64 builds - Add cache DIC IDC status to VM_Version - Fix tier3 failures - Fix tier1 failures - Implement nested ICacheInvalidationContext - Fix linux-cross-compile build aarch64 - Merge branch 'master' into JDK-8370947 - Remove trailing whitespaces - Add support of deferred icache invalidation to other GCs and JIT - ... and 15 more: https://git.openjdk.org/jdk/compare/aa986be7...fc4bbe9d ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=17 Stats: 919 lines in 27 files changed: 852 ins; 21 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From kdnilsen at openjdk.org Thu Dec 11 16:28:25 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:28:25 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 23:43:16 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 105: >> >>> 103: size_t _fragmentation_last_old_region; >>> 104: >>> 105: // adapted value of ShenandoahOldGarbageThreshold >> >> May be reword to: >> >> // a dynamic threshold of garbage for an old >> // region to be deemed eligible for evacuation. >> >> since `ShenandoahOldGarbageThreshold` is a constant parameter to the JVM. > > As I write this, I realize "Old region" may not be the right term here. It should be "an Old or otherwise tenurable region" because it seems regions that are Young but tenurable are filtered through this check, not just Old regions? Thanks for your review and suggestions. 
I'll make a pass over this code and try to improve the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611248131 From kdnilsen at openjdk.org Thu Dec 11 16:28:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:28:28 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: <0KjOEhmxlJ2anomFUqZ0vz1S36zofiteJ2sL1x-KTs0=.2b44157b-6f25-4a04-8cc3-de3ba3e3567b@github.com> On Wed, 10 Dec 2025 22:32:11 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constant > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 214: > >> 212: >> 213: // The normal old_garbage_threshold is specified by ShenandoahOldGarbageThreshold command-line argument, with default >> 214: // value 25, denoting that a region that has at least 25% garbage is eligible for compaction. With default values for > compaction or evacuation? Thanks. Making this change to evacuation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611252008 From kdnilsen at openjdk.org Thu Dec 11 16:39:15 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:39:15 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 00:48:58 GMT, Y.
Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 90: >> >>> 88: "ShenandoahGenerationalDoNotIgnoreGrowthAfterYoungCycles " \ >>> 89: "consecutive cycles have been completed following the " \ >>> 90: "preceding old-gen collection.") \ >> >> Here again, like my remark below, we can effectively decouple the two options and simplify the verbiage by merely saying: >> >> >> // Do not use Old generation growth as a triggering criterion >> // when usage is lower than this percentage of heap. >> >> >> I am not sure if "of heap" is correct, or if there is some other implicit percentage of the old generation capacity that one has in mind here. > As I read the code for the old heuristic growth trigger, I realize my rewording above is incorrect. > > I think the code in the triggering could be simplified a bit to allow a more crisp description of these parameters. > > Let me talk with you offline (face to face). Sounds good. I'm inclined to leave these behaviors as is for now. These are not the focus of this particular PR. But let's think about creating a new JBS ticket and PR to improve further if that's ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611288502 From kdnilsen at openjdk.org Thu Dec 11 16:39:18 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:39:18 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 22:49:18 GMT, Y.
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constant > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 96: > >> 94: 100, EXPERIMENTAL, \ >> 95: "(Generational mode only) Even if the usage of old generation " \ >> 96: "is below ShenandoahIgnoreOldGrowthBelowPercentage, " \ > The reference to `ShenandoahIgnoreOldGrowthBelowPercentage` (SIOGBP) seems to me to be spurious and confusing. I think this might be a simpler phrasing, without any reference to SIOGBP: > > \\ Trigger an Old collection if Old generation usage has grown, > \\ and this many Young collections have happened, > \\ since the last Old collection. I'll reorder this paragraph to emphasize the triggering. I don't want to totally remove the mention of ShenandoahIgnoreOldGrowthBelowPercentage because the descriptions of these two parameters might otherwise appear to contradict each other. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611282109 From kdnilsen at openjdk.org Thu Dec 11 16:43:39 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:43:39 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 23:01:38 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constant > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 509: > >> 507: >> 508: // Preselect for inclusion into the collection set regions whose age is at or above tenure age which contain more than >> 509: // the old garbage threshold amount of garbage. We identify these regions by setting the appropriate entry of > Is amount in this sentence a percentage?
If so, I'd say: `We only select regions whose garbage percentage exceeds a dynamically adjusted threshold.` > Using "old garbage threshold amount" by itself can be confusing, since "amount" doesn't sound like a percentage, which is what I believe we mean here. I've word-smithed your suggestion into the existing paragraph. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611304065 From kdnilsen at openjdk.org Thu Dec 11 16:45:50 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 16:45:50 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 23:31:56 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constant > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 206: > >> 204: bool is_experimental() override; >> 205: >> 206: > > Although just an accessor, I'd document this API, perhaps using its intended usage as understood by its client: > > > // Returns the current value of a dynamically > // adjusted threshold percentage of garbage > // above which an Old region should be deemed > // eligible for evacuation. Thanks. Inserted.
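The preselection rule being documented in this exchange — a tenurable region is eligible for evacuation when its garbage percentage meets a dynamically adjusted threshold — can be sketched roughly as follows. All names and numbers here are illustrative, not the actual Shenandoah code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch (invented names): preselection of tenurable regions
// for the collection set, gated on a dynamically adjusted counterpart of
// the ShenandoahOldGarbageThreshold command-line parameter.
struct Region {
  size_t used_bytes;
  size_t live_bytes;
  unsigned age;
};

struct OldHeuristics {
  unsigned tenure_age;
  double adjusted_garbage_threshold_percent;  // dynamic threshold, as
                                              // discussed in the review

  // A region is preselected when it is tenurable and its garbage
  // percentage meets the adjusted threshold.
  bool preselect(const Region& r) const {
    if (r.age < tenure_age || r.used_bytes == 0) return false;
    double garbage_percent =
        100.0 * (double)(r.used_bytes - r.live_bytes) / (double)r.used_bytes;
    return garbage_percent >= adjusted_garbage_threshold_percent;
  }
};
```

Expressing the threshold as a percentage rather than a byte "amount" is exactly the wording point raised above: the check is relative to the region's own usage, not an absolute quantity of garbage.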
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611312529 From xpeng at openjdk.org Thu Dec 11 16:54:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 16:54:14 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: <58yuhJdcLvchtwbp6Z85z4H6KIr6NyejnkO-CZTG-sk=.9e0ef412-078e-4aef-bd0e-cd141f441f46@github.com> On Fri, 5 Dec 2025 18:53:37 GMT, William Kemper wrote: >> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Set requested gc cause under a lock when allocation fails Thanks for the digging and fixing the issue. ------------- Marked as reviewed by xpeng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3568387538 From xpeng at openjdk.org Thu Dec 11 16:54:15 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 16:54:15 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: <1Ry7ZarguJ0J8-wxU9GbmBxWv6ya78RydVQrsSDxn2Y=.8701f127-1e77-4b31-99e8-0df241c89bd8@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> <1Ry7ZarguJ0J8-wxU9GbmBxWv6ya78RydVQrsSDxn2Y=.8701f127-1e77-4b31-99e8-0df241c89bd8@github.com> Message-ID: On Thu, 11 Dec 2025 16:03:10 GMT, William Kemper wrote: >> While I was staring at the code ShenandoahController::handle_alloc_failure today, I found there is a discrepancy between ShenandoahGenerationalControlThread and ShenandoahControlThread, so I created a [bug](https://bugs.openjdk.org/browse/JDK-8373468) to unify the behavior; we could fix the issue in ShenandoahControlThread there. > The scenario I described wasn't supposition; that is actually what happened in the debugger. The scenario you describe with `op_final_update_refs` would also be fixed by this PR. The `_requested_gc_cause` field should always be accessed under a lock. The code change here fixes an issue where an allocation failure might not set `_requested_gc_cause` at all. Yes, I understand the fix will solve the issue for genshen and also fix the scenario I described. I'll solve the potential issue in non-generational Shenandoah in the [PR](https://github.com/openjdk/jdk/pull/28758) to fix the behavior differences in Genshen and non-generational Shenandoah.
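The invariant William describes — `_requested_gc_cause` is only read or written while holding the control lock, so an allocation failure can never go unrecorded between a check and a wait — can be sketched with standard C++ primitives. This is a simplified stand-in, not HotSpot code (which uses its own Monitor/lock abstractions), and the names are borrowed from the thread only for readability:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch of lock-guarded access to the requested GC cause.
enum class GCCause { None, AllocFailure, ExplicitGC };

class ControlThreadState {
  std::mutex _control_lock;
  std::condition_variable _cv;
  GCCause _requested_gc_cause = GCCause::None;

public:
  // Mutator side: record the allocation failure under the lock and wake
  // the control thread. Because the write happens under the same lock the
  // control thread holds while checking, the notification cannot be lost.
  void notify_alloc_failure() {
    std::lock_guard<std::mutex> g(_control_lock);
    _requested_gc_cause = GCCause::AllocFailure;
    _cv.notify_all();
  }

  // Control-thread side: consume the pending request under the same lock.
  GCCause take_request() {
    std::lock_guard<std::mutex> g(_control_lock);
    GCCause c = _requested_gc_cause;
    _requested_gc_cause = GCCause::None;
    return c;
  }
};
```

The bug pattern the fix addresses is the unlocked variant of this: if the mutator wrote the cause outside the lock (or not at all), the control thread could observe the wakeup without the cause, or miss it entirely.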
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2611334636 From wkemper at openjdk.org Thu Dec 11 17:34:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 Dec 2025 17:34:22 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v5] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 23:31:41 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. > > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > hardening of comments > > remove unintended files Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3568545658 From xpeng at openjdk.org Thu Dec 11 17:38:43 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 17:38:43 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v3] In-Reply-To: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: > There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. > > Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well, the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads. 
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA (includes tier1 but no unit tests with gtest) > - [x] gtest Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Only load _requested_gc_cause once when figuring out pending requests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28758/files - new: https://git.openjdk.org/jdk/pull/28758/files/b3e18e9f..467b2746 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=01-02 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28758.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28758/head:pull/28758 PR: https://git.openjdk.org/jdk/pull/28758 From wkemper at openjdk.org Thu Dec 11 17:47:50 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 Dec 2025 17:47:50 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v3] In-Reply-To: References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: On Thu, 11 Dec 2025 17:38:43 GMT, Xiaolong Peng wrote: >> There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. >> >> Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well, the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads.
>> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA (includes tier1 but no unit tests with gtest) >> - [x] gtest > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Only load _requested_gc_cause once when figuring out pending requests Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 46: > 44: > 45: ShenandoahSharedFlag _gc_requested; > 46: Atomic _requested_gc_cause; `_requested_gc_cause` should always be accessed when holding the `_control_lock`. Making this atomic should not be necessary. ------------- PR Review: https://git.openjdk.org/jdk/pull/28758#pullrequestreview-3568599040 PR Review Comment: https://git.openjdk.org/jdk/pull/28758#discussion_r2611507205 From xpeng at openjdk.org Thu Dec 11 18:38:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 18:38:59 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v4] In-Reply-To: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: > There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. > > Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well, the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads.
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA (includes tier1 but no unit tests with gtest) > - [x] gtest Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Revert change to use Atomic for _requested_gc_cause ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28758/files - new: https://git.openjdk.org/jdk/pull/28758/files/467b2746..469dc1fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=02-03 Stats: 37 lines in 4 files changed: 21 ins; 3 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28758.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28758/head:pull/28758 PR: https://git.openjdk.org/jdk/pull/28758 From xpeng at openjdk.org Thu Dec 11 18:39:01 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 18:39:01 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v3] In-Reply-To: References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: On Thu, 11 Dec 2025 17:44:35 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Only load _requested_gc_cause once when figuring out pending requests > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 46: > >> 44: >> 45: ShenandoahSharedFlag _gc_requested; >> 46: Atomic _requested_gc_cause; > > `_requested_gc_cause` should always be accessed when holding the `_control_lock`. Making this atomic should not be necessary. Yes, the change to use Atomic for _requested_gc_cause is unnecessary; I have reverted it with a minor change for ShenandoahControlThread to always access _requested_gc_cause with `_control_lock` held.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28758#discussion_r2611654969 From kdnilsen at openjdk.org Thu Dec 11 18:54:36 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 18:54:36 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: <1vJOThZHQ4oUjz0K3Q__TNKqQjmamKli559hUQHPaDI=.50025ca8-1309-4d72-aaad-05dd4daf75fb@github.com> On Wed, 10 Dec 2025 23:50:38 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FRACTIONAL_DENOMINATOR constat > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 306: > >> 304: >> 305: // How much growth in usage before we trigger old collection as a percent of soft_max_capacity >> 306: size_t _growth_percent_before_compaction; > > I'd prefer we didn't use the term "compaction". Would replacing it with "collection", as in the comment, work? Or is there a specific reason you want to say "compaction" here? Thanks. Replacing with collection. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611704905 From xpeng at openjdk.org Thu Dec 11 18:56:15 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 18:56:15 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v5] In-Reply-To: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: > There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. 
> > Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well; the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] GHA (includes tier1 but no unit tests with gtest) > - [x] gtest Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: format issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28758/files - new: https://git.openjdk.org/jdk/pull/28758/files/469dc1fc..085a0497 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28758&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28758.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28758/head:pull/28758 PR: https://git.openjdk.org/jdk/pull/28758 From kdnilsen at openjdk.org Thu Dec 11 19:00:14 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 19:00:14 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: <-Ma5FOOlP7qtDmaJtpnglct3_plcnFLYnVnUwWdLpTA=.c154aed7-fc00-4011-8d96-8652cc996234@github.com> References: <-Ma5FOOlP7qtDmaJtpnglct3_plcnFLYnVnUwWdLpTA=.c154aed7-fc00-4011-8d96-8652cc996234@github.com> Message-ID: On Thu, 11 Dec 2025 00:14:42 GMT, Y. Srinivas Ramakrishna wrote: >> And make those two consistent with either "percent_live" or "live_percent" below. >> >> (These comments from me actually belong in the .hpp where these are defined.) > I think "percent" at the end makes sense, so "live_percent" and "growth_percent", so only the "initial growth percent" name needs to be adjusted for consistency.
I believe I have made these changes now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2611724273 From wkemper at openjdk.org Thu Dec 11 20:37:10 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 Dec 2025 20:37:10 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v5] In-Reply-To: References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: On Thu, 11 Dec 2025 18:56:15 GMT, Xiaolong Peng wrote: >> There is behavior discrepancy between Shenandoah generational mode and non-generational when it handles mutator allocation failure, as stated in the description of [JDK-8373468](https://bugs.openjdk.org/browse/JDK-8373468), in non-generational, the `block` parameter may not always work. >> >> Further looking into ShenandoahGenerationalControlThread and ShenandoahControlThread, they handle the _requested_gc_cause and ShenandoahHeap::cancelled_cause differently in other places as well, the change in this PR will minimize the gap to unify the behavior, and also fix potentially missed allocation failure notifications in some rare cases for both control threads. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] GHA (includes tier1 but no unit unit with gtest) >> - [x] gtest > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > format issue Can you describe the scenario you are trying to fix here? The two control threads have different idle/wakeup conditions. 
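The `block` semantics at issue in this thread (the PR title: a mutator may block at `_gc_waiters_lock` even when `block` is false) can be illustrated with a small sketch using standard C++ synchronization as a stand-in for HotSpot's monitors. The names are invented and this is not the proposed patch; it only shows the intended contract — always report the failure, but only park the caller when `block` is true:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch (not HotSpot code) of an allocation-failure handler
// that honors a `block` parameter.
class AllocFailureHandler {
  std::mutex _gc_waiters_lock;
  std::condition_variable _gc_waiters;
  bool _gc_requested = false;
  bool _gc_complete = false;

public:
  void handle_alloc_failure(bool block) {
    {
      std::lock_guard<std::mutex> g(_gc_waiters_lock);
      _gc_requested = true;          // always notify the control thread
    }
    _gc_waiters.notify_all();
    if (!block) {
      return;                        // non-blocking caller never parks here
    }
    std::unique_lock<std::mutex> g(_gc_waiters_lock);
    _gc_waiters.wait(g, [this] { return _gc_complete; });
  }

  bool gc_requested() {
    std::lock_guard<std::mutex> g(_gc_waiters_lock);
    return _gc_requested;
  }

  void complete_gc() {
    { std::lock_guard<std::mutex> g(_gc_waiters_lock); _gc_complete = true; }
    _gc_waiters.notify_all();
  }
};
```

The reported bug corresponds to a version where the wait on `_gc_waiters` happens unconditionally, so a caller passing `block == false` could still be parked until a GC cycle completes.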
------------- PR Comment: https://git.openjdk.org/jdk/pull/28758#issuecomment-3643683986 From kdnilsen at openjdk.org Thu Dec 11 21:25:05 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 21:25:05 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:46:19 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 65: >> >>> 63: "(Generational mode only) If the usage within old generation " \ >>> 64: "has grown by at least this percent of its live memory size " \ >>> 65: "at the start of the previous old-generation marking effort, " \ >> >> Did you intend to say "at the _end_ of the previous old-generation marking effort" above? > Actually, since we are using SATB protocol for old marking, the live data is measured as of the start of GC. (We subtract out the allocations (promotions) that happen during concurrent GC.) It's been my intention to use a SATB measurement of live data, but it looks like that's not yet implemented. Would like to leave as is for now. Will address this in a future PR. For preservation of thoughts on how to do this: 1.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2612115759 From kdnilsen at openjdk.org Thu Dec 11 21:30:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 21:30:28 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v6] In-Reply-To: References: Message-ID: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. > > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 35%, of the memory not live in old at the last marking of old. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Response to reviewer suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/41a83389..c71969c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=04-05 Stats: 46 lines in 7 files changed: 11 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From kdnilsen at openjdk.org Thu Dec 11 21:55:17 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 21:55:17 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v7] In-Reply-To: References: Message-ID: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. 
> > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 35%, of the memory not live in old at the last marking of old. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28561/files - new: https://git.openjdk.org/jdk/pull/28561/files/c71969c7..ee81bd92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28561&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28561/head:pull/28561 PR: https://git.openjdk.org/jdk/pull/28561 From xpeng at openjdk.org Thu Dec 11 22:23:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Dec 2025 22:23:00 GMT Subject: RFR: 8373468: Shenandoah: Mutator may block at _gc_waiters_lock after allocation failure even block parameter is false [v5] In-Reply-To: References: <62mzeOhRYpxYa0sNRcH2vs58S-jcGhv8gAqIx2R4L5M=.9914f2b4-4632-4f3c-ac05-0132b0d1370d@github.com> Message-ID: On Thu, 11 Dec 2025 20:34:55 GMT, William Kemper wrote: > Can you describe the scenario you are trying to fix here? The two control threads have different idle/wakeup conditions. The major thing I am fixing in the PR is to make ShenandoahControlThread always honor the `block` parameter. In current impl ShenandoahControlThread may still block the mutator thread even `block` parameter is `false`, because the ShenandoahControlThread::request_gc blocks the mutator use _gc_waiters_lock for allocation failure; while ShenandoahGenerationalControlThread::request_gc doesn't, it simplify call `notify_cancellation(cause)` if cause is allocation failure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28758#issuecomment-3644032272 From wkemper at openjdk.org Thu Dec 11 23:15:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 Dec 2025 23:15:36 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v7] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 21:55:17 GMT, Kelvin Nilsen wrote: >> When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. >> >> When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 35%, of the memory not live in old at the last marking of old. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > white space Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28561#pullrequestreview-3569660574 From kdnilsen at openjdk.org Thu Dec 11 23:15:39 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 23:15:39 GMT Subject: RFR: 8373225: GenShen: More adaptive old-generation growth heuristics [v5] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 21:22:29 GMT, Kelvin Nilsen wrote: >> Actually, since we are using SATB protocol for old marking, the live data is measured as of the start of GC. (We subtract out the allocations (promotions) that happen during concurrent GC. > > It's been my intention to use a SATB measurement of live data, but it looks like that's not yet implemented. Would like to leave as is for now. Will address this in a future PR. For preservation of thoughts on how to do this: > 1. 
At ShenandoahConcurrentGC::op_init_mark(), when we parallel_heap_region_iterate() over ShenandoahInitMarkUpdateRegionStateClosure, we can capture for each heap region its "current usage" > 2. At ShenandoahOldHeuristics::prepare_for_old_collections(), this is where we currently set_live_bytes_at_last_mark() to the current live in old. > 3. At this point, I would like to compute for each old region the difference between its current old used and its old-used-at-start-of-gc (this represents new allocations (promotions) that happened during concurrent old marking). Subtract the sum of these differences from the live memory calculated today. This represents the true live memory at start of old marking (according to SATB theory). used at mark start is basically top at mark start... That simplifies things. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28561#discussion_r2612350194 From kdnilsen at openjdk.org Thu Dec 11 23:18:18 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Dec 2025 23:18:18 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v4] In-Reply-To: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: > This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. > > This addresses a problem that results if available memory is probed while we are rebuilding the freeset. > > Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. 
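Kelvin's three SATB steps amount to a per-region subtraction. A minimal sketch, with assumed fields standing in for the real ShenandoahHeapRegion bookkeeping (these are not actual HotSpot types):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the SATB correction outlined above. Memory promoted into old
// during concurrent old marking sits above the usage captured at mark
// start, so it is subtracted back out of the measured live total; regions
// promoted in place during marking contribute their whole usage.
struct OldRegionSnapshot {
  size_t used_at_mark_start; // captured at op_init_mark()
  size_t used_now;           // at prepare_for_old_collections()
  bool   promoted_in_place;  // whole region promoted during old marking
};

size_t live_at_old_mark_start(size_t measured_live,
                              const std::vector<OldRegionSnapshot>& regions) {
  size_t promoted_during_mark = 0;
  for (const OldRegionSnapshot& r : regions) {
    if (r.promoted_in_place) {
      promoted_during_mark += r.used_now;  // entire region is new to old
    } else {
      promoted_during_mark += r.used_now - r.used_at_mark_start;
    }
  }
  return measured_live - promoted_during_mark;
}
```

The result is the live memory as of the snapshot at the beginning of old marking, which is what the growth heuristic wants to compare against.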
It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add rebuild synchronization to capacity() and used() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27612/files - new: https://git.openjdk.org/jdk/pull/27612/files/8462a290..3c29dc10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=02-03 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612 PR: https://git.openjdk.org/jdk/pull/27612 From kdnilsen at openjdk.org Fri Dec 12 14:05:35 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 12 Dec 2025 14:05:35 GMT Subject: Integrated: 8373225: GenShen: More adaptive old-generation growth heuristics In-Reply-To: References: Message-ID: On Sat, 29 Nov 2025 01:10:02 GMT, Kelvin Nilsen wrote: > When old-gen consumes a small percentage of heap size, trigger when old-gen expands by more than ShenandoahMinOldGenGrowthPercent, with default value 50%, from the live data in old at time of previous old-gen mark. > > When old-gen consumes a larger percentage of heap size, we trigger when old-gen expands by more than ShenandoahMinOldGenGrowthRemainingHeapPercent, with default value 35%, of the memory not live in old at the last marking of old. This pull request has now been integrated. 
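The rebuild-freeset lock described in the 8369048 RFR above, a narrowly scoped lock that serializes freeset rebuilds against `available()`-style queries without involving the global heap lock, can be sketched with a plain mutex (hypothetical `FreeSetModel`; the real code uses HotSpot's own locking primitives, not `std::mutex`):

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Sketch of the narrower lock described above. Queries such as available()
// take only the rebuild lock, so they are never blocked by threads holding
// the global heap lock for unrelated work such as allocation.
class FreeSetModel {
  mutable std::mutex _rebuild_lock;   // scoped to rebuild vs. queries only
  size_t _available = 0;
public:
  void rebuild(size_t new_available) {
    std::lock_guard<std::mutex> g(_rebuild_lock);
    _available = new_available;       // queries never see a half-built freeset
  }
  size_t available() const {
    std::lock_guard<std::mutex> g(_rebuild_lock);
    return _available;
  }
};
```

The design choice is contention isolation: probing available memory only contends with an in-progress rebuild, not with every heap-lock holder.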
Changeset: 41001437 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/410014377c210463d654b841bafbcf36947aa960 Stats: 166 lines in 11 files changed: 106 ins; 7 del; 53 mod 8373225: GenShen: More adaptive old-generation growth heuristics Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28561 From qamai at openjdk.org Fri Dec 12 16:06:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 16:06:22 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v2] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix Shenandoah ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/e9789170..1e026354 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=00-01 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From ysr at openjdk.org Sat Dec 13 00:06:04 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 13 Dec 2025 00:06:04 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Fri, 5 Dec 2025 18:53:37 GMT, William Kemper wrote: >> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Set requested gc cause under a lock when allocation fails Thanks for cleaning this up. Did you review the non-generational ShenandoahControlThread and uses thereof to make sure a similar issue doesn't exist there? As Xiaolong states, it might be worthwhile to do a refactor that shares as much as needed and no more, and to do so cleanly. This looks good; sorry for the delay in reviewing. 
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 143: > 141: void notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation); > 142: void notify_control_thread(GCCause::Cause cause); > 143: void notify_control_thread(MonitorLocker& ml, GCCause::Cause cause); Nit: I'd (subjectively) order them thus: (nct = notify_control_thread) 1. nct(cause) 2. nct(ml, cause) 3. nct(cause, generation) 4. nct(ml, cause, generation) For completeness in the documentation comment preceding, state that if an argument, cause or generation, is missing, it isn't updated. I am assuming that there is a specific small subset of cause values where the generation isn't important to spell out and really implies "isn't necessary or is implicitly understood" for cancellation/request cause? Is there a call argument/consistency check that might be done in the nct:s where these bottom out to confirm this, or am I being unnecessarily paranoid? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3573943496 PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2615874522 From wkemper at openjdk.org Sat Dec 13 00:18:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 13 Dec 2025 00:18:57 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Sat, 13 Dec 2025 00:03:10 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Set requested gc cause under a lock when allocation fails > > Thanks for cleaning this up. > > Did you review the non-generational ShenandoahControlThread and uses thereof to make sure a similar issues doesn't exist there? 
> > As Xiaolong states, it might be worthwhile to do a refactor that shares as much as needed and no more, and to do so cleanly. > > This looks good; sorry for the delay in reviewing. @ysramakrishna , @pengxiaolong - The non-generational control thread is less susceptible to this sort of issue because it has the responsibility of evaluating trigger conditions. Its loop therefore sleeps with a timed wait when the GC cycle is complete. If it misses a cancelled gc request, it will see it on the next iteration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28665#issuecomment-3648576231 From wkemper at openjdk.org Sat Dec 13 00:25:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 13 Dec 2025 00:25:56 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v2] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Fri, 12 Dec 2025 23:57:50 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Set requested gc cause under a lock when allocation fails > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 143: > >> 141: void notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation); >> 142: void notify_control_thread(GCCause::Cause cause); >> 143: void notify_control_thread(MonitorLocker& ml, GCCause::Cause cause); > > Nit: > > I'd (subjectively) order them thus: (nct = notify_control_thread) > > 1. nct(cause) > 2. nct(ml, cause) > 3. nct(cause, generation) > 4. nct(ml, cause, generation) > > For completeness in the documentation comment preceding, state that if an argument, cause or generation, is missing, it isn't updated. 
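William's point about the non-generational control thread, that a loop built on a timed wait re-evaluates its request flag on every iteration and therefore cannot hang forever on a missed notification, can be modeled single-threaded (invented names; the real loop waits on a Monitor with a timeout):

```cpp
#include <cassert>

// Minimal model of the difference described above: a control loop that
// uses a *timed* wait re-checks the request flag every iteration, so a
// missed notification only delays the cycle by one wait period instead of
// blocking indefinitely.
struct TimedWaitLoop {
  bool gc_requested = false;
  int  cycles_run = 0;

  // Simulate n loop iterations; each iteration first checks the flag
  // (as after a timed wait timing out), then "sleeps" again.
  void run_iterations(int n) {
    for (int i = 0; i < n; i++) {
      if (gc_requested) {      // re-evaluated every iteration
        cycles_run++;
        gc_requested = false;
      }
      // a timed wait would go here; a timeout still falls through to re-check
    }
  }
};
```

The generational control thread, which idles on an untimed wait, has no such periodic re-check, which is why its notification path needed the locking fix.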
> > I am assuming that there is a specific small subset of cause values where the generation isn't important to spell out and really implies "isn't necessary or is implicitly understood" for cancellation/request cause? Is there a call argument/consistency check that might be done in the nct:s where these bottom out to confirm this, or am I being unnecessarily paranoid? Yes, there are two uses where we don't need the generation: * It's important to _not_ update the generation for an allocation failure (degenerated cycle needs to use same generation) * We are shutting down the JVM and don't want to start another cycle. All cases need to pass a `GCCause`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28665#discussion_r2615896228 From wkemper at openjdk.org Sat Dec 13 00:44:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 13 Dec 2025 00:44:21 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v3] In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: > In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into fix-missed-cancellation - Improve comment - Set requested gc cause under a lock when allocation fails - Expand scope of control lock so that it can't miss cancellation notifications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28665/files - new: https://git.openjdk.org/jdk/pull/28665/files/1081f21e..4c82d21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28665&range=01-02 Stats: 35761 lines in 486 files changed: 23753 ins; 9524 del; 2484 mod Patch: https://git.openjdk.org/jdk/pull/28665.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28665/head:pull/28665 PR: https://git.openjdk.org/jdk/pull/28665 From ysr at openjdk.org Sat Dec 13 00:44:21 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 13 Dec 2025 00:44:21 GMT Subject: RFR: 8373100: Genshen: Control thread can miss allocation failure notification [v3] In-Reply-To: References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Sat, 13 Dec 2025 00:41:01 GMT, William Kemper wrote: >> In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into fix-missed-cancellation > - Improve comment > - Set requested gc cause under a lock when allocation fails > - Expand scope of control lock so that it can't miss cancellation notifications Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28665#pullrequestreview-3573980764 From wkemper at openjdk.org Sat Dec 13 00:49:28 2025 From: wkemper at openjdk.org (William Kemper) Date: Sat, 13 Dec 2025 00:49:28 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen Message-ID: The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. 
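The `heal_discovered_lists` idea described above can be sketched as a walk over a discovered list that swaps each forwarded entry for its forwardee (assumed `RefNode` type; the real code traverses the reference processor's internal discovered lists over oops):

```cpp
#include <cassert>
#include <vector>

// Sketch of healing a discovered list after young evacuation. Any entry
// that was evacuated carries a non-null forwardee; replacing the entry
// with its forwardee ensures old reference processing never dereferences
// a stale address in a collected young region.
struct RefNode {
  RefNode* forwardee = nullptr;       // non-null if the object was evacuated
};

void heal_discovered_list(std::vector<RefNode*>& discovered) {
  for (RefNode*& entry : discovered) {
    if (entry->forwardee != nullptr) {
      entry = entry->forwardee;       // swap stale reference for forwardee
    }
  }
}
```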
This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. ------------- Commit messages: - Oops, change name of class in test xdoc - Oops, initialize new field - Add missing newline - Add comments, fix typo, survive multiple young GCs - WIP: checkpoint - Clean up test, increase likelihood of cross generational references - Add test for genshen reference processing Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373203 Stats: 472 lines in 14 files changed: 437 ins; 18 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From shade at openjdk.org Mon Dec 15 10:17:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Dec 2025 10:17:11 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses In-Reply-To: References: Message-ID: <_szeC-N8HfgGKOuDjXYHs8SwZFQQnffiG31iwdLYO9k=.8e56abbd-cfd5-49fb-ab22-178616705f56@github.com> On Mon, 8 Dec 2025 18:45:04 GMT, Aleksey Shipilev wrote: > Shenandoah and G1 are using CardTable for most of its infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing card table base from assembler and compilers runs into risk of accidentally encoding the wrong card table base in generated code. > > Most of the current code avoids this trouble by carefully implementing their GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, that reads the card table base for G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. 
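Why a baked-in card table base breaks under table flipping can be shown with a toy model (invented `CardTableModel`; the real code keeps the current base in thread-local storage and the tables are per-heap, not two fixed arrays): a correct barrier reloads the base on every use, so writes land in whichever table is current after a flip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the hazard described above for G1/Shenandoah. If generated
// code baked in the address of table_a as a constant, cards marked after
// a flip would land in the stale table; reloading `current` (the "TLS"
// copy) on each barrier keeps the marks in the live table.
struct CardTableModel {
  uint8_t table_a[16] = {0};
  uint8_t table_b[16] = {0};
  uint8_t* current = table_a;        // updated on each flip

  void flip() { current = (current == table_a) ? table_b : table_a; }

  // Correct barrier: reload the base on every use.
  void mark_card(size_t index) { current[index] = 1; }
};
```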
> > Shenandoah added a few asserts to catch these errors: > SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) > > ...but G1 would also benefit from the similar safety mechanism. > > This PR strengthens the code to prevent future accidents. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc` > - [x] Linux x86_64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] Linux AArch64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] GHA, cross-compilation only `all` tests are passing with various GC combinations as well. Ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28703#issuecomment-3654844328 From roland at openjdk.org Mon Dec 15 15:25:44 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:25:44 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v2] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:47:34 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/graphKit.cpp line 4191: >> >>> 4189: Node* res_mem = _gvn.transform(new SCMemProjNode(_gvn.transform(str))); >>> 4190: if (adr_type == TypePtr::BOTTOM) { >>> 4191: set_all_memory(res_mem); >> >> I'm confused by this. Doesn't `StrCompressedCopyNode` only write to dst? So the only part of the memory state that it updates is the one for `TypeAryPtr::BYTES`? > > It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think? Could this be fixed by appending a `MemBarCPUOrderNode` on the slice of src? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2619864649 From wkemper at openjdk.org Mon Dec 15 15:54:45 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 15 Dec 2025 15:54:45 GMT Subject: Integrated: 8373100: Genshen: Control thread can miss allocation failure notification In-Reply-To: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> References: <0zYhRl0mOYzH1sYZRFhxfr06N5-5Kh78wCVSCfVA2Qo=.7583bd34-3e9a-4f8e-a274-d1d2ba09a442@github.com> Message-ID: On Thu, 4 Dec 2025 20:35:42 GMT, William Kemper wrote: > In some cases, the control thread may fail to observe an allocation failure. This results in the thread which failed to allocate waiting forever for the control thread to run a cycle. Depending on which thread fails to allocate, the process may not make progress. This pull request has now been integrated. Changeset: ea6493c4 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/ea6493c4e1de2bc9615beee389b2d335669dc542 Stats: 23 lines in 2 files changed: 4 ins; 8 del; 11 mod 8373100: Genshen: Control thread can miss allocation failure notification Reviewed-by: ysr, kdnilsen, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/28665 From kdnilsen at openjdk.org Mon Dec 15 19:57:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Dec 2025 19:57:52 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 23:16:35 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max / 100) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers to experience frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < ShenandoahMinFreeThreshold * mutator_soft_capacity / 100) // trigger gc >> >> Tests: >> - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. >> - GHA passed. 
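The suggested trigger math can be checked numerically with a small sketch. Names mirror the description rather than actual Shenandoah symbols, and the threshold form here, a percentage of the soft mutator capacity, is an assumption about how the comparison is finished:

```cpp
#include <cassert>
#include <cstddef>

// Numeric sketch of the fixed trigger described above (values are abstract
// byte counts). Everything is derived from SoftMaxHeapSize, so the trigger
// no longer scales with Xmx when a much smaller soft max is in effect.
size_t soft_mutator_available(size_t soft_max, size_t used,
                              size_t evac_reserve_pct) {
  size_t mutator_soft_capacity = soft_max * (100 - evac_reserve_pct) / 100;
  return used >= mutator_soft_capacity ? 0 : mutator_soft_capacity - used;
}

bool should_trigger(size_t soft_max, size_t used,
                    size_t evac_reserve_pct, size_t min_free_pct) {
  size_t mutator_soft_capacity = soft_max * (100 - evac_reserve_pct) / 100;
  // trigger when available drops below min_free_pct percent of capacity
  return soft_mutator_available(soft_max, used, evac_reserve_pct)
       < mutator_soft_capacity * min_free_pct / 100;
}
```

With soft_max = 1000, a 5% evac reserve gives a soft capacity of 950; a mostly idle heap (used = 100) stays far from the trigger, while used = 920 trips it.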
>> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > log and naming fixes Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3579895704 From kdnilsen at openjdk.org Mon Dec 15 19:57:55 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Dec 2025 19:57:55 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v3] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 22:46:07 GMT, Rui Li wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 940: >> >>> 938: >>> 939: size_t ShenandoahGeneration::soft_available_exclude_evac_reserve() const { >>> 940: size_t result = available(ShenandoahHeap::heap()->soft_max_capacity() * (100.0 - ShenandoahEvacReserve) / 100); >> >> I'm a little uncomfortable with this approach. It's mostly a question of how we name it. The evac reserve is not always this value. In particular, we may shrink the young evac reserves after we have selected the cset. 
Also of concern is that if someone invokes this function on old_generation(), it looks like they'll get a bogus (not meaningful) value. >> >> I think I'd be more comfortable with naming this to something like "mutator_available_when_gc_is_idle()". If we keep it virtual, then OldGeneration should override with "assert(false, "Not relevant to old generation") > > Talked offline. Rename this to `soft_mutator_available` Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28622#discussion_r2620646753 From kdnilsen at openjdk.org Mon Dec 15 22:00:13 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Dec 2025 22:00:13 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC Message-ID: Add a triggering penalty when we execute degenerated GC cycle. ------------- Commit messages: - remove redundant code - Increase heuristic penalties following degenerated GC Changes: https://git.openjdk.org/jdk/pull/28834/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373714 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From kdnilsen at openjdk.org Mon Dec 15 23:39:03 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Dec 2025 23:39:03 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning Message-ID: Live memory in old is measured as of the start of old-generation concurrent marking. Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. 
------------- Commit messages: - do not count promoted-in-place live data toward TAMS old live data Changes: https://git.openjdk.org/jdk/pull/28837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373720 Stats: 19 lines in 4 files changed: 17 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28837/head:pull/28837 PR: https://git.openjdk.org/jdk/pull/28837 From kdnilsen at openjdk.org Mon Dec 15 23:46:20 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Dec 2025 23:46:20 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v2] In-Reply-To: References: Message-ID: > Live memory in old is measured as of the start of old-generation concurrent marking. > > Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge remote-tracking branch 'jdk/master' into fix-live-data-at-old-mark - do not count promoted-in-place live data toward TAMS old live data ------------- Changes: https://git.openjdk.org/jdk/pull/28837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=01 Stats: 17 lines in 4 files changed: 14 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28837/head:pull/28837 PR: https://git.openjdk.org/jdk/pull/28837 From wkemper at openjdk.org Tue Dec 16 00:00:55 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 16 Dec 2025 00:00:55 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 21:53:02 GMT, Kelvin Nilsen wrote: > Add a triggering penalty when we execute degenerated GC cycle. We have to be careful not to revert: https://bugs.openjdk.org/browse/JDK-8368152. src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 314: > 312: ShenandoahCollectorPolicy* policy = heap->shenandoah_policy(); > 313: policy->record_degenerated(_generation->is_young(), _abbreviated, progress); > 314: if (progress || (heap->mode()->is_generational() && !policy->generational_should_upgrade_degenerated_gc())) { I'm not sure why we want to change the logic here. Previously, if there was no progress, and we were in generational mode and the policy said _not_ to upgrade to a full GC, nothing would happen here. Now we are treating that case the same as if there were progress, when there might not have been. `heap->notify_gc_progress()` resets the `no progress counter`, which has implications for how many times Shenandoah will retry allocations. We need to be careful to avoid an allocation loop that just runs degenerated cycles forever (brownout). 
PR Review: https://git.openjdk.org/jdk/pull/28834#pullrequestreview-3580616145 PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2621222637 From kdnilsen at openjdk.org Tue Dec 16 00:04:30 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 00:04:30 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v3] In-Reply-To: References: Message-ID: > Live memory in old is measured as of the start of old-generation concurrent marking. > > Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: fix up comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28837/files - new: https://git.openjdk.org/jdk/pull/28837/files/f9ee6646..5a53f72c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=01-02 Stats: 5 lines in 1 file changed: 3 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28837/head:pull/28837 PR: https://git.openjdk.org/jdk/pull/28837 From wkemper at openjdk.org Tue Dec 16 00:09:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 16 Dec 2025 00:09:53 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 23:16:35 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> Tests: >> - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. >> - GHA passed. 
>> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > log and naming fixes Looks good! ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28622#pullrequestreview-3580644369 From duke at openjdk.org Tue Dec 16 04:17:06 2025 From: duke at openjdk.org (duke) Date: Tue, 16 Dec 2025 04:17:06 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 23:16:35 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. 
>> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. >> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> Tests: >> - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. >> - GHA passed. 
>> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > log and naming fixes @rgithubli Your change (at version 8dea51646e3be9abc7c7e190251762d8b4132b74) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28622#issuecomment-3658683628 From qamai at openjdk.org Tue Dec 16 06:23:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 06:23:14 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 15:23:23 GMT, Roland Westrelin wrote: >> It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think? > > Could this be fixed by appending a `MemBarCPUOrderNode` on the slice of src? That's a really great idea! I have implemented it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2621940305 From qamai at openjdk.org Tue Dec 16 06:23:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 06:23:12 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Use MemBar instead of widening the intrinsic memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/1e026354..9649a2f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=01-02 Stats: 62 lines in 3 files changed: 39 ins; 8 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From xpeng at openjdk.org Tue Dec 16 07:05:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 16 Dec 2025 07:05:02 GMT Subject: RFR: 8372543: Shenandoah: undercalculated the available size when soft max takes effect [v4] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 23:16:35 GMT, Rui Li wrote: >> Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. >> >> Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: >> >> >> available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used >> soft_tail = Xmx - soft_max >> if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc >> >> >> The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. 
>> >> >> Suggested fix: when deciding when to trigger gc, use logic similar to below: >> >> mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; >> available = mutator_soft_capacity - used; >> if (available < mutator_soft_capacity) // trigger gc >> ``` >> >> Tests: >> - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. >> - GHA passed. >> >> ------- >> This change also improved gc logging: >> >> Before: >> >> [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) >> [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% >> external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B >> >> >> After: >> >> [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) >> [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: >> 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: >> 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: >> 122, Reserved: 102M, Max free available in a single region: 1024K; > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > log and naming fixes Looks good, thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28622#issuecomment-3659113092 From duke at openjdk.org Tue Dec 16 07:05:03 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 16 Dec 2025 07:05:03 GMT Subject: Integrated: 8372543: Shenandoah: undercalculated the available size when soft max takes effect In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 02:02:18 GMT, Rui Li wrote: > Detailed math and repro see https://bugs.openjdk.org/browse/JDK-8372543. > > Currently in shenandoah, when deciding whether to have gc, how we calculate available size is: > > > available = (Xmx * (100 - ShenandoahEvacReserve) / 100) - used > soft_tail = Xmx - soft_max > if (available - soft_tail < ShenandoahMinFreeThreshold * soft_max) // trigger gc > > > The if condition `available - soft_tail` will be reduced to: `-(ShenandoahEvacReserve/100) * Xmx - used + soft_max`, which means when soft max is the same, the larger Xmx is, the less free size the app would have and the more gc it would have, which does not make sense, especially for the case where the app is mostly idle. This caused one of our internal customers experienced frequent gc with minimal workload, when soft max heap size was set way lower than Xmx. > > > Suggested fix: when deciding when to trigger gc, use logic similar to below: > > mutator_soft_capacity = soft_max * (100 - ShenandoahEvacReserve) / 100; > available = mutator_soft_capacity - used; > if (available < mutator_soft_capacity) // trigger gc > ``` > > Tests: > - Ran the repro app `StableLiveSet.java` in https://bugs.openjdk.org/browse/JDK-8372543. Without fix, tip had ~2910 times gc in 20 sec with `-XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:SoftMaxHeapSize=512m -Xmx31g` jvm args. With the fix, only 18 times in 20 sec. > - GHA passed. 
> > ------- > This change also improved gc logging: > > Before: > > [6.831s][info][gc ] Trigger: Free (52230K) is below minimum threshold (52428K) > [6.831s][info][gc,free ] Free: 1587M, Max: 1024K regular, 1539M humongous, Frag: 2% > external, 18% internal; Used: 352M, Mutator Free: 1940 Collector Reserve: 103M, Max: 1024K; Used: 0B > > > After: > > [8.358s][info][gc ] Trigger: Free (Soft mutator free) (51498K) is below minimum threshold (52428K) > [8.358s][info][gc,free ] Whole heap stats: Total free: 1509M, Total used: 401M, Max free in a single region: > 1024K, Max humongous: 1490M; Frag stats: External: 0%, Internal: 21%; Mutator freeset stats: Partition count: > 1911, Reserved: 1509M, Max free available in a single region: 1024K; Collector freeset stats: Partition count: > 122, Reserved: 102M, Max free available in a single region: 1024K; This pull request has now been integrated. Changeset: b1e8c4e0 Author: Rui Li Committer: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/b1e8c4e030f42ea3146b2502c9ab030bc79a8147 Stats: 243 lines in 13 files changed: 152 ins; 61 del; 30 mod 8372543: Shenandoah: undercalculated the available size when soft max takes effect Reviewed-by: wkemper, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/28622 From shade at openjdk.org Tue Dec 16 12:00:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Dec 2025 12:00:24 GMT Subject: RFR: 8373789: No PCH release build failure after JDK-8372543 Message-ID: Simple missing include. Seems to uniquely affect release builds only, which explains why GHA have not caught it in the PR that regressed it: GHA runs no-PCH builds only for fastdebug. 
Additional testing: - [x] Ad-hoc builds now pass ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28843/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28843&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373789 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28843/head:pull/28843 PR: https://git.openjdk.org/jdk/pull/28843 From tschatzl at openjdk.org Tue Dec 16 13:11:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 16 Dec 2025 13:11:51 GMT Subject: RFR: 8373789: No PCH release build failure after JDK-8372543 In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:49:15 GMT, Aleksey Shipilev wrote: > Simple missing include. Seems to uniquely affect release builds only, which explains why GHA have not caught it in the PR that regressed it: GHA runs no-PCH builds only for fastdebug. > > Additional testing: > - [x] Ad-hoc builds now pass Lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28843#pullrequestreview-3583013556 From shade at openjdk.org Tue Dec 16 13:22:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Dec 2025 13:22:02 GMT Subject: RFR: 8373789: No PCH release build failure after JDK-8372543 In-Reply-To: References: Message-ID: <68IVLDAGw1QAoIcGs2NJldof4crtrni3r6gchE8XDW8=.f0f88d20-6605-4763-ae3d-66268dabad7d@github.com> On Tue, 16 Dec 2025 13:08:54 GMT, Thomas Schatzl wrote: >> Simple missing include. Seems to uniquely affect release builds only, which explains why GHA have not caught it in the PR that regressed it: GHA runs no-PCH builds only for fastdebug. >> >> Additional testing: >> - [x] Ad-hoc builds now pass > > Lgtm and trivial. Thanks for quick review, @tschatzl! GHA builds have passed. I expect no test failures. So, here goes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28843#issuecomment-3660484191 From shade at openjdk.org Tue Dec 16 13:22:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Dec 2025 13:22:04 GMT Subject: Integrated: 8373789: No PCH release build failure after JDK-8372543 In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:49:15 GMT, Aleksey Shipilev wrote: > Simple missing include. Seems to uniquely affect release builds only, which explains why GHA have not caught it in the PR that regressed it: GHA runs no-PCH builds only for fastdebug. > > Additional testing: > - [x] Ad-hoc builds now pass This pull request has now been integrated. Changeset: a61394b1 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/a61394b1da40cfbb617fec35553da2d3c3e27d37 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8373789: No PCH release build failure after JDK-8372543 Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/28843 From kdnilsen at openjdk.org Tue Dec 16 14:53:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 14:53:41 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 23:53:30 GMT, William Kemper wrote: >> Add a triggering penalty when we execute degenerated GC cycle. > > src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 314: > >> 312: ShenandoahCollectorPolicy* policy = heap->shenandoah_policy(); >> 313: policy->record_degenerated(_generation->is_young(), _abbreviated, progress); >> 314: if (progress || (heap->mode()->is_generational() && !policy->generational_should_upgrade_degenerated_gc())) { > > I'm not sure why we want to change the logic here. Previously, if there was no progress, and we were in generational mode and the policy said _not_ to upgrade to a full GC, nothing would happen here. 
Now we are treating that case the same as if there were progress, when there might not have been. `heap->notify_gc_progress()` resets the `no progress counter`, which has implications for how many times Shenandoah will retry allocations. We need to be careful to avoid an allocation loop that just runs degenerated cycles forever (brownout). This "avoids" a third arm of the if-then-else (rather than if-then-else-if-then-else-if). In the case that we do upgrade to full gc, we'll experience a full-gc penalty. In the case that we do not upgrade to full-gc following failed degen progress, we need "somehow" to incur the degen penalty. This is how I proposed to do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2623601570 From stefank at openjdk.org Tue Dec 16 14:56:15 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 Dec 2025 14:56:15 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch Message-ID: In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we then have to do null checks in addition to the already existing cast checks. If either of these checks fails, the code needs to propagate this up through the callers to the code that asked for the checks. There it will throw the suitable exception. Previously, it was enough to report success or failure, so a boolean sufficed. Now we need three states. So the bool was replaced with an OopCopyResult enum. See: https://github.com/openjdk/valhalla/pull/1792 I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. 
------------- Commit messages: - 8373801: Adopt arraycopy OopCopyResult from the lworld branch Changes: https://git.openjdk.org/jdk/pull/28850/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28850&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373801 Stats: 199 lines in 12 files changed: 70 ins; 13 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/28850.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28850/head:pull/28850 PR: https://git.openjdk.org/jdk/pull/28850 From stefank at openjdk.org Tue Dec 16 15:04:44 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 Dec 2025 15:04:44 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v2] In-Reply-To: References: Message-ID: > In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. > > See: https://github.com/openjdk/valhalla/pull/1792 > > I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Remove throw NPE function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28850/files - new: https://git.openjdk.org/jdk/pull/28850/files/41ff47bd..25c24e3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28850&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28850&range=00-01 Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28850.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28850/head:pull/28850 PR: https://git.openjdk.org/jdk/pull/28850 From kdnilsen at openjdk.org Tue Dec 16 15:12:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 15:12:04 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 14:51:08 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 314: >> >>> 312: ShenandoahCollectorPolicy* policy = heap->shenandoah_policy(); >>> 313: policy->record_degenerated(_generation->is_young(), _abbreviated, progress); >>> 314: if (progress || (heap->mode()->is_generational() && !policy->generational_should_upgrade_degenerated_gc())) { >> >> I'm not sure why we want to change the logic here. Previously, if there was no progress, and we were in generational mode and the policy said _not_ to upgrade to a full GC, nothing would happen here. Now we are treating that case the same as if there were progress, when there might not have been. `heap->notifiy_gc_progress()` resets the `no progress counter`, which has implications for how many times Shenandoah will retry allocations. We need to be careful to avoid an allocation loop that just runs degenerated cycles forever (brownout). > > This "avoids" a third arm of the if-then-else (rather than if-then-else-if-then-else-if). 
In the case that we do upgrade to full gc, we'll experience a full-gc penalty. In the case that we do not upgrade to full-gc following failed degen progress, we need "somehow" to incur the degen penalty. This is how I proposed to do that. I see this composition of the code results in unclear intentions. Before the if test, we tell policy that we had a degenerated gc with "no progress". But inside the if-body, we tell the generation heuristics that we had a successful degenerated cycle. So I should restructure this code. What I was trying to fix is that in the previous code, we never imposed a Degen Penalty if we had no degen progress and we did not upgrade to full GC. Thanks for this catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2623680165 From kdnilsen at openjdk.org Tue Dec 16 15:24:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 15:24:04 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 23:58:09 GMT, William Kemper wrote: > We have to be careful not to revert: https://bugs.openjdk.org/browse/JDK-8368152. Thanks for heads up on this issue. So will we be ok as long as we do not "accidentally" treat a no-progress degen as if it experienced progress? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28834#issuecomment-3661093957 From roland at openjdk.org Tue Dec 16 16:54:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 16:54:02 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 06:23:12 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). 
This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Use MemBar instead of widening the intrinsic memory src/hotspot/share/opto/graphKit.cpp line 4210: > 4208: // StoreC -> MemBar -> MergeMem -> compress_string -> MergeMem -> CharMem > 4209: // --------------------------------> > 4210: Node* all_mem = reset_memory(); This code sequence is used several times. Would it make sense to factor it out in its own method? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2624030618 From kdnilsen at openjdk.org Tue Dec 16 16:57:53 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 16:57:53 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: refactor for reviewer requests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/d67ca8e5..3f12ff15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=00-01 Stats: 28 lines in 7 files changed: 25 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From kdnilsen at openjdk.org Tue Dec 16 17:28:03 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Dec 2025 17:28:03 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v4] In-Reply-To: References: Message-ID: > Live memory in old is measured as of the start of old-generation concurrent marking. > > Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge remote-tracking branch 'jdk/master' into fix-live-data-at-old-mark-gh - fix up comments - Merge remote-tracking branch 'jdk/master' into fix-live-data-at-old-mark - do not count promoted-in-place live data toward TAMS old live data ------------- Changes: https://git.openjdk.org/jdk/pull/28837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=03 Stats: 19 lines in 4 files changed: 16 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28837/head:pull/28837 PR: https://git.openjdk.org/jdk/pull/28837 From wkemper at openjdk.org Tue Dec 16 17:53:35 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 16 Dec 2025 17:53:35 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 16:57:53 GMT, Kelvin Nilsen wrote: >> Add a triggering penalty when we execute degenerated GC cycle. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > refactor for reviewer requests src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 322: > 320: } else { > 321: _generation->heuristics()->record_unsuccessful_degenerated(); > 322: } Suggestion: policy->record_degenerated(_generation->is_young(), _abbreviated, progress); _generation->heuristics()->record_success_degenerated(); if (progress) { heap->notify_gc_progress(); } else if (!heap->mode()->is_generational() || policy->generational_should_upgrade_degenerated_gc()) { // Upgrade to full GC, register full-GC impact on heuristics. 
op_degenerated_futile(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2624207680 From wkemper at openjdk.org Tue Dec 16 18:31:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 16 Dec 2025 18:31:08 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 16:57:53 GMT, Kelvin Nilsen wrote: >> Add a triggering penalty when we execute degenerated GC cycle. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > refactor for reviewer requests Suggest we either rename `record_successful_degenerated` -> `record_degenerated`, or just reuse it. src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 321: > 319: op_degenerated_futile(); > 320: } else { > 321: _generation->heuristics()->record_unsuccessful_degenerated(); Suggestion: _generation->heuristics()->record_successful_degenerated(); I think the confusion here is that we are conflating `progress` and `success`. The "progress" notion here is about triggering a full GC or giving up entirely. The degenerated cycle is "successful" because it did not run a full GC. Maybe we should rename `record_successful_degenerated` to `record_degenerated` (or, perhaps even `apply_degenerated_penalty`). I was about to suggest we pull `record_success_degenerated` out of the logic entirely, but that would mean upgraded degen cycles would be penalized again when the full GC completes. ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28834#pullrequestreview-3584439536 PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2624308749 From wkemper at openjdk.org Tue Dec 16 19:43:49 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 16 Dec 2025 19:43:49 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v2] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: - Heal old discovered lists in parallel - Fix comment - Factor duplicate code into shared method - Heal discovered oops in common place for degen and concurrent update refs - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Clear bootstrap mode for full GC that might have bypassed degenerated cycle - Do not bypass card barrier when healing discovered list - Consolidate management of bootstrap cycle configuration - Use Events::log, not Event for simple log messages - Oops, change name of class in test xdoc - ... and 6 more: https://git.openjdk.org/jdk/compare/3f36a04d...1aaa4bfb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28810/files - new: https://git.openjdk.org/jdk/pull/28810/files/f8eb0cae..1aaa4bfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=00-01 Stats: 312435 lines in 3092 files changed: 209779 ins; 60024 del; 42632 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From ysr at openjdk.org Tue Dec 16 23:15:22 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 16 Dec 2025 23:15:22 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v4] In-Reply-To: References: Message-ID: <9tpC1gt4kkX4Y7ofFW6NZcHbjdx-eTuIxDsMV8kDctI=.b73fac85-6573-4381-b2d5-01063f9cceb5@github.com> On Tue, 16 Dec 2025 17:28:03 GMT, Kelvin Nilsen wrote: >> Live memory in old is measured as of the start of old-generation concurrent marking. >> >> Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. 
> > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge remote-tracking branch 'jdk/master' into fix-live-data-at-old-mark-gh > - fix up comments > - Merge remote-tracking branch 'jdk/master' into fix-live-data-at-old-mark > - do not count promoted-in-place live data toward TAMS old live data Looks good to me. Just curious if it changes performance with any of the standard benchmarks? src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 390: > 388: inline bool has_live() const; > 389: inline size_t get_live_data_bytes() const; > 390: inline size_t get_live_data_words() const; Maybe add a 1-line documentation for this API stating that this is the total size of objects under TAMS marked by the most recent marking cycle for that region. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28837#pullrequestreview-3585330418 PR Review Comment: https://git.openjdk.org/jdk/pull/28837#discussion_r2625048418 From ysr at openjdk.org Tue Dec 16 23:33:40 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 16 Dec 2025 23:33:40 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: Message-ID: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> On Tue, 16 Dec 2025 18:27:25 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor for reviewer requests > > src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 321: > >> 319: op_degenerated_futile(); >> 320: } else { >> 321: _generation->heuristics()->record_unsuccessful_degenerated(); > > Suggestion: > > _generation->heuristics()->record_successful_degenerated(); > > I think the confusion here is that we are conflating `progress` and `success`.
The "progress" notion here is about triggering a full GC or giving up entirely. The degenerated cycle is "successful" because it did not run a full GC. Maybe we should rename `record_successful_degenerated` to `record_degenerated` (or, perhaps even `apply_degenerated_penalty`). I was about to suggest we pull `record_successful_degenerated` out of the logic entirely, but that would mean upgraded degen cycles would be penalized again when the full GC completes. Maybe let the heuristics (or the policy) track progress as well, and inform the actuator (i.e. op degenerated) whether it should upgrade to a full gc. It almost feels like heuristics and policy and actuator are leaking abstractions. It feels like heuristics keep track of the model parameters and learn from sensors, and the policy consults a specific heuristic to inform actuator (i.e. actions). By that model, you'd have the actuator sending the sensor information to the heuristics and asking the policy (or the heuristics, if you conflate heuristics and policy) to decide which step to take next. It would seem that evaluation of the notion of progress then moves to the policy too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2625082943 From wkemper at openjdk.org Wed Dec 17 00:29:18 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 17 Dec 2025 00:29:18 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v3] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked.
This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Sort includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28810/files - new: https://git.openjdk.org/jdk/pull/28810/files/1aaa4bfb..bc42a6ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From stefank at openjdk.org Wed Dec 17 07:58:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 17 Dec 2025 07:58:55 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v3] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 00:29:18 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. 
For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Sort includes I added a few style proposals to make the Shenandoah sections look more consistent with the surrounding style in the shared code. 
src/hotspot/share/prims/whitebox.cpp line 704: > 702: #endif // INCLUDE_G1GC > 703: > 704: #if INCLUDE_SHENANDOAHGC Suggestion: #if INCLUDE_SHENANDOAHGC src/hotspot/share/prims/whitebox.cpp line 741: > 739: WB_END > 740: > 741: #endif Suggestion: #endif // INCLUDE_SHENANDOAHGC src/hotspot/share/prims/whitebox.cpp line 2939: > 2937: > 2938: #endif > 2939: Suggestion: #endif test/lib/jdk/test/whitebox/WhiteBox.java line 317: > 315: public native long[] g1GetMixedGCInfo(int liveness); > 316: > 317: public native int shenandoahRegionSize(); Suggestion: // Shenandoah public native int shenandoahRegionSize(); ------------- PR Review: https://git.openjdk.org/jdk/pull/28810#pullrequestreview-3586442631 PR Review Comment: https://git.openjdk.org/jdk/pull/28810#discussion_r2625971356 PR Review Comment: https://git.openjdk.org/jdk/pull/28810#discussion_r2625972227 PR Review Comment: https://git.openjdk.org/jdk/pull/28810#discussion_r2625970693 PR Review Comment: https://git.openjdk.org/jdk/pull/28810#discussion_r2625975644 From stefank at openjdk.org Wed Dec 17 09:41:45 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 17 Dec 2025 09:41:45 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 15:04:44 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. 
>> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove throw NPE function Tier1 testing passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28850#issuecomment-3664510129 From tschatzl at openjdk.org Wed Dec 17 15:42:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 17 Dec 2025 15:42:24 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 15:04:44 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. >> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove throw NPE function Marked as reviewed by tschatzl (Reviewer). 
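The three-state result described in the PR above can be sketched minimally. The enumerator names and the toy `copy_check` loop below are assumptions for illustration (see the linked Valhalla PR for the real definition), not the actual barrier code:

```cpp
#include <cassert>
#include <cstddef>

// Three-state copy result replacing the old bool; enumerator names are
// illustrative, not necessarily those used in the lworld branch.
enum class OopCopyResult { ok, cast_check_failed, null_check_failed };

// Toy element standing in for an oop during a checked arraycopy.
struct Elem {
    bool is_null;
    bool castable; // would the element pass the destination's cast check?
};

// Sketch of the checked loop: a null element fails when the destination
// is null-restricted, an uncastable element fails the cast check, and
// the caller maps each failure state to the matching exception.
OopCopyResult copy_check(const Elem* src, std::size_t len, bool null_restricted) {
    for (std::size_t i = 0; i < len; i++) {
        if (src[i].is_null) {
            if (null_restricted) {
                return OopCopyResult::null_check_failed;
            }
            continue;
        }
        if (!src[i].castable) {
            return OopCopyResult::cast_check_failed;
        }
    }
    return OopCopyResult::ok;
}
```

This shows why a bool no longer suffices: the caller needs to distinguish which of the two checks failed in order to throw the right exception.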
------------- PR Review: https://git.openjdk.org/jdk/pull/28850#pullrequestreview-3588346814 From kdnilsen at openjdk.org Wed Dec 17 16:04:07 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 17 Dec 2025 16:04:07 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v5] In-Reply-To: References: Message-ID: > Live memory in old is measured as of the start of old-generation concurrent marking. > > Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28837/files - new: https://git.openjdk.org/jdk/pull/28837/files/273a857e..a30c2fd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28837&range=03-04 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28837/head:pull/28837 PR: https://git.openjdk.org/jdk/pull/28837 From kdnilsen at openjdk.org Wed Dec 17 16:04:08 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 17 Dec 2025 16:04:08 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v4] In-Reply-To: <9tpC1gt4kkX4Y7ofFW6NZcHbjdx-eTuIxDsMV8kDctI=.b73fac85-6573-4381-b2d5-01063f9cceb5@github.com> References: <9tpC1gt4kkX4Y7ofFW6NZcHbjdx-eTuIxDsMV8kDctI=.b73fac85-6573-4381-b2d5-01063f9cceb5@github.com> Message-ID: On Tue, 16 Dec 2025 23:12:19 GMT, Y. Srinivas Ramakrishna wrote: > Looks good to me. Just curious if it changes performance with any of the standard benchmarks? > I don't see measurable impact on any standard benchmarks. 
I have witnessed situations when studying GC logs where the live data at previous old GC has been somewhat unreliable due to very long (lots of interruptions) old marking efforts and lots of floating garbage. > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 390: > >> 388: inline bool has_live() const; >> 389: inline size_t get_live_data_bytes() const; >> 390: inline size_t get_live_data_words() const; > > Maybe add a 1-line documentation for this API stating that this is the total size of objects under TAMS marked by the most recent marking cycle for that region. Oh, and maybe state when it's valid to read its value for a region (i.e. between marking cycles for that region). (Random thought: can promotion in place just clear this value at promotion and save the check you have for old gen, or could that cause issues elsewhere?) Thanks. I've added some comments. I think we don't want to overwrite live-data at promote-in-place time. It seems that information can be useful to other heuristics, though I don't think we make use of it at the moment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28837#issuecomment-3666015960 PR Review Comment: https://git.openjdk.org/jdk/pull/28837#discussion_r2627629678 From wkemper at openjdk.org Wed Dec 17 19:39:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 17 Dec 2025 19:39:57 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v4] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished.
However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28810/files - new: https://git.openjdk.org/jdk/pull/28810/files/bc42a6ee..74b7307c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=02-03 Stats: 6 lines in 2 files changed: 3 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From ysr at openjdk.org Wed Dec 17 22:24:44 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 17 Dec 2025 22:24:44 GMT Subject: RFR: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning [v5] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 16:04:07 GMT, Kelvin Nilsen wrote: >> Live memory in old is measured as of the start of old-generation concurrent marking. >> >> Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add some comments LGTM ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28837#pullrequestreview-3589916438 From kdnilsen at openjdk.org Wed Dec 17 22:24:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 17 Dec 2025 22:24:45 GMT Subject: Integrated: 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 23:32:42 GMT, Kelvin Nilsen wrote: > Live memory in old is measured as of the start of old-generation concurrent marking. > > Memory promoted during concurrent old marking (memory above TAMS for the old-generation heap regions) and memory regions promoted in place during concurrent old marking are excluded from the total live data at start of old marking. This pull request has now been integrated. 
Changeset: 17d633a8 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/17d633a8ee7538625501a90469cb6a68b9ba4820 Stats: 28 lines in 4 files changed: 25 ins; 2 del; 1 mod 8373720: GenShen: Count live-at-old mark using Snapshot at Beginning Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/28837 From roland at openjdk.org Thu Dec 18 08:40:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Dec 2025 08:40:00 GMT Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs Message-ID: Hi all, This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarre?o and Emanuel Peter. Thanks! ------------- Commit messages: - Backport 00068a80304a809297d0df8698850861e9a1c5e9 Changes: https://git.openjdk.org/jdk/pull/28892/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28892&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354282 Stats: 367 lines in 13 files changed: 266 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/28892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28892/head:pull/28892 PR: https://git.openjdk.org/jdk/pull/28892 From duke at openjdk.org Thu Dec 18 20:53:46 2025 From: duke at openjdk.org (duke) Date: Thu, 18 Dec 2025 20:53:46 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v5] In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 23:31:41 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. 
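The card-marking policy from the integrated change above (skip young-young and old-old stores, and test the card first when UseCondCardMark is on) can be sketched like this. The `Gen` enum, the card values, and the `post_write_barrier` signature are illustrative, not the actual barrier code or HotSpot's card-table constants:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative generations and card values (real constants differ).
enum class Gen { young, old };
constexpr std::uint8_t dirty_card = 0;
constexpr std::uint8_t clean_card = 1;

// Sketch of the post-write-barrier policy: same-generation stores
// (young->young, old->old) never dirty a card, and with conditional
// card marking the card is read first so an already-dirty card is not
// written again (avoiding cache traffic from redundant stores).
inline void post_write_barrier(Gen holder, Gen referent,
                               std::uint8_t* card, bool use_cond_card_mark) {
    if (holder == referent) {
        return; // young->young and old->old excluded
    }
    if (use_cond_card_mark && *card == dirty_card) {
        return; // conditional card mark: skip the redundant store
    }
    *card = dirty_card;
}
```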
> > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > hardening of comments > > remove unintended files @nityarai08 Your change (at version 4054a52e7791a9237ee742c62fdb3c97164f5a25) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28204#issuecomment-3672168308 From duke at openjdk.org Thu Dec 18 21:18:36 2025 From: duke at openjdk.org (Nityanand Rai) Date: Thu, 18 Dec 2025 21:18:36 GMT Subject: Integrated: 8371284: GenShen: Avoid unnecessary card marking In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 20:42:25 GMT, Nityanand Rai wrote: > Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. This pull request has now been integrated. Changeset: 8a93658e Author: Nityanand Rai Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/8a93658e87e2e2f344d7dbfa6f916bd28175d013 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod 8371284: GenShen: Avoid unnecessary card marking Reviewed-by: wkemper, shade, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28204 From wkemper at openjdk.org Thu Dec 18 21:58:19 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 18 Dec 2025 21:58:19 GMT Subject: RFR: 8374048: Genshen: Backout fix for missed cancellation notice Message-ID: The "fix" for the relatively infrequent bug: https://bugs.openjdk.org/browse/JDK-8373100, is causing a frequent `nullptr` error. We are backing out the problematic fix while we work on a better solution. 
------------- Commit messages: - Revert "8373100: Genshen: Control thread can miss allocation failure notification" Changes: https://git.openjdk.org/jdk/pull/28910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374048 Stats: 23 lines in 2 files changed: 8 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/28910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28910/head:pull/28910 PR: https://git.openjdk.org/jdk/pull/28910 From kdnilsen at openjdk.org Thu Dec 18 22:48:43 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 18 Dec 2025 22:48:43 GMT Subject: RFR: 8374048: Genshen: Backout fix for missed cancellation notice In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 21:51:54 GMT, William Kemper wrote: > The "fix" for the relatively infrequent bug: https://bugs.openjdk.org/browse/JDK-8373100, is causing a frequent `nullptr` error. We are backing out the problematic fix while we work on a better solution. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28910#pullrequestreview-3595519776 From ysr at openjdk.org Thu Dec 18 23:28:41 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 18 Dec 2025 23:28:41 GMT Subject: RFR: 8374048: Genshen: Backout fix for missed cancellation notice In-Reply-To: References: Message-ID: <5zftusgjr9Nj1B2tdwbaU9a5q-OxcsPM8ItHAI3jI8A=.426ca5ab-f05d-4f21-9c9c-3b2e1a1b6d81@github.com> On Thu, 18 Dec 2025 21:51:54 GMT, William Kemper wrote: > The "fix" for the relatively infrequent bug: https://bugs.openjdk.org/browse/JDK-8373100, is causing a frequent `nullptr` error. We are backing out the problematic fix while we work on a better solution. Marked as reviewed by ysr (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28910#pullrequestreview-3595644797 From kdnilsen at openjdk.org Thu Dec 18 23:57:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 18 Dec 2025 23:57:28 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v4] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 19:39:57 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. 
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson Thanks for chasing this problem down. Very important fix. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28810#pullrequestreview-3595738667 From aboldtch at openjdk.org Fri Dec 19 08:42:50 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 19 Dec 2025 08:42:50 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 15:04:44 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. >> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove throw NPE function lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28850#pullrequestreview-3597566345 From jsikstro at openjdk.org Fri Dec 19 09:07:46 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 19 Dec 2025 09:07:46 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v2] In-Reply-To: References: Message-ID: <7RIWg55D2iB93_2D8gGiB5XRwuIjjLH0Pi07F5r4j80=.4e83979d-466e-4095-9268-a761a7a14d4f@github.com> On Tue, 16 Dec 2025 15:04:44 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. >> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove throw NPE function Looks good. ------------- Marked as reviewed by jsikstro (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28850#pullrequestreview-3597654894 From stefank at openjdk.org Fri Dec 19 09:47:10 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Dec 2025 09:47:10 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v3] In-Reply-To: References: Message-ID: > In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks.
If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. > > See: https://github.com/openjdk/valhalla/pull/1792 > > I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28850/files - new: https://git.openjdk.org/jdk/pull/28850/files/25c24e3d..5825d92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28850&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28850&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28850.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28850/head:pull/28850 PR: https://git.openjdk.org/jdk/pull/28850 From jsikstro at openjdk.org Fri Dec 19 09:47:11 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 19 Dec 2025 09:47:11 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v3] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 09:43:55 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum.
>> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comment Marked as reviewed by jsikstro (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28850#pullrequestreview-3597794124 From stefank at openjdk.org Fri Dec 19 10:55:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Dec 2025 10:55:51 GMT Subject: RFR: 8373801: Adopt arraycopy OopCopyResult from the lworld branch [v3] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 09:47:10 GMT, Stefan Karlsson wrote: >> In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. >> >> See: https://github.com/openjdk/valhalla/pull/1792 >> >> I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comment Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28850#issuecomment-3674585869 From stefank at openjdk.org Fri Dec 19 10:55:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Dec 2025 10:55:52 GMT Subject: Integrated: 8373801: Adopt arraycopy OopCopyResult from the lworld branch In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 14:47:15 GMT, Stefan Karlsson wrote: > In the Valhalla project there's code to restrict nulls in object arrays. In the GC arraycopy barrier we there have to both do null checks in addition to the already existing cast checks. If one of these two failed the code needs to propagate this up through the callers to the code that asked to do the checks. There it will throw the suitable exception. Previously, it was enough to say success or failed, and a boolean was enough. Now we need three states. So the bool was replaced with an OopCopyResult enum. > > See: https://github.com/openjdk/valhalla/pull/1792 > > I propose that we bring this over to the mainline to lower the diff between lworld and the openjdk/jdk. This pull request has now been integrated. Changeset: 53e77d21 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/53e77d21c2308daad7d4aecf05da56609ed0291c Stats: 192 lines in 12 files changed: 63 ins; 13 del; 116 mod 8373801: Adopt arraycopy OopCopyResult from the lworld branch Reviewed-by: jsikstro, tschatzl, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/28850 From stefank at openjdk.org Fri Dec 19 14:30:05 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Dec 2025 14:30:05 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord Message-ID: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. There's a Shenandoah change that should be checked closer. 
My thinking is: `is_being_inflated`: always returns false `has_displaced_mark_helper`: Only returns true if both these are true: * `!UseCompactObjectTable` - there's an early return for this earlier in the function. * `lockbits == monitor_value` - checked in the if-statement above ------------- Commit messages: - 8374145: Remove legacy locking remnants from markWord Changes: https://git.openjdk.org/jdk/pull/28927/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28927&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374145 Stats: 33 lines in 2 files changed: 0 ins; 32 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28927/head:pull/28927 PR: https://git.openjdk.org/jdk/pull/28927 From aboldtch at openjdk.org Fri Dec 19 14:30:06 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 19 Dec 2025 14:30:06 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> Message-ID: <2Xbj1QOxPiOM5_e4mNuEbQpn2VFxwJ19xXpBnAy2_T8=.7d5d7174-3a37-464b-a25c-f89d7a5028f5@github.com> On Fri, 19 Dec 2025 14:05:13 GMT, Stefan Karlsson wrote: > There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. > > There's a Shenandoah change that should be checked closer. My thinking is: > `is_being_inflated`: always returns false > `has_displaced_mark_helper`: Only returns true if both these are true: > * `!UseCompactObjectTable` - there's an early return for this earlier in the function. > * `lockbits == monitor_value` - checked in the if-statement above Looks good. Just one thought. 
src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 349: > 347: } else if (w.is_being_inflated() || w.has_displaced_mark_helper()) { > 348: // Informs caller that we aren't able to determine the age > 349: return markWord::max_age + 1; // sentinel Not sure if we want an `!w.has_displaced_mark_helper()` assert. Even if we already check for (and handled) the exact conditions (in the current implementation) for `has_displaced_mark_helper()`. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28927#pullrequestreview-3598758617 PR Review Comment: https://git.openjdk.org/jdk/pull/28927#discussion_r2635220791 From stefank at openjdk.org Fri Dec 19 14:30:07 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Dec 2025 14:30:07 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <2Xbj1QOxPiOM5_e4mNuEbQpn2VFxwJ19xXpBnAy2_T8=.7d5d7174-3a37-464b-a25c-f89d7a5028f5@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> <2Xbj1QOxPiOM5_e4mNuEbQpn2VFxwJ19xXpBnAy2_T8=.7d5d7174-3a37-464b-a25c-f89d7a5028f5@github.com> Message-ID: On Fri, 19 Dec 2025 14:10:24 GMT, Axel Boldt-Christmas wrote: >> There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. >> >> There's a Shenandoah change that should be checked closer. My thinking is: >> `is_being_inflated`: always returns false >> `has_displaced_mark_helper`: Only returns true if both these are true: >> * `!UseCompactObjectTable` - there's an early return for this earlier in the function. 
>> * `lockbits == monitor_value` - checked in the if-statement above > > src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 349: > >> 347: } else if (w.is_being_inflated() || w.has_displaced_mark_helper()) { >> 348: // Informs caller that we aren't able to determine the age >> 349: return markWord::max_age + 1; // sentinel > > Not sure if we want an `!w.has_displaced_mark_helper()` assert. Even if we already check for (and handled) the exact conditions (in the current implementation) for `has_displaced_mark_helper()`. That would work for me. I'll wait for input from the Shenandoah devs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28927#discussion_r2635237702 From wkemper at openjdk.org Fri Dec 19 18:04:49 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 Dec 2025 18:04:49 GMT Subject: Integrated: 8374048: Genshen: Backout fix for missed cancellation notice In-Reply-To: References: Message-ID: <7aOseFLpZlK5kTyw-ZLucN3IYPDnH3z3-Cd7_FKeiYo=.bd8ddc21-ac84-4649-95da-c21fd596d91f@github.com> On Thu, 18 Dec 2025 21:51:54 GMT, William Kemper wrote: > The "fix" for the relatively infrequent bug: https://bugs.openjdk.org/browse/JDK-8373100, is causing a frequent `nullptr` error. We are backing out the problematic fix while we work on a better solution. This pull request has now been integrated. 
Changeset: c1ad393e Author: William Kemper URL: https://git.openjdk.org/jdk/commit/c1ad393e25c253c9b4e09824bf5fceee134e08c0 Stats: 23 lines in 2 files changed: 8 ins; 4 del; 11 mod 8374048: Genshen: Backout fix for missed cancellation notice Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28910 From kbarrett at openjdk.org Fri Dec 19 18:08:28 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 19 Dec 2025 18:08:28 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> Message-ID: On Fri, 19 Dec 2025 14:05:13 GMT, Stefan Karlsson wrote: > There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. > > There's a Shenandoah change that should be checked closer. My thinking is: > `is_being_inflated`: always returns false > `has_displaced_mark_helper`: Only returns true if both these are true: > * `!UseCompactObjectTable` - there's an early return for this earlier in the function. > * `lockbits == monitor_value` - checked in the if-statement above Looks good. No opinion about @xmas92 suggestion in the shenandoah change. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28927#pullrequestreview-3599550836 From wkemper at openjdk.org Fri Dec 19 19:02:13 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 Dec 2025 19:02:13 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v5] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. 
For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson - Sort includes - Heal old discovered lists in parallel - Fix comment - Factor duplicate code into shared method - Heal discovered oops in common place for degen and concurrent update refs - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Clear bootstrap mode for full GC that might have bypassed degenerated cycle - Do not bypass card barrier when healing discovered list - ... and 9 more: https://git.openjdk.org/jdk/compare/400d8cfb...f621b70c ------------- Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=04 Stats: 667 lines in 20 files changed: 535 ins; 84 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From coleenp at openjdk.org Fri Dec 19 19:13:07 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 Dec 2025 19:13:07 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> Message-ID: On Fri, 19 Dec 2025 14:05:13 GMT, Stefan Karlsson wrote: > There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. > > There's a Shenandoah change that should be checked closer. My thinking is: > `is_being_inflated`: always returns false > `has_displaced_mark_helper`: Only returns true if both these are true: > * `!UseCompactObjectTable` - there's an early return for this earlier in the function. 
> * `lockbits == monitor_value` - checked in the if-statement above Thanks for finding these. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28927#pullrequestreview-3599722153 From wkemper at openjdk.org Fri Dec 19 19:28:37 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 Dec 2025 19:28:37 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null Message-ID: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. ------------- Commit messages: - Add comments - Revert back to what should be on this branch - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Don't know how this file got deleted - Carry over gc cancellation to gc request - Do not let allocation failure requests be overwritten by other requests - Fix degen point handling - Try to simplify control thread protocol - Remove instrumentation, add assertion and try a fix - Instrumentation and fix Changes: https://git.openjdk.org/jdk/pull/28932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373819 Stats: 86 lines in 2 files changed: 38 ins; 17 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/28932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28932/head:pull/28932 PR: https://git.openjdk.org/jdk/pull/28932 From kdnilsen at openjdk.org Fri Dec 19 21:48:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 19 Dec 2025 21:48:10 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v3] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute 
degenerated GC cycle. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - refactor for reviewer requests - remove redundant code - Increase heuristic penalties following degenerated GC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/3f12ff15..f6d85fb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=01-02 Stats: 11144 lines in 448 files changed: 7254 ins; 1520 del; 2370 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From stefank at openjdk.org Mon Dec 22 09:35:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 22 Dec 2025 09:35:12 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> Message-ID: On Fri, 19 Dec 2025 14:05:13 GMT, Stefan Karlsson wrote: > There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. > > There's a Shenandoah change that should be checked closer. My thinking is: > `is_being_inflated`: always returns false > `has_displaced_mark_helper`: Only returns true if both these are true: > * `!UseCompactObjectTable` - there's an early return for this earlier in the function. > * `lockbits == monitor_value` - checked in the if-statement above Thanks for the reviews! 
------------- PR Review: https://git.openjdk.org/jdk/pull/28927#pullrequestreview-3603327790 From stefank at openjdk.org Mon Dec 22 09:35:13 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 22 Dec 2025 09:35:13 GMT Subject: RFR: 8374145: Remove legacy locking remnants from markWord In-Reply-To: References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> <2Xbj1QOxPiOM5_e4mNuEbQpn2VFxwJ19xXpBnAy2_T8=.7d5d7174-3a37-464b-a25c-f89d7a5028f5@github.com> Message-ID: <2bKOJuGLuOye4emeXOSMwPqjWF0fBShko2UYWutYwxo=.f9f0240b-2d7c-4a3b-bd15-5e1ea16ed124@github.com> On Fri, 19 Dec 2025 14:16:25 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 349: >> >>> 347: } else if (w.is_being_inflated() || w.has_displaced_mark_helper()) { >>> 348: // Informs caller that we aren't able to determine the age >>> 349: return markWord::max_age + 1; // sentinel >> >> Not sure if we want an `!w.has_displaced_mark_helper()` assert. Even if we already check for (and handled) the exact conditions (in the current implementation) for `has_displaced_mark_helper()`. > > That would work for me. I'll wait for input from the Shenandoah devs. 
In the interest of getting this integrated, I've created an Enh for this: https://bugs.openjdk.org/browse/JDK-8374191 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28927#discussion_r2639248718 From stefank at openjdk.org Mon Dec 22 09:35:14 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 22 Dec 2025 09:35:14 GMT Subject: Integrated: 8374145: Remove legacy locking remnants from markWord In-Reply-To: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> References: <4yUGXErqsnP6fbNf5jqqZe5oZL1rtf8YFEAwnlDVFRs=.0609c860-7f4f-4da7-897e-3929499e2a87@github.com> Message-ID: On Fri, 19 Dec 2025 14:05:13 GMT, Stefan Karlsson wrote: > There's a bunch of unused code in markWord that used to be used by the removed legacy locking. I propose that we remove it. > > There's a Shenandoah change that should be checked closer. My thinking is: > `is_being_inflated`: always returns false > `has_displaced_mark_helper`: Only returns true if both these are true: > * `!UseCompactObjectTable` - there's an early return for this earlier in the function. > * `lockbits == monitor_value` - checked in the if-statement above This pull request has now been integrated. Changeset: e6c3ebe2 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/e6c3ebe27b0dd4cbf1885d79ea50acb208e364fa Stats: 33 lines in 2 files changed: 0 ins; 32 del; 1 mod 8374145: Remove legacy locking remnants from markWord Reviewed-by: aboldtch, kbarrett, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/28927 From serb at openjdk.org Wed Dec 24 01:49:49 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 24 Dec 2025 01:49:49 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed Message-ID: The copyright year has been bumped to 2025 in hotspot files that were modified in 2025 but had missed the update. (To minimize the patch, for now only files modified by the commits in src/hotspot have been updated.)
The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` ------------- Commit messages: - 8374316: Update copyright year to 2025 for hotspot in files where it was missed Changes: https://git.openjdk.org/jdk/pull/28970/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374316 Stats: 519 lines in 519 files changed: 0 ins; 0 del; 519 mod Patch: https://git.openjdk.org/jdk/pull/28970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28970/head:pull/28970 PR: https://git.openjdk.org/jdk/pull/28970 From kbarrett at openjdk.org Wed Dec 24 04:34:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 24 Dec 2025 04:34:53 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 00:16:59 GMT, Sergey Bylokhov wrote: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) There are a lot of non-hotspot files in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3688645543 From serb at openjdk.org Wed Dec 24 04:52:58 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 24 Dec 2025 04:52:58 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 04:32:25 GMT, Kim Barrett wrote: > There are a lot of non-hotspot files in this PR. Most of them are related and were touched by the same commits pushed to `src/hotspot`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3688672537 From serb at openjdk.org Wed Dec 24 05:27:56 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 24 Dec 2025 05:27:56 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed In-Reply-To: References: Message-ID: <9cmIm_ANkhZ9knsjEEdCJjy4qJH6TGMFcLoPMS1OVBQ=.53c83270-52bb-4c26-a415-e54a9ed37f56@github.com> On Wed, 24 Dec 2025 00:16:59 GMT, Sergey Bylokhov wrote: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` I can split out some of these changes into a separate PR to make this one smaller, but I already have a bunch of them =( ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3688721427 From wkemper at openjdk.org Thu Dec 25 14:30:03 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 25 Dec 2025 14:30:03 GMT Subject: RFR: Merge openjdk/jdk21u:master Message-ID: Merges tag jdk-21.0.10+6 ------------- Commit messages: - 8317970: Bump target macosx-x64 version to 11.00.00 - 8372534: Update Libpng to 1.6.51 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/231/files Stats: 660 lines in 23 files changed: 297 ins; 224 del; 139 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/231.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/231/head:pull/231 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/231 From kdnilsen at openjdk.org Thu Dec 25 20:25:02 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 Dec 2025 20:25:02 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v14] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint.
Probably introduced by improper resolution of merge conflicts - fix error in merge conflict resolution - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates - rework CompressedClassSpaceSizeinJmapHeap.java - fix errors in CompressedClassSpaceSizeInJmapHeap.java - Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java - fix two indexing bugs - ... and 50 more: https://git.openjdk.org/jdk/compare/98b7792a...7b9c4d64 ------------- Changes: https://git.openjdk.org/jdk/pull/24319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=13 Stats: 281 lines in 31 files changed: 109 ins; 30 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From kdnilsen at openjdk.org Thu Dec 25 22:34:40 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 Dec 2025 22:34:40 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v4] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - refactor for reviewer requests - remove redundant code - Increase heuristic penalties following degenerated GC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/f6d85fb7..87b41568 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=02-03 Stats: 2191 lines in 691 files changed: 739 ins; 326 del; 1126 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From serb at openjdk.org Sat Dec 27 07:40:58 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sat, 27 Dec 2025 07:40:58 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 00:16:59 GMT, Sergey Bylokhov wrote: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` I will update this PR to include all changes in src/hotspot and test/hotspot only. The rest will be done separately. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3693798808 From serb at openjdk.org Sat Dec 27 08:27:48 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sat, 27 Dec 2025 08:27:48 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v2] In-Reply-To: References: Message-ID: <1PnVxFvQfC9jvy7jFAVnY-0nOLDWnqVC71iAAizAzWI=.389a28a8-1a6b-4a7e-a67f-cdcb62b4af23@github.com> > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` Sergey Bylokhov has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8374316: Update copyright year to 2025 for hotspot in files where it was missed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28970/files - new: https://git.openjdk.org/jdk/pull/28970/files/e4855247..12ba39f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=00-01 Stats: 222 lines in 222 files changed: 0 ins; 0 del; 222 mod Patch: https://git.openjdk.org/jdk/pull/28970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28970/head:pull/28970 PR: https://git.openjdk.org/jdk/pull/28970 From serb at openjdk.org Sun Dec 28 03:51:36 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sun, 28 Dec 2025 03:51:36 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v3] In-Reply-To: References: Message-ID: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` Sergey Bylokhov has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8374316: Update copyright year to 2025 for hotspot in files where it was missed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28970/files - new: https://git.openjdk.org/jdk/pull/28970/files/12ba39f6..b8fe879e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=01-02 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28970/head:pull/28970 PR: https://git.openjdk.org/jdk/pull/28970 From serb at openjdk.org Sun Dec 28 03:51:36 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sun, 28 Dec 2025 03:51:36 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v2] In-Reply-To: <1PnVxFvQfC9jvy7jFAVnY-0nOLDWnqVC71iAAizAzWI=.389a28a8-1a6b-4a7e-a67f-cdcb62b4af23@github.com> References: <1PnVxFvQfC9jvy7jFAVnY-0nOLDWnqVC71iAAizAzWI=.389a28a8-1a6b-4a7e-a67f-cdcb62b4af23@github.com> Message-ID: On Sat, 27 Dec 2025 08:27:48 GMT, Sergey Bylokhov wrote: >> The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) >> >> The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: >> >> `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` > > Sergey Bylokhov has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > 8374316: Update copyright year to 2025 for hotspot in files where it was missed I have to exclude files updated by this commit, since it was a copyright-only change: JDK-8364597: Replace THL A29 Limited with Tencent https://github.com/openjdk/jdk/commit/4c9eaddaef8 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3693857773 From serb at openjdk.org Sun Dec 28 03:56:39 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sun, 28 Dec 2025 03:56:39 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4] In-Reply-To: References: Message-ID: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done ` Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into copy_hotspot - 8374316: Update copyright year to 2025 for hotspot in files where it was missed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28970/files - new: https://git.openjdk.org/jdk/pull/28970/files/b8fe879e..5b29083f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28970&range=02-03 Stats: 902 lines in 771 files changed: 32 ins; 91 del; 779 mod Patch: https://git.openjdk.org/jdk/pull/28970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28970/head:pull/28970 PR: https://git.openjdk.org/jdk/pull/28970 From kbarrett at openjdk.org Mon Dec 29 09:40:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 29 Dec 2025 09:40:59 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4] In-Reply-To: References: Message-ID: On Sun, 28 Dec 2025 03:56:39 GMT, Sergey Bylokhov wrote: >> The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) >> >> The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: >> >> ~~`git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done `~~ >> >> `git diff origin/master --name-only | while read f; do git log origin/master --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` > > Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into copy_hotspot > - 8374316: Update copyright year to 2025 for hotspot in files where it was missed Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28970#pullrequestreview-3615459136 From eastigeevich at openjdk.org Mon Dec 29 16:32:13 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 29 Dec 2025 16:32:13 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v19] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. 
> > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JD... Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'master' into JDK-8370947 - Fix SpecJVM2008 regressions - Merge branch 'master' into JDK-8370947 - Fix macos and windows aarch64 builds - Add cache DIC IDC status to VM_Version - Fix tier3 failures - Fix tier1 failures - Implement nested ICacheInvalidationContext - Fix linux-cross-compile build aarch64 - Merge branch 'master' into JDK-8370947 - ... 
and 17 more: https://git.openjdk.org/jdk/compare/5e685f6f...8a99c0e7 ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=18 Stats: 872 lines in 33 files changed: 806 ins; 22 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Mon Dec 29 16:40:15 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 29 Dec 2025 16:40:15 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v20] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. 
The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JD... 
Evgeny Astigeevich has updated the pull request incrementally with three additional commits since the last revision: - Restore deleted comment - Remove redundant blank line - Remove redundant include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/8a99c0e7..2473fa5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=18-19 Stats: 3 lines in 3 files changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Mon Dec 29 21:51:20 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 29 Dec 2025 21:51:20 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." 
> > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. > * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JD... 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Fix linux-cross-compile riscv64 build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/2473fa5c..967786b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=19-20 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From xpeng at openjdk.org Mon Dec 29 23:54:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 29 Dec 2025 23:54:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v19] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock; we have observed heavy heap lock contention on the memory allocation path in a performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention; alongside the optimization, the Shenandoah allocation code is restructured with a better object-oriented design so that the majority of the code is reused: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutators; it inherits from ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutators. > * ShenandoahCollectorAllocator: allocator for collector allocations in the Collector partition; similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for the Collector. 
> * ShenandoahOldCollectorAllocator: allocator for collector allocations in the OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now, and the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab`s in the old generation. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK tip: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
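[Editor's note] The CAS-based allocation fast path the 8361099 PR describes can be illustrated with a minimal sketch. This is not HotSpot code: `Region` and `par_allocate` are hypothetical names, and the real ShenandoahAllocator classes additionally handle region selection, retirement, and safepoint interaction that this sketch omits.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical region with a lock-free bump pointer. The idea is that
// threads claim [old_top, old_top + size) with a CAS on `top` instead of
// taking a global heap lock.
struct Region {
  char* bottom;
  char* end;
  std::atomic<char*> top;

  char* par_allocate(std::size_t size) {
    char* old_top = top.load(std::memory_order_relaxed);
    while (old_top + size <= end) {
      // On failure, compare_exchange_weak reloads the current value into
      // old_top, so the loop condition is re-checked before retrying.
      if (top.compare_exchange_weak(old_top, old_top + size,
                                    std::memory_order_relaxed)) {
        return old_top;  // this thread owns the claimed range
      }
    }
    return nullptr;  // region exhausted: caller falls back to a slow path
  }
};
```

In the real allocator a failed fast path would fall back to taking the lock and picking another region; the sketch only shows the contended bump-pointer step.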
The pull request now contains 264 commits: - Fix build error after merging from tip - Merge branch 'master' into cas-alloc-1 - Merge branch 'master' into cas-alloc-1 - Some comments updates as suggested in PR review - Fix build failure after merge - Expend promoted from ShenandoahOldCollectorAllocator - Merge branch 'master' into cas-alloc-1 - Address PR comments - Merge branch 'openjdk:master' into cas-alloc-1 - Add missing header for ShenandoahFreeSetPartitionId - ... and 254 more: https://git.openjdk.org/jdk/compare/5e685f6f...792f011e ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=18 Stats: 1644 lines in 25 files changed: 1296 ins; 235 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From serb at openjdk.org Tue Dec 30 12:12:07 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Tue, 30 Dec 2025 12:12:07 GMT Subject: Integrated: 8374316: Update copyright year to 2025 for hotspot in files where it was missed In-Reply-To: References: Message-ID: On Wed, 24 Dec 2025 00:16:59 GMT, Sergey Bylokhov wrote: > The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > ~~`git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done `~~ > > `git diff origin/master --name-only | while read f; do git log origin/master --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` This pull request has now been integrated. 
Changeset: a6462d64 Author: Sergey Bylokhov URL: https://git.openjdk.org/jdk/commit/a6462d641cba004829f9136df22f3d953c0e0c5d Stats: 451 lines in 451 files changed: 0 ins; 0 del; 451 mod 8374316: Update copyright year to 2025 for hotspot in files where it was missed Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/28970 From aph at openjdk.org Wed Dec 31 15:22:06 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 31 Dec 2025 15:22:06 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: Message-ID: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> On Mon, 29 Dec 2025 21:51:20 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. >> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. 
The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. >> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. >> >> Testing results: linux fastdebug build >> - Neoverse-N1 (Graviton 2) >> - [x] tier1: passed >> - [x] tier2: passed >> - [x] tier3: passed >> - [x] tier4: 3 failu... > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Fix linux-cross-compile riscv64 build Is there any reason not to do this by default on all AArch64? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3702378081 From eastigeevich at openjdk.org Wed Dec 31 16:10:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 31 Dec 2025 16:10:10 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> Message-ID: On Wed, 31 Dec 2025 15:19:11 GMT, Andrew Haley wrote: > Is there any reason not to do this by default on all AArch64? It will be turned on if AArch64 has `ctr_el0.IDC` and `ctr_el0.DIC` set. See https://github.com/openjdk/jdk/pull/28328/changes#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R663 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3702445761 From wkemper at openjdk.org Wed Dec 31 18:21:17 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 31 Dec 2025 18:21:17 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v2] In-Reply-To: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: > This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Take regulator thread out of STS before requesting GC The request may block while it waits for control thread to stop old marking. 
If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28932/files - new: https://git.openjdk.org/jdk/pull/28932/files/8b3acc85..e416d123 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28932/head:pull/28932 PR: https://git.openjdk.org/jdk/pull/28932
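[Editor's note] The batching idea from the 8370947 thread above (record patched addresses while deferral is active, then perform one invalidation for the whole batch when the scope closes) can be sketched as follows. This is a simplified illustration, not the PR's actual `ICacheInvalidationContext`: all names are hypothetical, and a flush counter stands in for the real cache-maintenance sequence that the erratum makes expensive.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical RAII scope: with deferral on, invalidations are recorded and
// collapsed into a single flush in the destructor; with deferral off, each
// invalidation flushes immediately (the pre-existing behavior).
class DeferredInvalidationScope {
 public:
  explicit DeferredInvalidationScope(bool defer) : _defer(defer) {}

  ~DeferredInvalidationScope() {
    if (_defer && !_pending.empty()) {
      flush_count++;     // one flush covers the whole batch
      _pending.clear();
    }
  }

  void invalidate(uintptr_t addr) {
    if (_defer) {
      _pending.push_back(addr);  // remember for the batched flush
    } else {
      flush_count++;             // immediate flush per patched site
    }
  }

  static int flush_count;  // stand-in for actual icache maintenance

 private:
  bool _defer;
  std::vector<uintptr_t> _pending;
};

int DeferredInvalidationScope::flush_count = 0;
```

The payoff is visible in the counter: N patched sites cost N flushes without deferral, but only one with it, which mirrors why the errata notice recommends minimizing injected invalidations.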