From aph at openjdk.org Thu Jan 1 13:15:59 2026 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Jan 2026 13:15:59 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> Message-ID: On Wed, 31 Dec 2025 16:06:53 GMT, Evgeny Astigeevich wrote: > > Is there any reason not to do this by default on all AArch64? > > It will be turned on if AArch64 has `ctr_el0.IDC` and `ctr_el0.DIC` set. See https://github.com/openjdk/jdk/pull/28328/changes#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R663 Sure, I can see that, but is there any reason not to do this by default on all AArch64? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3703679652 From eastigeevich at openjdk.org Thu Jan 1 20:38:08 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 1 Jan 2026 20:38:08 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> Message-ID: On Thu, 1 Jan 2026 13:13:07 GMT, Andrew Haley wrote: > Sure, I can see that, but is there any reason not to do this by default on all AArch64? Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3704085326 From aph at openjdk.org Fri Jan 2 12:11:55 2026 From: aph at openjdk.org (Andrew Haley) Date: Fri, 2 Jan 2026 12:11:55 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> Message-ID: <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> On Thu, 1 Jan 2026 20:35:25 GMT, Evgeny Astigeevich wrote: > > Sure, I can see that, but is there any reason not to do this by default on all AArch64? > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705195536 From kbarrett at openjdk.org Fri Jan 2 13:54:02 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 2 Jan 2026 13:54:02 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: Message-ID: <1HoFAoxwMTDW1GJteYe1Bl3X9erLJtcdjdY7kEqOMgE=.f34e8855-352e-4654-9297-6af29b5f17de@github.com> On Mon, 29 Dec 2025 21:51:20 GMT, Evgeny Astigeevich wrote: >> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. >> >> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: >> - Disable coherent icache. >> - Trap IC IVAU instructions. >> - Execute: >> - `tlbi vae3is, xzr` >> - `dsb sy` >> >> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). 
It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. >> >> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: >> >> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." >> >> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. >> >> Changes include: >> >> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. >> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address. >> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. >> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk. 
>> >> Testing results: linux fastdebug build >> - Neoverse-N1 (Graviton 2) >> - [x] tier1: passed >> - [x] tier2: passed >> - [x] tier3: passed >> - [x] tier4: 3 failu... > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Fix linux-cross-compile riscv64 build src/hotspot/share/runtime/icache.hpp line 139: > 137: class DefaultICacheInvalidationContext : StackObj { > 138: private: > 139: NONCOPYABLE(DefaultICacheInvalidationContext); Not a review, just a drive-by comment. @xmas92 suggested moving the `NONCOPYABLE` to the private part of the class, as a style issue. It used to be that `NONCOPYABLE` was best used in the private part of a class, because of how it was implemented. But with the change to using deleted definitions, it's actually better to have it in the public part. That way you get an "attempt to use a deleted function" error rather than possibly getting an "attempt to use an inaccessible function" error. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2657751265 From eastigeevich at openjdk.org Fri Jan 2 15:43:15 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 2 Jan 2026 15:43:15 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> Message-ID: On Fri, 2 Jan 2026 12:07:57 GMT, Andrew Haley wrote: > > > Sure, I can see that, but is there any reason not to do this by default on all AArch64? > > > > > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? > > In a perfect world we'd do this for all AArch64. But Linux-only would be good too. 
But is there any reason not to do this on all Linux systems? IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions. IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare. Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code: - [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache) - [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads) In this PR we optimize two parts invalidating caches: 1. GCs patching code. This is invalidation of modified instructions. 2. Generation and installation of code. This is invalidation of the whole code. The second case can be optimized for all AArch64. Is there anything I am missing? And we can do optimized cache invalidation on all AArch64? 
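The case split described above (both `ctr_el0.IDC` and `ctr_el0.DIC` set, versus one or both clear) can be sketched as a small decision function. This is only an illustration under stated assumptions, not HotSpot code: the enum `ICacheSync` and the function `icache_sync_for` are hypothetical names, and the `CTR_EL0` value is passed in as a plain integer instead of being read with `mrs`. The bit positions (IDC is bit 28, DIC is bit 29) are taken from the Arm architecture definition of `CTR_EL0`.

```cpp
#include <cstdint>

// Hypothetical names for illustration; this is not HotSpot's API.
enum class ICacheSync {
  BarriersOnly,       // IDC=1, DIC=1: no DC/IC by address required
  InvalidateICache,   // IDC=1, DIC=0: IC IVAU by address still required
  CleanDCache,        // IDC=0, DIC=1: DC CVAU by address still required
  CleanAndInvalidate  // IDC=0, DIC=0: both DC CVAU and IC IVAU required
};

// CTR_EL0 field positions as defined by the Arm architecture.
constexpr uint64_t CTR_EL0_IDC = uint64_t{1} << 28;
constexpr uint64_t CTR_EL0_DIC = uint64_t{1} << 29;

// Maps a raw CTR_EL0 value to the maintenance needed after code is modified.
constexpr ICacheSync icache_sync_for(uint64_t ctr_el0) {
  const bool idc = (ctr_el0 & CTR_EL0_IDC) != 0;
  const bool dic = (ctr_el0 & CTR_EL0_DIC) != 0;
  if (idc && dic) return ICacheSync::BarriersOnly;
  if (idc)        return ICacheSync::InvalidateICache;
  if (dic)        return ICacheSync::CleanDCache;
  return ICacheSync::CleanAndInvalidate;
}
```

Only the `BarriersOnly` case avoids needing real addresses, which is why the deferred, address-independent invalidation discussed here is tied to both bits being set.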
------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705607781 From aph at openjdk.org Fri Jan 2 18:06:58 2026 From: aph at openjdk.org (Andrew Haley) Date: Fri, 2 Jan 2026 18:06:58 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> Message-ID: On Fri, 2 Jan 2026 15:39:50 GMT, Evgeny Astigeevich wrote: > > > > Sure, I can see that, but is there any reason not to do this by default on all AArch64? > > > > > > > > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? > > > > > > In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems? > > IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions. Ah, I see. So it looks like we'll have to maintain two entirely different bodies of code to do the cache management. That will be a recurring pain, and is disappointing. > > IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare. 
> > Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code: > > * [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache) That's useful. It's worth taking advantage of cache-coherent implementations (when they're not broken!) by not emitting unnecessary instructions. > * [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads) Thanks. That's a good read, but no surprises. I'm fairly sure we've been doing most of that for as long as the port has existed. > In this PR we optimize two parts invalidating caches: > > 1. GCs patching code. This is invalidation of modified instructions. > > 2. Generation and installation of code. This is invalidation of the whole code. > > > The second case can be optimized for all AArch64. > > Is there anything I am missing? And we can do optimized cache invalidation on all AArch64? Probably not, but I've been working on a patch to minimize the invalidation we do today. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705936075 From eastigeevich at openjdk.org Fri Jan 2 22:07:00 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 2 Jan 2026 22:07:00 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> Message-ID: On Fri, 2 Jan 2026 18:02:49 GMT, Andrew Haley wrote: >>> > > Sure, I can see that, but is there any reason not to do this by default on all AArch64? 
>>> > >>> > >>> > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? >>> >>> In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems? >> >> IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions. >> >> IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare. >> >> Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code: >> - [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache) >> - [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads) >> >> In this PR we optimize two parts invalidating caches: >> 1. GCs patching code. This is invalidation of modified instructions. >> 2. Generation and installation of code. This is invalidation of the whole code. >> >> The second case can be optimized for all AArch64. >> >> Is there anything I am missing? And we can do optimized cache invalidation on all AArch64? > >> > > > Sure, I can see that, but is there any reason not to do this by default on all AArch64? >> > > >> > > >> > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64? 
>> > >> > >> > In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems? >> >> IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions. > > Ah, I see. So it looks like we'll have to maintain two entirely different bodies of code to do the cache management. That will be a recurring pain, and is disappointing. > >> >> IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare. >> >> Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code: >> >> * [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache) > > That's useful. It's worth taking advantage of cache-coherent implementations (when they're not broken!) by not emitting unnecessary instructions. > >> * [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads) > > Thanks. That's a good read, but no surprises. I'm fairly sure we've been doing most of that for as long as the port has existed. > >> In this PR we optimize two parts invalidating caches: >> >> 1. GCs patching code. This is invalidation of modified instructions. >> >> 2. Generation and installation of code. 
This is invalidation of the whole code. >> >> >> The second case can be optimized for all AArch64. >> >> Is there anything I am missing? And we can do optimized cache invalidation on all AArch64? > > Probably not, but I've been working on a patch to minimize the invalidation we do today. @theRealAph > ... > Probably not, but I've been working on a patch to minimize the invalidation we do today. Does this mean we don't need this PR or need to rework it? Could you please provide more details? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3706299227 From aph at openjdk.org Sat Jan 3 09:59:05 2026 From: aph at openjdk.org (Andrew Haley) Date: Sat, 3 Jan 2026 09:59:05 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21] In-Reply-To: References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com> <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com> Message-ID: <8hQOV9QhPL_j5g7WpcwzJc_8QPeAzLSIFk3ASRlCXa8=.3e1ac93f-2f6a-4dce-bead-31241240cb6e@github.com> On Fri, 2 Jan 2026 22:04:00 GMT, Evgeny Astigeevich wrote: > > Probably not, but I've been working on a patch to minimize the invalidation we do today. > > Does this mean we don't need this PR or need to rework it? Could you please provide more details? It makes no difference to this patch. I'm still experimenting. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3706937169 From roland at openjdk.org Mon Jan 5 09:34:24 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 09:34:24 GMT Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: > Hi all, > > This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter. > > Thanks! Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26 - Backport 00068a80304a809297d0df8698850861e9a1c5e9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28892/files - new: https://git.openjdk.org/jdk/pull/28892/files/ceb2ac15..4121d277 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28892&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28892&range=00-01 Stats: 1087 lines in 32 files changed: 746 ins; 238 del; 103 mod Patch: https://git.openjdk.org/jdk/pull/28892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28892/head:pull/28892 PR: https://git.openjdk.org/jdk/pull/28892 From chagedorn at openjdk.org Mon Jan 5 14:50:30 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jan 2026 14:50:30 GMT Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026
09:34:24 GMT, Roland Westrelin wrote: >> Hi all, >> >> This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter. >> >> Thanks! > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26 > - Backport 00068a80304a809297d0df8698850861e9a1c5e9 Looks good! I submitted some testing which passed. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28892#pullrequestreview-3627129917 From roland at openjdk.org Mon Jan 5 14:50:31 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:50:31 GMT Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 14:42:05 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26 >> - Backport 00068a80304a809297d0df8698850861e9a1c5e9 > > Looks good! I submitted some testing which passed.
@chhagedorn thanks for review and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/28892#issuecomment-3710731279 From roland at openjdk.org Mon Jan 5 14:50:32 2026 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 5 Jan 2026 14:50:32 GMT Subject: [jdk26] Integrated: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 08:30:52 GMT, Roland Westrelin wrote: > Hi all, > > This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter. > > Thanks! This pull request has now been integrated. Changeset: d8a1c1d0 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/d8a1c1d04cab940b4a6cbe82fa2e445102aa9896 Stats: 367 lines in 13 files changed: 266 ins; 27 del; 74 mod 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs Reviewed-by: chagedorn Backport-of: 00068a80304a809297d0df8698850861e9a1c5e9 ------------- PR: https://git.openjdk.org/jdk/pull/28892 From wkemper at openjdk.org Mon Jan 5 17:03:08 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:03:08 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v6] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery.
> > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. 
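The healing step described above can be illustrated with a toy model. This is a sketch under assumptions, not Shenandoah's implementation: `Ref`, `resolve`, and `heal_discovered_list` are hypothetical stand-ins (the PR's real method is `heal_discovered_lists`), forwarding is modelled as a plain pointer field, and the list is assumed to be null-terminated.

```cpp
#include <cstddef>

// Toy stand-in for a reference object; not Shenandoah's real types.
struct Ref {
  Ref* forwardee = nullptr;   // non-null once the object has been evacuated
  Ref* discovered = nullptr;  // next entry in the discovered list
};

// Returns the evacuated copy if there is one, otherwise the object itself.
inline Ref* resolve(Ref* r) {
  return (r != nullptr && r->forwardee != nullptr) ? r->forwardee : r;
}

// Walks the list and replaces every forwarded entry (the head and each
// 'discovered' link) with its forwardee. Returns the number of healed slots.
inline std::size_t heal_discovered_list(Ref*& head) {
  std::size_t healed = 0;
  Ref** slot = &head;
  while (*slot != nullptr) {
    Ref* current = resolve(*slot);
    if (current != *slot) {
      *slot = current;  // the list no longer points into the collected region
      ++healed;
    }
    slot = &current->discovered;
  }
  return healed;
}
```

After healing, a later pass over the list only ever touches to-space copies, which is the property the old reference processor needs in order not to crash on entries whose region was collected.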
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Heal discovered lists for any young collection coincides with old marking - Configure thread local mark closure on delegated old reference processor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28810/files - new: https://git.openjdk.org/jdk/pull/28810/files/f621b70c..d5b17d79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=04-05 Stats: 8 lines in 2 files changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From wkemper at openjdk.org Mon Jan 5 17:03:12 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:03:12 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v5] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 19:02:13 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. 
If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson > - Sort includes > - Heal old discovered lists in parallel > - Fix comment > - Factor duplicate code into shared method > - Heal discovered oops in common place for degen and concurrent update refs > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Clear bootstrap mode for full GC that might have bypassed degenerated cycle > - Do not bypass card barrier when healing discovered list > - ... and 9 more: https://git.openjdk.org/jdk/compare/400d8cfb...f621b70c This change has now passed internal testing pipelines several times. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3711294483 From wkemper at openjdk.org Mon Jan 5 17:11:15 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:11:15 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v7] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. 
This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Heal discovered lists for any young collection coincides with old marking - Configure thread local mark closure on delegated old reference processor - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson - Sort includes - Heal old discovered lists in parallel - Fix comment - Factor duplicate code into shared method - Heal discovered oops in common place for degen and concurrent update refs - ... 
and 12 more: https://git.openjdk.org/jdk/compare/4458cab4...ed0d0272 ------------- Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=06 Stats: 669 lines in 20 files changed: 537 ins; 84 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From wkemper at openjdk.org Mon Jan 5 17:13:08 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:13:08 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v3] In-Reply-To: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: > This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Fix typo in assertion message - Take regulator thread out of STS before requesting GC The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. 
- Add comments - Revert back to what should be on this branch - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Don't know how this file got deleted - Carry over gc cancellation to gc request - Do not let allocation failure requests be overwritten by other requests - Fix degen point handling - ... and 3 more: https://git.openjdk.org/jdk/compare/4458cab4...8f4f55db ------------- Changes: https://git.openjdk.org/jdk/pull/28932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=02 Stats: 95 lines in 4 files changed: 45 ins; 17 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28932/head:pull/28932 PR: https://git.openjdk.org/jdk/pull/28932 From kdnilsen at openjdk.org Mon Jan 5 19:31:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:31:14 GMT Subject: RFR: 8312116: JDK GenShen: make instantaneous allocation rate triggers more timely Message-ID: After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. 2. Sample allocation rates more frequently than once every 100 ms. 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. 4. When we detect acceleration of workload, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
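Item 4 above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in the PR: the names `RateSample`, `estimate_acceleration`, `predict_consumption` and `should_trigger` are hypothetical, the trend is fitted with an ordinary least-squares slope, and negative acceleration is clamped to zero as a conservative choice of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample type: an allocation rate observed at a point in time.
struct RateSample {
  double time_s;       // timestamp of the sample, in seconds
  double bytes_per_s;  // allocation rate measured at that time
};

// Least-squares slope of rate versus time, i.e. an estimate of the
// allocation "acceleration" in bytes/s^2. Returns 0 if the timestamps
// do not vary (no trend can be fitted).
double estimate_acceleration(const std::vector<RateSample>& samples) {
  assert(samples.size() >= 2 && "need at least two samples to fit a trend");
  const double n = static_cast<double>(samples.size());
  double sum_t = 0.0, sum_r = 0.0, sum_tt = 0.0, sum_tr = 0.0;
  for (const RateSample& s : samples) {
    sum_t  += s.time_s;
    sum_r  += s.bytes_per_s;
    sum_tt += s.time_s * s.time_s;
    sum_tr += s.time_s * s.bytes_per_s;
  }
  const double denom = n * sum_tt - sum_t * sum_t;
  if (denom == 0.0) {
    return 0.0;
  }
  return (n * sum_tr - sum_t * sum_r) / denom;
}

// Bytes expected to be allocated over the next horizon_s seconds when the
// rate grows linearly: integral of (rate + accel * t) dt from 0 to horizon_s.
// Assumption of this sketch: deceleration is ignored (clamped to zero).
double predict_consumption(double current_rate, double accel, double horizon_s) {
  if (accel < 0.0) {
    accel = 0.0;
  }
  return current_rate * horizon_s + 0.5 * accel * horizon_s * horizon_s;
}

// Trigger when the memory expected to be consumed during the predicted
// GC time would reach or exceed what is currently available.
bool should_trigger(double available_bytes, double current_rate,
                    double accel, double predicted_gc_time_s) {
  return predict_consumption(current_rate, accel, predicted_gc_time_s) >= available_bytes;
}
```

With samples of 100, 200 and 300 bytes/s taken one second apart, the fitted acceleration is 100 bytes/s^2. Over a 2 s predicted GC time, the accelerated estimate is 800 bytes while the constant-rate estimate is only 600 bytes, so a heap with 700 bytes available triggers in the first case and not in the second.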
------------- Commit messages: - Change type of command-line args - fix white space - Add override to virtual methods - Fix race between allocation reporting and querying - add debug instrumentation - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - add instrumentation and fix bugs - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - some debug instrumentation - Merge remote-tracking branch 'origin/accelerated-triggers' into accelerated-triggers-gh - ... and 49 more: https://git.openjdk.org/jdk/compare/400d8cfb...c7046b5c Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312116 Stats: 1529 lines in 26 files changed: 1423 ins; 34 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Mon Jan 5 19:31:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:31:14 GMT Subject: RFR: 8312116: JDK GenShen: make instantaneous allocation rate triggers more timely In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 15:10:52 GMT, Kelvin Nilsen wrote: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. 
When we detect acceleration of workload, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate.

This PR shows very slight improvements on specjbb tests:

~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/jdk/bin/java \
  -XX:+UnlockExperimentalVMOptions \
  -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms10g -Xmx10g -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
  -javaagent:/home/kdnilsen/lib/jHiccup-2.0.10/jHiccup.jar=-l,results/specjbb2015-master-jhiccup.log,-i,1000,-a \
  -Xlog:async \
  -Xlog:gc*=info \
  -Xlog:safepoint*=info \
  -Xlog:handshake*=info \
  -jar /home/kdnilsen/lib/specjbb2015/specjbb2015.jar \
  -m composite -ikv \
  -p /home/kdnilsen/lib/specjbb2015/config/specjbb2015.props \
  -raw /home/kdnilsen/lib/specjbb2015/config/template-C.raw >$t.accelerated-trigger.specjbb2015.out 2>$t.accelerated-triggers.specjbb2015.err

[image]

We have tested this PR with several different heap sizes on a particular Extremem workload and provide the results here.

With 16GB heap size, both master and accelerated-triggers perform poorly. We consider the JVM to be underprovisioned for this workload, and the behavior of accelerated-triggers is considered acceptable compared to master in this configuration. Accelerated-triggers has 0.24% to 30.5% worse latency across reported response-time percentiles. On average, it performs 57% more GC cycles, resulting in 50% fewer degenerated cycles (due to earlier triggers). CPU utilization is 0.60% higher.

[image]

With 20GB heap size, the benefits of accelerated-triggers are demonstrated in improved p50, p95, and p99 latencies. Note that accelerated-triggers is able to complete an average of 120% more old GCs than master. In this configuration, master is more vulnerable to starvation of old generation processing. Accelerated-triggers performed 30% fewer degenerated cycles and 30% fewer full GC cycles than master.
[image]

With 24GB heap size, both master and accelerated-triggers experienced degraded performance on one of five trials. This appears to have resulted from starvation of old-gen processing in both cases. Even so, the accelerated-triggers run was able to complete 5 old collections vs. only 4 completed old collections with master. For this configuration, we report both average results and trimmed average results. Average results favor accelerated-triggers at most percentiles. Trimmed average results favor master at most percentiles.

[image]

At 28GB heap size, accelerated-triggers shows significant strength compared to master. Three of five trials with master experienced degenerated cycles, and two of five trials with master experienced full GC. None of the five trials with accelerated-triggers experienced degenerated or full GC cycles. This manifests in generally better latency across all percentiles.

[image]

With the 31GB heap size, latencies are very similar between master and accelerated-triggers. Accelerated-triggers consumes 15% more CPU as it is performing 103% more GCs. Note that accelerated-triggers completes one more old GC than master, demonstrating that it is less vulnerable than master to starvation of old-gen processing.

[image]

Note that typical service deployments tend to be provisioned with excess resources. This allows the services to operate more reliably under transient spikes in client workload, and avoids "rare" triggering missteps that cause unwanted degenerated and full GC cycles. This particular workload would most typically be deployed today with a 31G heap if it were a production service. A goal of the GenShen engineering team is to enable more frugal use of CPU and memory resources. In the longer term, we would hope to enable reliable production deployment of this workload in 28GB or 24GB of memory.
We have observed for some workloads that accelerated-triggers increases contention between young-generation and old-generation GC activities, because it often forces more frequent young-generation activities. In practice, this is often balanced by more timely collection of young, which reduces "urgent" young collection efforts that occur when the JVM is under duress. Other development efforts are under way to allow more graceful cooperation between young-generation and old-generation concurrent activities when both feel the need to contend for CPU time.

The workload used in the above tests is represented by this script:

~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/images/jdk/bin/java \
  -XX:ActiveProcessorCount=16 \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$m -Xmx$m \
  -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
  -XX:ShenandoahFullGCThreshold=1024 \
  -XX:ShenandoahGuaranteedOldGCInterval=0 \
  -XX:ShenandoahGuaranteedYoungGCInterval=0 \
  -Xlog:"gc*=info,ergo" \
  -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
  -XX:+UnlockDiagnosticVMOptions \
  -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
  -dDictionarySize=3000000 \
  -dNumCustomers=9000000 \
  -dNumProducts=240000 \
  -dCustomerThreads=800 \
  -dAllowAnyMatch=false \
  -dCustomerPeriod=2s \
  -dCustomerThinkTime=300ms \
  -dKeywordSearchCount=4 \
  -dSelectionCriteriaCount=2 \
  -dProductReviewLength=12 \
  -dServerThreads=5 \
  -dServerPeriod=10s \
  -dProductNameLength=10 \
  -dBrowsingHistoryQueueCount=5 \
  -dSalesTransactionQueueCount=5 \
  -dProductDescriptionLength=320 \
  -dProductReplacementPeriod=60s \
  -dProductReplacementCount=25 \
  -dCustomerReplacementPeriod=60s \
  -dCustomerReplacementCount=1500 \
  -dBrowsingExpiration=1m \
  -dPhasedUpdates=true \
  -dPhasedUpdateInterval=60s \
  -dSimulationDuration=25m \
  -dResponseTimeMeasurements=100000 \
  >$t.$m.genshen.medium.accelerated.out \
  2>$t.$m.genshen.medium.accelerated.err &
job_pid=$!
sleep 1500
cpu_percent=$(ps -o cputime -o etime -p $job_pid)
rss_kb=$(ps -o rss= -p $job_pid)
rss_mb=$((rss_kb / 1024))
wait $job_pid
echo "RSS: $rss_mb MB" >>$t.$m.genshen.medium.accelerated.out
echo "$cpu_percent" >>$t.$m.genshen.medium.accelerated.out
gzip $t.$m.genshen.medium.accelerated.out $t.$m.genshen.medium.accelerated.err

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3710878539
PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3711727740

From wkemper at openjdk.org Mon Jan 5 19:57:27 2026
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 5 Jan 2026 19:57:27 GMT
Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2]
In-Reply-To: References:
Message-ID: <2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com>

On Mon, 5 Jan 2026 19:54:04 GMT, Kelvin Nilsen wrote:

>> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements:
>>
>> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory.
>> 2. Sample allocation rates more frequently than once every 100 ms.
>> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload.
>> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> add another override

Can we remove the `KELVIN_*` macros?
Perhaps fine tune some of the logging to `log_trace(gc, ergo)` or `log_debug(gc, ergo)` where appropriate? ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3628222792 From kdnilsen at openjdk.org Mon Jan 5 19:57:25 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:57:25 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: add another override ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/c7046b5c..43664d66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Mon Jan 5 20:25:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 20:25:21 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2] In-Reply-To: <2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com> References: <2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com> Message-ID: On Mon, 5 Jan 2026 19:54:04 GMT, William Kemper wrote: > Can we remove the `KELVIN_*` macros? Perhaps fine tune some of the logging to `log_trace(gc, ergo)` or `log_debug(gc, ergo)` where appropriate? So sorry. Forgot I still had all of that in there. Coming out now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3711960984 From kdnilsen at openjdk.org Mon Jan 5 20:39:03 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 20:39:03 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. 
In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove develop/debug instrumentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/43664d66..959b274c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=01-02 Stats: 498 lines in 10 files changed: 0 ins; 497 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From xpeng at openjdk.org Mon Jan 5 21:04:45 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 5 Jan 2026 21:04:45 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: Message-ID: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, Shenandoah memory allocation gets a better object-oriented design (OOD) so that most of the code is reused:
>
> * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class.
> * ShenandoahMutatorAllocator: allocator for the mutator; inherits from ShenandoahAllocator and only overrides the methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator.
> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, it needs only a few lines of code to customize the allocator for the Collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition; it doesn't inherit the logic from ShenandoahAllocator for now. The `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`.
>
> I'm not expecting a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance:
>
> 1. Dacapo lusearch test on EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us.
> > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Fix build error after merging from tip - Merge branch 'master' into cas-alloc-1 - Merge branch 'master' into cas-alloc-1 - Some comments updates as suggested in PR review - Fix build failure after merge - Expend promoted from ShenandoahOldCollectorAllocator - Merge branch 'master' into cas-alloc-1 - Address PR comments - Merge branch 'openjdk:master' into cas-alloc-1 - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=19 Stats: 1644 lines in 25 files changed: 1296 ins; 235 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From kdnilsen at openjdk.org Mon Jan 5 21:36:11 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 21:36:11 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v5] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: touch file to force tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/87b41568..7b0efb3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From kdnilsen at openjdk.org Mon Jan 5 21:49:35 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 21:49:35 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v15] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region.
This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: touch file to force retest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/7b9c4d64..6480fef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From dholmes at openjdk.org Tue Jan 6 02:02:16 2026 From: dholmes at openjdk.org (David Holmes) Date: Tue, 6 Jan 2026 02:02:16 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4] In-Reply-To: References: Message-ID: On Sun, 28 Dec 2025 03:56:39 GMT, Sergey Bylokhov wrote: >> The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) 
>> >> The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: >> >> ~~`git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done `~~ >> >> `git diff origin/master --name-only | while read f; do git log origin/master --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` > > Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into copy_hotspot > - 8374316: Update copyright year to 2025 for hotspot in files where it was missed Just be aware that if a file was created as part of a refactoring and the code was taken as-is from an existing file, then the copyright year range should have remained the same as the original file. I don't know if any of the files you modified fall into that category but just wanted to point out that looking at the commit date is not always correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3712798915 From wkemper at openjdk.org Tue Jan 6 20:49:25 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 20:49:25 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> References: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> Message-ID: On Tue, 16 Dec 2025 23:30:58 GMT, Y. 
Srinivas Ramakrishna wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 321:
>>
>>> 319: op_degenerated_futile();
>>> 320: } else {
>>> 321: _generation->heuristics()->record_unsuccessful_degenerated();
>>
>> Suggestion:
>>
>> _generation->heuristics()->record_successful_degenerated();
>>
>> I think the confusion here is that we are conflating `progress` and `success`. The "progress" notion here is about triggering a full GC or giving up entirely. The degenerated cycle is "successful" because it did not run a full GC. Maybe we should rename `record_successful_degenerated` to `record_degenerated` (or, perhaps even `apply_degenerated_penalty`). I was about to suggest we pull `record_success_degenerated` out of the logic entirely, but that would mean upgraded degen cycles would be penalized again when the full GC completes.
>
> Maybe let the heuristics (or the policy) track progress as well, and inform the actuator (i.e. op degenerated) whether it should upgrade to a full gc. It almost feels like heuristics and policy and actuator are leaking abstractions. It feels like heuristics keep track of the model parameters and learn from sensors, and the policy consults a specific heuristic to inform actuator (i.e. actions).
>
> By that model, you'd have the actuator sending the sensor information to the heuristics and asking the policy (or the heuristics, if you conflate heuristics and policy) to decide which step to take next. It would seem that evaluation of the notion of progress then moves to the policy too.

@kdnilsen, what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2666242735 From wkemper at openjdk.org Tue Jan 6 22:31:38 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 22:31:38 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 20:39:03 GMT, Kelvin Nilsen wrote: >> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: >> >> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. >> 2. Sample allocation rates more frequently than once every 100 ms. >> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. >> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove develop/debug instrumentation Took another look over this. There is a lot to get through. I'll have more later. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 179: > 177: " after adjusting for spike_headroom: %zu%s" > 178: " and penalties: %zu%s", _is_generational? _space_info->name(): "Global", > 179: byte_size_in_proper_unit(mutator_available), proper_unit_for_byte_size(mutator_available), Can we use the `PROPERFMT/PROPERFMTARGS` macros for these? I find they really improve readability. 
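For readers unfamiliar with the macros mentioned in the comment above, here is a rough, self-contained sketch of the idea. The helper definitions below are simplified stand-ins written for this example (the unit thresholds in particular are an assumption and differ from HotSpot's own helpers in `utilities/globalDefinitions.hpp`), and `format_available` is a hypothetical function, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Simplified stand-ins for HotSpot's size helpers (assumed thresholds).
static const size_t K = 1024;
static const size_t M = K * K;
static const size_t G = M * K;

// Pick a unit suffix for a byte count.
inline const char* proper_unit_for_byte_size(size_t s) {
  if (s >= G) return "G";
  if (s >= M) return "M";
  if (s >= K) return "K";
  return "B";
}

// Scale a byte count down to the unit chosen above.
inline size_t byte_size_in_proper_unit(size_t s) {
  if (s >= G) return s / G;
  if (s >= M) return s / M;
  if (s >= K) return s / K;
  return s;
}

// One format fragment and one argument macro replace the repeated
// "%zu%s" plus the pair of helper calls at every call site.
#define PROPERFMT        "%zu%s"
#define PROPERFMTARGS(s) byte_size_in_proper_unit(s), proper_unit_for_byte_size(s)

// Hypothetical helper: formats two sizes the way a log line might.
int format_available(char* buf, size_t len, size_t mutator_available, size_t penalties) {
  return std::snprintf(buf, len,
                       "available: " PROPERFMT ", penalties: " PROPERFMT,
                       PROPERFMTARGS(mutator_available), PROPERFMTARGS(penalties));
}
```

The benefit is at the call site: each size is named once, so the format string and its arguments cannot drift out of step when a value is added or removed.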
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 199: > 197: > 198: // There is no headroom during evacuation and update refs. This information is not used to trigger the next GC. > 199: // Rather, it is made available to support throttling of allocations during GC. Is that true? or is allocation throttling part of another change? src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 275: > 273: } > 274: > 275: void ShenandoahAdaptiveHeuristics::add_gc_time(double timestamp, double gc_time) { Could we use `TruncatedSeq::predict_next` here? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1218: > 1216: } else { > 1217: heap->heuristics()->start_idle_span(); > 1218: } Suggestion: _generation->heuristics()->start_idle_span(); ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3632535527 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666483213 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666485239 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666489076 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666310682 From wkemper at openjdk.org Tue Jan 6 23:18:02 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 23:18:02 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v14] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. 
As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. This is generally the case across all benchmarks, but we do also see some counter-intuitive results.
>
> Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks:
>
>
> Concurrent Evacuation: 7 improvements, 3 regressions
> - Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%)
> - Worst regression: xalan (+53.7%)
>
> Concurrent Marking: 15 improvements, 1 regression
> - Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%)
> - Only regression: serial (+8.9%)
>
> Concurrent Scan Remembered Set: 7 improvements, 2 regressions
> - Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%)
> - Worst regression: extremem-phased (+52.4%)
>
> Concurrent Update Refs: 5 improvements, 4 regressions
> - Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%)
> - Worst regression: xalan (+89.4%)

William Kemper has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 81 commits: - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Fix comments, add back an assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Accommodate behavior of global heuristic - Restore missing update for inplace promotion padding - Remove reference to adaptive tuning flag - Remove commented out assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - ... and 71 more: https://git.openjdk.org/jdk/compare/7c979c14...f460f115 ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=13 Stats: 398 lines in 11 files changed: 158 ins; 173 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From kdnilsen at openjdk.org Wed Jan 7 00:36:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:14 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Mon, 5 Jan 2026 21:04:45 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, the Shenandoah memory allocation code is restructured with a better object-oriented design so that most of the code is reused:
>>
>> * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class.
>> * ShenandoahMutatorAllocator: allocator for mutators; inherits from ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutators.
>> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for the Collector.
>> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in the OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`.
>>
>> I'm not expecting a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance:
>>
>> 1. Dacapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us.
>> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== >> ===== DaCapo tail ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: > > - Merge branch 'openjdk:master' into cas-alloc-1 > - Fix build error after merging from tip > - Merge branch 'master' into cas-alloc-1 > - Merge branch 'master' into cas-alloc-1 > - Some comments updates as suggested in PR review > - Fix build failure after merge > - Expend promoted from ShenandoahOldCollectorAllocator > - Merge branch 'master' into cas-alloc-1 > - Address PR comments > - Merge branch 'openjdk:master' into cas-alloc-1 > - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 This is a huge PR. Thanks for working through all the details to get this working. I've identified several issues that I believe require some further attention. We can discuss in a meeting if that would be helpful. 
src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 80: > 78: for (size_t i = 0; i < num_regions; i++) { > 79: ShenandoahHeapRegion* region = heap->get_region(i); > 80: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); Might change comment to: "Should be no active alloc regions when choosing collection set" src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 102: > 100: for (size_t i = 0; i < num_regions; i++) { > 101: ShenandoahHeapRegion* region = heap->get_region(i); > 102: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); Same suggestion here as with shenandoahGenerationalHeuristics.cpp. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 110: > 108: } > 109: > 110: uint dummy = 0; Don't call this "dummy". Call it regions_ready_for_refresh. Remember the value and pass it in as a new argument to attempt_allocation_slow() so that we don't have to recompute it later. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 114: > 112: HeapWord* obj = attempt_allocation_in_alloc_regions(req, in_new_region, alloc_start_index(), dummy); > 113: if (obj != nullptr) { > 114: return obj; Even in the case that we successfully fill our allocation request, if regions_ready_for_refresh is greater than some percentage of _alloc_region_count (e.g. > _alloc_region_count / 4), then we should grab the heap lock and refresh_alloc_regions() here. Otherwise, we will gradually degrade the number of directly_allocatable_regions until we are down to one before we refresh any of them. 
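
The eager-refresh policy suggested in the comment on line 114 above can be sketched as a small predicate. This is only an illustration under stated assumptions, not code from the PR: `should_refresh_eagerly` is a hypothetical helper name, and the one-quarter threshold is simply the example figure given in the comment.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the PR): decide whether a thread whose
// fast-path allocation succeeded should still take the heap lock and call
// refresh_alloc_regions().
//   regions_ready_for_refresh: lower bound on the shared alloc regions seen
//                              as empty or retired during this traversal.
//   alloc_region_count:        total number of shared alloc regions.
inline bool should_refresh_eagerly(uint32_t regions_ready_for_refresh,
                                   uint32_t alloc_region_count) {
  // Example threshold from the review comment: refresh once more than a
  // quarter of the shared alloc regions are no longer usable.
  return regions_ready_for_refresh > alloc_region_count / 4;
}
```

With 8 shared regions, this predicate stays false for 0, 1 or 2 unusable regions and becomes true at 3, so the pool is topped up well before it degrades to a single region.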
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 133: > 131: ShenandoahHeapAccountingUpdater accounting_updater(_free_set, ALLOC_PARTITION); > 132: > 133: if (regions_ready_for_refresh > 0u) { Since we've already taken the heap lock because we failed to allocate "fast", I'm ok to go ahead and refresh any regions that are ready right now, even if it's only 1 region. I'm wondering if we can avoid thrashing in the case that there are no more regions available. We might want to keep a state variable that represents whether there exist free-set regions with which to refresh our cache. This could be updated whenever we "add to" or "rebuild" the free set, and whenever refresh_alloc_regions() find there is insufficient supply to demand. We would want to avoid repeated calls to refresh_alloc_regions() if there are no "refresh_regions_available". src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 192: > 190: uint i = alloc_start_index; > 191: do { > 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) { Note that there is a race (and performance overhead) with checking r->is_active_alloc_region(). Though a region might be active when we check it here, it may be inactive by the time we attempt to atomic_allocate_in(). This is one reason I prefer to use "volatile_top == end" to denote !is_active_alloc_region. This way, you only have to check once (rather than checking is_active() and then checking has_available()). And there is no race between when you check and when you attempt to allocate. 
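
The "top == end means inactive" idea from the comment on line 192 can be sketched as below. This is a simplified model, not HotSpot code: `SketchRegion`, `try_allocate`, `retire` and `kAllocFailed` are hypothetical names, offsets are plain byte counts, and `std::atomic` stands in for HotSpot's own atomic primitives. The point it illustrates is that the capacity check inside the CAS loop is the only check needed, because a retired region simply has no capacity left.

```cpp
#include <atomic>
#include <cstddef>

// Sentinel returned when the region cannot satisfy the request (assumption).
constexpr size_t kAllocFailed = static_cast<size_t>(-1);

// Hypothetical stand-in for a heap region; requires start_top <= region_end.
struct SketchRegion {
  std::atomic<size_t> top;  // next free offset, shared by all allocating threads
  const size_t end;         // one past the last usable offset

  SketchRegion(size_t start_top, size_t region_end)
      : top(start_top), end(region_end) {}

  // Lock-free bump allocation. A retired region has top == end, so the
  // capacity test rejects it without a separate "is active" flag.
  size_t try_allocate(size_t size) {
    size_t old_top = top.load();
    while (size <= end - old_top) {
      if (top.compare_exchange_weak(old_top, old_top + size)) {
        return old_top;  // this thread owns [old_top, old_top + size)
      }
      // A failed CAS reloads old_top; the loop re-checks capacity with it.
    }
    return kAllocFailed;
  }

  // Retire the region by atomically claiming whatever is left.
  // Returns the number of bytes that were still unused.
  size_t retire() {
    size_t old_top = top.exchange(end);
    return end - old_top;
  }
};
```

Because retirement is itself an atomic update of `top`, an allocator that raced with `retire()` either completed its CAS first (and owns its memory) or sees `top == end` and fails cleanly.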
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 194:

> 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) {
> 193: bool ready_for_retire = false;
> 194: HeapWord* obj = atomic_allocate_in(r, true, req, in_new_region, ready_for_retire);

Insert before atomic_allocate_in:
    int contended
Pass this as 6th arg to atomic_allocate_in()
Add this code after atomic_allocate_in():
    if ((i == alloc_start_index) && (contended > 1)) {
      randomize_start_index(); // I think this is realized by setting _alloc_start_index to UINT_MAX
    }

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 203:

> 201: }
> 202: } else if (r == nullptr || !r->is_active_alloc_region()) {
> 203: regions_ready_for_refresh++;

Add this code:
    if (i == alloc_start_index) {
      randomize_start_index();
    }

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 214:

> 212:
> 213: template >
> 214: HeapWord* ShenandoahAllocator::atomic_allocate_in(ShenandoahHeapRegion* region, bool const is_alloc_region, ShenandoahAllocRequest &req, bool &in_new_region, bool &ready_for_retire) {

Add argument: int &contended

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 219:

> 217: size_t actual_size = req.size();
> 218: if (req.is_lab_alloc()) {
> 219: obj = region->allocate_lab_atomic(req, actual_size, ready_for_retire);

Pass contended arg to allocate_lab_atomic()

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 221:

> 219: obj = region->allocate_lab_atomic(req, actual_size, ready_for_retire);
> 220: } else {
> 221: obj = region->allocate_atomic(actual_size, req, ready_for_retire);

Pass contended arg to allocate_atomic()

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 233:

> 231: // evacuation are not updated during evacuation. For both young and old regions r, it is essential that all
> 232: // PLABs be made parsable at the end of evacuation. This is enabled by retiring all plabs at end of evacuation.
> 233: region->concurrent_set_update_watermark(region->top()); There's a race here. Multiple mutators may be updating watermark in parallel. It may be that the mutator who most recently allocated is not the mutator who makes the "most recent" overwrite of set_update_watermark(). I think the better fix is to remove this code. Update refs should just assume that update watermark equals top for any region in the Old gen, and for any region that was in the Collector partition. It may not be easy to know which regions were "in the Collector partition". Maybe we use a Sentinel value for update_watermark on all such regions. Just overwrite update_watermark(nullptr)? And check for this in update-refs? Needs a solution, and solution needs to be documented in code comments. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 254: > 252: // Step 1: find out the alloc regions which are ready to refresh. > 253: for (uint i = 0; i < _alloc_region_count; i++) { > 254: ShenandoahAllocRegion* alloc_region = &_alloc_regions[i]; We've got the heap lock here. why does this need to be atomic? Comments in the code should make this clear. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 263: > 261: } > 262: if (ALLOC_PARTITION == ShenandoahFreeSetPartitionId::Mutator) { > 263: if (free_bytes > 0) { We should have counted the entire region's available bytes as allocated when we made this a directly allocatable region. We should not need to further increase bytes allocated here. I would like to see an assert(free_bytes < PLAB::min_size() * HeapWordSize) here. Eventually, I'd want to generalize this code so that we could refresh regions that are not yet ready to be retired. In this case, we would want to unretire the region here. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 280: > 278: // Step 2: allocate region from FreeSets to fill the alloc regions or satisfy the alloc request. 
> 279: ShenandoahHeapRegion* reserved[MAX_ALLOC_REGION_COUNT];
> 280: int reserved_regions = _free_set->reserve_alloc_regions(ALLOC_PARTITION, refreshable_alloc_regions,

I request we get rid of the min_free_words argument to free_set->reserve_alloc_regions(). See comments in the called function.

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 304:

> 302: log_debug(gc, alloc)("%sAllocator: Storing heap region %li to alloc region %i",
> 303: _alloc_partition_name, reserved[i]->index(), refreshable[i]->alloc_region_index);
> 304: AtomicAccess::store(&refreshable[i]->address, reserved[i]);

Should not need to perform AtomicAccess because we hold the heap lock here.

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 316:

> 314: HeapWord* ShenandoahAllocator::allocate(ShenandoahAllocRequest &req, bool &in_new_region) {
> 315: #ifdef ASSERT
> 316: verify(req);

Insert a comment above verify(): "Confirm that req corresponds to ALLOC_PARTITION"

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 338:

> 336: for (uint i = 0; i < _alloc_region_count; i++) {
> 337: ShenandoahAllocRegion& alloc_region = _alloc_regions[i];
> 338: ShenandoahHeapRegion* r = AtomicAccess::load(&alloc_region.address);

We hold the heap lock and are at a safepoint, so we do not need AtomicAccess here. It is more costly than necessary. I prefer to use a regular fetch. If you prefer to keep AtomicAccess, please provide a comment in the code explaining why and we will revisit.

src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 345:

> 343: r->unset_active_alloc_region();
> 344: }
> 345: AtomicAccess::store(&alloc_region.address, static_cast(nullptr));

Same here. We do not need AtomicAccess.
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 350: > 348: total_free_bytes += free_bytes; > 349: total_regions_to_unretire++; > 350: _free_set->partitions()->unretire_to_partition(r, ALLOC_PARTITION); When we reserved this directly allocatable region, we increased bytes allocated() if the ALLOC_PARTITION was mutator. Here, we need to undo that: if (ALLOC_PARTITION == ShenandoahFreeSetPartitionId::Mutator) { decrease_bytes_allocated(free_bytes); } src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 353: > 351: if (!r->has_allocs()) { > 352: log_debug(gc, alloc)("%sAllocator: Reverting heap region %li to FREE due to no alloc in the region", > 353: _alloc_partition_name, r->index()); This code looks suspect to me. Maybe it works as is only because we are currently doing this only immediately before rebuilding free set. If that's the case, there should be some documentation and maybe even some asserts that confirm it is true. When we release_alloc_regions(), we should be adjusting the range for the associated partitions. The code that most closely resembles this functionality is in ShenandoahFreeSet::move_regions_from_collector_to_mutator(). This is the code that moves collector and old-collector partitions to the mutator partition after evacuation is done. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 360: > 358: } > 359: } > 360: assert(AtomicAccess::load(&alloc_region.address) == nullptr, "Alloc region is set to nullptr after release"); Do not need AtomicAccess here src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 364: > 362: _free_set->partitions()->decrease_used(ALLOC_PARTITION, total_free_bytes); > 363: _free_set->partitions()->increase_region_counts(ALLOC_PARTITION, total_regions_to_unretire); > 364: accounting_updater._need_update = true; Here is where you know which tallies have been affected by this operation. 
This is where you should specialize the calls to freeset recompute_total_used() and recompute_total_affiliated(). Either call those from here, or add parameters to your accounting_updater object so that you do not have to overcompute each operation. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 376: > 374: } > 375: > 376: THREAD_LOCAL uint ShenandoahMutatorAllocator::_alloc_start_index = UINT_MAX; I raised questions about this in a previous review. Have I overlooked your response? What is the tradeoff between declaring this THREAD_LOCAL vs. creating a new field in ShenandoahThreadLocal? I believe we need to use fields of ShenandoahThreadLocal so that we do not incur an overhead on all threads when JVM is not configured for Shenandoah GC. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 423: > 421: _yield_to_safepoint = false; > 422: } > 423: I suppose ShenandoahCollectorAllocator::randomize_start_index() might be a no-op. On the other hand, it would probably be better to use a random index for ShenandoahCollectorAllocator as well. We don't want to hobble one GC worker more than the others just because its preferred start index happens to hold a retire-ready region. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 428: > 426: } > 427: > 428: HeapWord* ShenandoahOldCollectorAllocator::allocate(ShenandoahAllocRequest& req, bool& in_new_region) { Confer with William Kemper about this. He is working on a change that may simplify the handling of PLABs, in which case ShenandoahOldCollectorAllocator can behave the same as ShenandoahCollector. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 436: > 434: // Make sure the old generation has room for either evacuations or promotions before trying to allocate. > 435: auto old_gen = ShenandoahHeap::heap()->old_generation(); > 436: if (req.is_old() && !old_gen->can_allocate(req)) { This test for req.is_old() appears to be unnecessary. 
The verify(req) assert above requires that req.is_old(). Perhaps the verify() method is too abstract. Add a comment there that says: "Confirm that req.is_old()" src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 56: > 54: virtual uint alloc_start_index() { return 0u; } > 55: > 56: // Attempt to allocate Comment needs to make clear that this is the main entry point for fast-path allocation from a directly allocatable region. This function delegates to slow-path allocation if it is unable to allocate from the directly allocatable regions. Not sure I like the name "attempt_allocation()". All of our allocation routines attempt to allocate and return a sentinel value (nullptr) if the allocation fails. This is no different. Just call it allocate_work(), and clarify that this is the helper routine of allocate() which does the work of allocating from a directly allocatable region without acquiring the heap lock if that is possible, and otherwise does a slow-path allocation which requires acquisition of the heap lock. I see that your comments are trying to say this. But the comments as written are not easy to understand. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 69: > 67: > 68: // Attempt to allocate in a shared alloc region using atomic operation without holding the heap lock. > 69: // Returns nullptr and overwrites regions_ready_for_refresh with the number of shared alloc regions that are ready Suggest this edit: // Overwrites regions_ready_for_refresh with a lower bound on the number of shared alloc regions that are ready // to be retired during execution of this "do_fast_allocation" function. Returns nullptr if the allocation request could // not be fulfilled after a single traversal of directly allocatable regions. 
src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 79: > 77: int refresh_alloc_regions(ShenandoahAllocRequest* req = nullptr, bool* in_new_region = nullptr, HeapWord** obj = nullptr); > 78: #ifdef ASSERT > 79: virtual void verify(ShenandoahAllocRequest& req) { } Need a comment to explain what verify does. Is this simply checking to make sure the req is "properly formatted"? I think the intention is to enforce that req affiliation corresponds to ALLOC_PARTITION. Would be good to clarify this in the comment. Do we need this to be virtual? It seems like a single templated implementation would suffice. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 91: > 89: virtual HeapWord* allocate(ShenandoahAllocRequest& req, bool& in_new_region); > 90: virtual void release_alloc_regions(); > 91: virtual void reserve_alloc_regions(); Need comments on these functions. Clarify pre-conditions and post-conditions. I think the intention is: 1. allocate(): Caller does not hold the heap lock. All allocations by mutator or GC are fulfilled by this function. This function tries to perform a CAS allocation without obtaining the global heap lock. If that fails, it will obtain the global heap lock and do a free-set allocation. As a side effect of doing a free-set allocation, some number of directly allocatable regions may be retired and replaced with new directly allocatable regions. 2. release_alloc_regions(): Caller must hold the heap lock. This causes all directly allocatable regions to be placed into the appropriate ShenandoahFreeSet partition. We do this in preparation for choosing a collection set and/or rebuilding the freeset. 3. reserve_alloc_regions(): Caller must hold the heap lock. This causes us to set aside N regions as directly allocatable by removing these regions from the relevant ShenandoahFreeSet partitions. Explain what happens if there are not N regions available. 
Clarify: do these three functions represent the entirety of the "public mutation API" that is exercised by mutators and GC workers as they interact with the free set? (There is another set of functions that could be characterized as the read-only API for obtaining state information about the free set. This provides information such as available memory, allocated bytes since GC start, etc.)

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3045:

> 3043: }
> 3044:
> 3045: int ShenandoahFreeSet::reserve_alloc_regions(ShenandoahFreeSetPartitionId partition, int regions_to_reserve, size_t min_free_words, ShenandoahHeapRegion** reserved_regions) {

I request that we not enforce min_free_words when reserving allocation regions. This defeats the purpose of allocation bias. The objective is to consume fragmented memory early in the GC cycle (when we have more mitigation options if an allocation request ever fails). Note that every region that is in any partition has at least PLAB::min_size() available memory. By requiring that MUTATOR regions have PLAB::max_size() words, we are forcing ourselves to never consume the fragmented memory regions. (Towards the end of GC, when memory is in short supply, we will be unable to find directly allocatable MUTATOR regions. This will force us to obtain the heap lock for every allocation. And these allocations will be inefficient because the remaining memory is highly fragmented.)
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 167:

> 165: };
> 166:
> 167: HeapWord* ShenandoahHeapRegion::allocate_atomic(size_t size, const ShenandoahAllocRequest& req, bool &ready_for_retire) {

Suggest we add a fourth arg: int &contended
We initialize contended to zero

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 187:

> 185: return nullptr;
> 186: }
> 187: }

Before iterating, increment contended by 1

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 190:

> 188: }
> 189:
> 190: HeapWord* ShenandoahHeapRegion::allocate_lab_atomic(const ShenandoahAllocRequest& req, size_t &actual_size, bool &ready_for_retire) {

Suggest we add a fourth arg: int &contended
We initialize contended to zero

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 218:

> 216: return nullptr;
> 217: }
> 218: }

Before we iterate, we increment contended by 1

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 304:

> 302: }
> 303:
> 304: inline void ShenandoahHeapRegion::concurrent_set_update_watermark(HeapWord* w) {

See comment elsewhere in my feedback. I think we may want to use a special sentinel value to denote the watermark for Collector and OldCollector regions. For both of these, there is essentially no watermark value. If we try to set the value to top() from within a CAS-allocating mutator thread, we can end up setting watermark to the not-most-recent value of top(), which would result in misbehavior during update refs.

src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 567:

> 565: "0 will allow back to back young collections to run during old " \
> 566: "collections.") \
> 567: \

Once we resolve the various issues identified in feedback comments, I would be interested in results of experimenting with different values of these two parameters...

-------------

Changes requested by kdnilsen (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26171#pullrequestreview-3628853514 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663181721 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663183357 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665709301 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665818148 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665800328 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666506691 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666328083 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666332994 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666334248 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666334844 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666335529 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666360404 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666366038 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666526965 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666564758 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666566961 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666567671 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663324871 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663327493 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666583027 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666637281 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666628228 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663337002 PR Review Comment: 
https://git.openjdk.org/jdk/pull/26171#discussion_r2663279917 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666642051 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666643974 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665511567 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665632758 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666273440 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663265276 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663261232 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666553882 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666309782 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666309888 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666310835 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666311617 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666683738 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666691665 From kdnilsen at openjdk.org Wed Jan 7 00:36:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:17 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> References: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> Message-ID: <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> On Tue, 9 Dec 2025 21:03:21 GMT, Xiaolong Peng wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Some comments updates as suggested in PR review > > 
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 41: > >> 39: _alloc_region_count(alloc_region_count), _free_set(free_set), _alloc_partition_name(ShenandoahRegionPartitions::partition_name(ALLOC_PARTITION)) { >> 40: if (alloc_region_count > 0) { >> 41: _alloc_regions = PaddedArray::create_unfreeable(alloc_region_count); > > Rethinking about the the PaddedArray used here, we may not really need it. > Allocator has multiple shared alloc regions for CAS, and only refreshes them when all of them run out of usable memory, so _alloc_regions won't be frequently updated, the PaddedArray here should have a negative performance impact. Are you running any experiments (on different hardware configurations) to test your assumptions about this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663185400 From kdnilsen at openjdk.org Wed Jan 7 00:36:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:17 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> References: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> Message-ID: <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> On Tue, 6 Jan 2026 00:33:57 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 41: >> >>> 39: _alloc_region_count(alloc_region_count), _free_set(free_set), _alloc_partition_name(ShenandoahRegionPartitions::partition_name(ALLOC_PARTITION)) { >>> 40: if (alloc_region_count > 0) { >>> 41: _alloc_regions = PaddedArray::create_unfreeable(alloc_region_count); >> >> Rethinking about the the PaddedArray used here, we may not really need it. 
>> Allocator has multiple shared alloc regions for CAS, and only refreshes them when all of them run out of usable memory, so _alloc_regions won't be frequently updated, the PaddedArray here should have a negative performance impact. > > Are you running any experiments (on different hardware configurations) to test your assumptions about this? Please document the results of any experiments as rationale for the final design. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666275536 From kdnilsen at openjdk.org Wed Jan 7 00:36:20 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:20 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 3 Dec 2025 01:09:34 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 100: >> >>> 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { >>> 99: if (_alloc_region_count == 0u) { >>> 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); >> >> Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. (If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) 
> I'll add comments on this. _alloc_region_count == 0 means we don't want to use any shared alloc region; it will always allocate with the heap lock. Ideally the performance should be the same as before, so it always simply finds a region with enough space and allocates in that region.

Put the comments describing functions in the .hpp file, where they are currently. But we need to enhance those comments.

>> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158:
>> 
>>> 156: if (r != nullptr) {
>>> 157: bool ready_for_retire = false;
>>> 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire);
>> 
>> Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations.
>> We should clarify with comments.
> 
> It is not really necessary to use `atomic_allocate_in` here, but I wanted to reuse some of the code in atomic_allocate_in. We can discuss this later; I can change it back to the non-atomic version.

Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with a template argument.

I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess.

I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition.

Here's the scenario that I'm concerned about:

1. A mutator obtains pointer to directly allocatable region R
2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason)
3. Region R is now eligible to satisfy allocations from behind the global heap lock
4. Some third mutator thread acquires the heap lock and fetches top for region R
5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict
6. This third mutator allocates from region R at top, without using CAS. So both mutators think they own the same object

I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in region R. However, I anticipate the day in the not-too-distant future when we will want to refresh regions even when they are not ready to be retired. Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation.

So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way. (I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate an object that is 1/2 the heap region size, and none of the directly allocatable regions have that much available memory.)

My original proposal was to have a volatile_top which is used by CAS allocation and a nonvolatile_top that is used by heap-lock allocation. When we make a region directly allocatable, we copy its nonvolatile_top to the volatile_top.
When we take a directly allocatable region and move it into the heap-locked freeset, we use CAS to set its volatile_top to end before we place the region into the freeset partition, assigning to nonvolatile_top the value held in volatile_top before the CAS operation. Whatever solution is used for this needs to be documented in the code. Feel free to copy and paste from this github comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665714073 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666236427 From kdnilsen at openjdk.org Wed Jan 7 00:36:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> On Wed, 3 Dec 2025 01:15:03 GMT, Xiaolong Peng wrote: >> It is not an error, before calling into attempt_allocation_slow, it already called attempt_allocation_in_alloc_regions once and failed to allocate, slow path is always with heap lock. >> >> After taking the lock, we should try the attempt_allocation_in_alloc_regions right away, because other mutator thread may have refreshed the alloc regions while holding the lock. > > accounting_update is required for slow path, but you are right, it can be moved to somewhere later, e.g. line 128. My mistake on first read here. I see now that we only come into this function if fast-allocation failed. 
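For readers following the thread, the volatile_top / nonvolatile_top handoff proposed in the review comment above could be sketched roughly as below. This is only an illustration under stated assumptions: `RegionSketch`, `kAllocFailed`, and all method names are hypothetical and do not exist in HotSpot, offsets stand in for `HeapWord*`, and a single atomic exchange is used in place of the CAS mentioned in the proposal (the effect on `volatile_top` is the same).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical failure marker; not a HotSpot constant.
constexpr size_t kAllocFailed = static_cast<size_t>(-1);

// Hypothetical stand-in for a heap region; NOT the real ShenandoahHeapRegion.
struct RegionSketch {
  size_t end;                        // one past the last usable offset
  size_t nonvolatile_top;            // touched only while holding the heap lock
  std::atomic<size_t> volatile_top;  // touched only by lock-free (CAS) allocation

  // A region starts out owned by the free set: volatile_top == end means
  // "nothing left for CAS allocators".
  explicit RegionSketch(size_t capacity)
    : end(capacity), nonvolatile_top(0), volatile_top(capacity) {}

  // Heap lock held: publish the region for CAS allocation.
  void make_directly_allocatable() {
    volatile_top.store(nonvolatile_top, std::memory_order_release);
  }

  // Lock-free path: bump volatile_top with CAS.
  size_t cas_allocate(size_t size) {
    size_t old_top = volatile_top.load(std::memory_order_relaxed);
    while (size <= end - old_top) {
      if (volatile_top.compare_exchange_weak(old_top, old_top + size,
                                             std::memory_order_acq_rel,
                                             std::memory_order_relaxed)) {
        return old_top;
      }
      // CAS failed: old_top now holds the current value, so re-check the fit.
    }
    return kAllocFailed;
  }

  // Heap lock held: withdraw the region from CAS allocation. Setting
  // volatile_top to end makes every in-flight CAS attempt fail its fit check,
  // and the value seen before the swap becomes the lock-protected top.
  void return_to_free_set() {
    nonvolatile_top = volatile_top.exchange(end, std::memory_order_acq_rel);
  }

  // Heap lock held and region not directly allocatable: plain bump.
  size_t locked_allocate(size_t size) {
    if (size > end - nonvolatile_top) return kAllocFailed;
    size_t result = nonvolatile_top;
    nonvolatile_top += size;
    return result;
  }
};
```

With this split, the mutator from step 5 of the scenario would retry its CAS against a `volatile_top` that already equals `end`, so it cannot hand out memory that the lock-holding thread is also bumping through `nonvolatile_top`.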
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665689002 From kdnilsen at openjdk.org Wed Jan 7 00:36:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> Message-ID: On Tue, 6 Jan 2026 17:28:27 GMT, Kelvin Nilsen wrote: >> accounting_update is required for slow path, but you are right, it can be moved to somewhere later, e.g. line 128. > > My mistake on first read here. I see now that we only come into this function if fast-allocation failed. But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665698471 From kdnilsen at openjdk.org Wed Jan 7 00:36:22 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:22 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> Message-ID: <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> On Tue, 6 Jan 2026 17:31:57 GMT, Kelvin Nilsen wrote: >> My mistake on first read here. I see now that we only come into this function if fast-allocation failed. > > But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! > > The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). I'm not concerned that the count of regions_ready_for_refresh might be stale. If this count is getting incremented "during" our allocation, we will see this result soon enough. If multiple mutators fail fast-path allocation simultaneously, they will each acquire heap lock either way (existing implementation vs. new implementation that does not retry the allocation). Acquiring the heap lock is the "expensive" operation. If the first one refreshes allocation regions, then subsequent invocations will not find any regions to be refreshed. 
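The suggestion above (remember regions_ready_for_refresh from the failed fast path and hand it to the slow path, instead of re-running the fast path first under the heap lock) might be sketched as follows. Everything here is a simplified model and an assumption: `AllocatorSketch`, `try_shared_regions`, `kRegionSize`, and the "spare regions" counter are hypothetical, regions are modelled as byte counters, and a region counts as ready for refresh simply when it cannot serve the current request.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

constexpr size_t kRegionSize = 32;  // hypothetical region capacity
constexpr int kSharedRegions = 2;   // hypothetical _alloc_region_count

class AllocatorSketch {
public:
  explicit AllocatorSketch(int spare_regions) : _spare_regions(spare_regions) {
    for (auto& used : _used) used.store(0);
  }

  bool attempt_allocation(size_t size) {
    int ready_for_refresh = 0;
    if (try_shared_regions(size, ready_for_refresh)) return true;
    // Fast path failed: pass along what it already learned.
    return attempt_allocation_slow(size, ready_for_refresh);
  }

  int refreshed_regions() const { return _refreshed_regions; }

private:
  // Fast path: CAS-bump each shared region, counting regions that are too full.
  bool try_shared_regions(size_t size, int& ready_for_refresh) {
    for (auto& used : _used) {
      size_t old_used = used.load();
      while (size <= kRegionSize - old_used) {
        if (used.compare_exchange_weak(old_used, old_used + size)) return true;
      }
      ready_for_refresh++;
    }
    return false;
  }

  // Slow path: take the heap lock and refresh right away if the fast path saw
  // exhausted regions. A stale count is harmless because the refresh re-checks
  // every region under the lock, so a second thread finds nothing to do.
  bool attempt_allocation_slow(size_t size, int ready_for_refresh) {
    std::lock_guard<std::mutex> locker(_heap_lock);
    if (ready_for_refresh > 0) {
      for (auto& used : _used) {
        if (_spare_regions > 0 && size > kRegionSize - used.load()) {
          used.store(0);  // models swapping in a fresh region from the free set
          _spare_regions--;
          _refreshed_regions++;
        }
      }
    }
    int unused = 0;
    return try_shared_regions(size, unused);  // allocate after the refresh
  }

  std::array<std::atomic<size_t>, kSharedRegions> _used;
  std::mutex _heap_lock;
  int _spare_regions;
  int _refreshed_regions = 0;
};
```

The point of the sketch is only the control flow: the lock is taken once per failed fast path, and no second fast-path scan happens before the refresh decision.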
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665704698

From serb at openjdk.org Wed Jan 7 02:24:25 2026
From: serb at openjdk.org (Sergey Bylokhov)
Date: Wed, 7 Jan 2026 02:24:25 GMT
Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 6 Jan 2026 01:58:22 GMT, David Holmes wrote:

>Just be aware that if a file was created as part of a refactoring and the code was taken as-is from an existing file, then the copyright year range should have remained the same as the original file. I don't know if any of the files you modified fall into that category but just wanted to point out that looking at the commit date is not always correct.

I tried to catch rename/move-only or copyright-only changes, but I'm not 100% sure I filtered all of them out.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3717068454

From lkorinth at openjdk.org Wed Jan 7 12:35:42 2026
From: lkorinth at openjdk.org (Leo Korinth)
Date: Wed, 7 Jan 2026 12:35:42 GMT
Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jan 2026 10:02:41 GMT, Leo Korinth wrote:

>> This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads.
>> 
>> It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads.
>> 
>> This change depends on the integration of https://bugs.openjdk.org/browse/JDK-8373253.
>> 
>> I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I would get the pull request out now so that you can have a look.
> 
> Leo Korinth has updated the pull request incrementally with 561 additional commits since the last revision:
> 
> - Merge branch 'master' into _8367993
> - 8366058: Outdated comment in WinCAPISeedGenerator
> 
>   Reviewed-by: mullan
> - 8357258: x86: Improve receiver type profiling reliability
> 
>   Reviewed-by: kvn, vlivanov
> - 8373704: Improve "SocketException: Protocol family unavailable" message
> 
>   Reviewed-by: lucy, jpai
> - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently
> 
>   Reviewed-by: jiefu, jbhateja, erfang, qamai
> - 8343809: Add requires tag to mark tests that are incompatible with exploded image
> 
>   Reviewed-by: alanb, dholmes
> - 8374465: Spurious dot in documentation for JVMTI ClassLoad
> 
>   Reviewed-by: kbarrett
> - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket
> 
>   Reviewed-by: djelinski, mpowers, ascarpino
> - 8374444: Fix simple -Wzero-as-null-pointer-constant warnings
> 
>   Reviewed-by: aboldtch
> - 8373847: Test javax/swing/JMenuItem/MenuItemTest/bug6197830.java failed because The test case automatically fails when clicking any items in the "Nothing" menu in all four windows (Left-to-right)-Menu Item Test and (Right-to-left)-Menu Item Test
> 
>   Reviewed-by: serb, aivanov, dnguyen
> - ... and 551 more: https://git.openjdk.org/jdk/compare/b907b295...0ece3767

I will redo the merge, I have done something strange.
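As a side note for readers, the deferred-initialization idea described in the quoted PR summary above (keep the constructor cheap, move the expensive setup into a `fully_initialize()` step) follows a common pattern that might look like the sketch below. Only the name `fully_initialize()` comes from the message; `ConcurrentMarkSketch`, its fields, and the choice to trigger initialization from `start_marking()` are assumptions made for illustration, and the sketch assumes callers are already serialized (no locking is shown).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a component with expensive, deferrable setup.
// This is NOT G1ConcurrentMark; only the fully_initialize() name is borrowed.
class ConcurrentMarkSketch {
public:
  // Cheap constructor: record sizing information, allocate nothing yet.
  explicit ConcurrentMarkSketch(size_t max_regions) : _max_regions(max_regions) {}

  bool is_fully_initialized() const { return _fully_initialized; }
  size_t stats_capacity() const { return _region_stats.size(); }
  int cycles_started() const { return _cycles_started; }

  // Expensive part: statistics buffers (and, in the real change, threads).
  // Idempotent, so callers may invoke it defensively.
  void fully_initialize() {
    if (_fully_initialized) return;
    _region_stats.assign(_max_regions, 0);
    _fully_initialized = true;
  }

  // First real use pays the initialization cost; later uses do not.
  void start_marking() {
    fully_initialize();
    _cycles_started++;
  }

private:
  size_t _max_regions;
  std::vector<size_t> _region_stats;
  bool _fully_initialized = false;
  int _cycles_started = 0;
};
```

The all-or-nothing flag mirrors the preference stated in the message for not leaving the component "half initiated".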
------------- PR Comment: https://git.openjdk.org/jdk/pull/28723#issuecomment-3718660595 From lkorinth at openjdk.org Wed Jan 7 12:58:43 2026 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 7 Jan 2026 12:58:43 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v3] In-Reply-To: References: Message-ID: > This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. > > It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. > > This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. > > I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 564 commits: - Merge branch '8373253' into 8367993 - Merge branch 'master' into _8373253 - Merge branch 'master' into _8367993 - 8366058: Outdated comment in WinCAPISeedGenerator Reviewed-by: mullan - 8357258: x86: Improve receiver type profiling reliability Reviewed-by: kvn, vlivanov - 8373704: Improve "SocketException: Protocol family unavailable" message Reviewed-by: lucy, jpai - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: jiefu, jbhateja, erfang, qamai - 8343809: Add requires tag to mark tests that are incompatible with exploded image Reviewed-by: alanb, dholmes - 8374465: Spurious dot in documentation for JVMTI ClassLoad Reviewed-by: kbarrett - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket Reviewed-by: djelinski, mpowers, ascarpino - ... and 554 more: https://git.openjdk.org/jdk/compare/2aa8aa4b...28ccbb68 ------------- Changes: https://git.openjdk.org/jdk/pull/28723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28723&range=02 Stats: 130308 lines in 3967 files changed: 83803 ins; 29735 del; 16770 mod Patch: https://git.openjdk.org/jdk/pull/28723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28723/head:pull/28723 PR: https://git.openjdk.org/jdk/pull/28723 From kdnilsen at openjdk.org Wed Jan 7 14:54:07 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 14:54:07 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Wed, 7 Jan 2026 00:04:41 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 428: > >> 426: } >> 427: >> 428: HeapWord* ShenandoahOldCollectorAllocator::allocate(ShenandoahAllocRequest& req, bool& in_new_region) { > > Confer with William Kemper about this. He is working on a change that may simplify the handling of PLABs, in which case ShenandoahOldCollectorAllocator can behave the same as ShenandoahCollector. Alternatively, I don't think it would be too terribly difficult to implement try_allocate_aligned() function to support fast (CAS) allocation of aligned PLABs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2668774560 From kdnilsen at openjdk.org Wed Jan 7 14:58:37 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 14:58:37 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 18:13:09 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 114: > >> 112: HeapWord* obj = attempt_allocation_in_alloc_regions(req, in_new_region, alloc_start_index(), dummy); >> 113: if (obj != nullptr) { >> 114: return obj; > > Even in the case that we successfully fill our allocation request, if regions_ready_for_refresh is greater than some percentage of _alloc_region_count (e.g. > _alloc_region_count / 4), then we should grab the heap lock and refresh_alloc_regions() here. Otherwise, we will gradually degrade the number of directly_allocatable_regions until we are down to one before we refresh any of them. After further thought, am thinking the threshold for refresh_alloc_regions() might be if (regions_ready_for_refresh >= _alloc_region_count / 2). That would reduce the number of slow paths through the allocator. If we can re-randomize the thread-local start indexes when their original start index hits a retire-able region, this might work ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2668794471 From kdnilsen at openjdk.org Wed Jan 7 16:37:22 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 16:37:22 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v22] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. 
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: - Remove note to self - Slight expansion of promo reserve - Remove bad assertion - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - fix unsigned arithmetic underflow - Attempt fix for assertion failures - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Remove debug instrumentation - ... 
and 66 more: https://git.openjdk.org/jdk/compare/2d092840...d0a692ff ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=21 Stats: 1457 lines in 29 files changed: 777 ins; 286 del; 394 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 17:38:09 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 17:38:09 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> Message-ID: On Tue, 6 Jan 2026 20:46:23 GMT, William Kemper wrote: >> May be let the heuristics (or the policy) track progress as well, and inform the actuator (i.e. op degenerated) whether it should upgrade to a full gc. It almost feels like heuristics and policy and actuator are leaking abstractions. It feels like heuristics keep track of the model parameters and learn from sensors, and the policy consults a specific heuristic to inform actuator (i.e. actions). >> >> By that model, you'd have the actuator sending the sensor information to the heuristics and asking the policy (or the heuristics, if you conflate heuristics and policy) to decide which step to take next. It would seem that evaluation of the notion of progress then moves to the policy too. > > @kdnilsen , what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR. I like this idea. 
I'll try to make that work without breaking anything... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2669431913 From kdnilsen at openjdk.org Wed Jan 7 18:19:35 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 18:19:35 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v6] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Combine successful and unsuccessful into single method: report_degenerated() - remove gratuitous blank line - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - touch file to force tests - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - refactor for reviewer requests - remove redundant code - Increase heuristic penalties following degenerated GC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/7b0efb3e..888f92a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=04-05 Stats: 20369 lines in 1812 files changed: 4196 ins; 2531 del; 13642 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From shade at openjdk.org Wed Jan 7 19:06:03 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Jan 2026 19:06:03 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 18:45:04 GMT, Aleksey 
Shipilev wrote: > Shenandoah and G1 are using CardTable for most of its infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing card table base from assembler and compilers runs into risk of accidentally encoding the wrong card table base in generated code. > > Most of the current code avoids this trouble by carefully implementing their GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, that reads the card table base for G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. > > Shenandoah added a few asserts to catch these errors: > SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) > > ...but G1 would also benefit from the similar safety mechanism. > > This PR strengthens the code to prevent future accidents. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc` > - [x] Linux x86_64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] Linux AArch64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] GHA, cross-compilation only Still waiting for reviews. @tschatzl, you might be interested in this from G1 side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28703#issuecomment-3720310765 From shade at openjdk.org Wed Jan 7 19:06:02 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Jan 2026 19:06:02 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses [v2] In-Reply-To: References: Message-ID: > Shenandoah and G1 are using CardTable for most of its infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing card table base from assembler and compilers runs into risk of accidentally encoding the wrong card table base in generated code. 
> > Most of the current code avoids this trouble by carefully implementing their GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, that reads the card table base for G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. > > Shenandoah added a few asserts to catch these errors: > SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) > > ...but G1 would also benefit from the similar safety mechanism. > > This PR strengthens the code to prevent future accidents. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc` > - [x] Linux x86_64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] Linux AArch64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] GHA, cross-compilation only Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - Merge branch 'master' into JDK-8373266-cardtable-asserts - Another build fix - Fix Minimal builds - Shenandoah non-generational can have nullptr card table - Also simplify CTBS builder - CI should also mention "const" - Fix JVMCI by answering proper things - Merge branch 'master' into JDK-8373266-cardtable-asserts - More fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28703/files - new: https://git.openjdk.org/jdk/pull/28703/files/26b6b071..040a84d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28703&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28703&range=00-01 Stats: 25810 lines in 2653 files changed: 14810 ins; 3456 del; 7544 mod Patch: https://git.openjdk.org/jdk/pull/28703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28703/head:pull/28703 PR: https://git.openjdk.org/jdk/pull/28703 From kdnilsen at openjdk.org Wed Jan 7 19:08:27 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 19:08:27 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> Message-ID: On Wed, 7 Jan 2026 17:35:49 GMT, Kelvin Nilsen wrote: >> @kdnilsen , what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR. > > I like this idea. I'll try to make that work without breaking anything... I've committed this change and it is running through the CI pipeline. 
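To illustrate the single-method idea agreed on above (one `record_degenerated` entry point that states the fact of a degenerated cycle, rather than separate "success" and "unsuccessful" variants with duplicated bodies), a minimal sketch could look like this. The class name, the penalty step values, and the 0..100 clamp are assumptions chosen for the example, not values taken from the PR.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical heuristics model; names and constants are illustrative only.
class HeuristicsSketch {
public:
  // A clean concurrent cycle slowly forgives earlier penalties.
  void record_success_concurrent() { adjust_penalty(kConcurrentAdjust); }

  // Single entry point: a degenerated cycle happened. Whether it made
  // progress is a separate question that is not conflated with this fact.
  void record_degenerated() {
    _degenerated_cycles++;
    adjust_penalty(kDegeneratedPenalty);
  }

  int penalty() const { return _penalty; }
  int degenerated_cycles() const { return _degenerated_cycles; }

private:
  static constexpr int kConcurrentAdjust = -1;    // assumed step
  static constexpr int kDegeneratedPenalty = 10;  // assumed step

  // Keep the accumulated penalty within [0, 100].
  void adjust_penalty(int step) {
    _penalty = std::max(0, std::min(100, _penalty + step));
  }

  int _penalty = 0;
  int _degenerated_cycles = 0;
};
```

A higher accumulated penalty would then make the trigger fire earlier, which is the "triggering penalty" the PR description refers to.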
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2669739585 From wkemper at openjdk.org Wed Jan 7 19:20:29 2026 From: wkemper at openjdk.org (William Kemper) Date: Wed, 7 Jan 2026 19:20:29 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v6] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 18:19:35 GMT, Kelvin Nilsen wrote: >> Add a triggering penalty when we execute degenerated GC cycle. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Combine successful and unsuccessful into single method: report_degenerated() > - remove gratuitous blank line > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - touch file to force tests > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - refactor for reviewer requests > - remove redundant code > - Increase heuristic penalties following degenerated GC Looks good to integrate, assuming testing pipelines pass. ------------- Marked as reviewed by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28834#pullrequestreview-3636442278 From xpeng at openjdk.org Wed Jan 7 19:41:07 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 19:41:07 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 6 Jan 2026 20:43:39 GMT, Kelvin Nilsen wrote: >> It is not really necessary to `atomic_allocate_in` here, but I wanted reuse some of the codes in atomic_allocate_in, we can discuss this later, I can change it back to non-atomic version. > > Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with a template argument. > > I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess. > > I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition. > > Here's the scenario that I'm concerned about: > > 1. A mutator obtains pointer to directly allocatable region R > 2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason) > 3. Region R is now eligible to satisfy allocations from behind the global heap lock > 4. Some third mutator thread acquires the heap lock and fetches top for region R > 5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict > 6. This third mutator allocates from region R at top, without using CAS.
So both mutators think they own the same object > > I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in regions R. However, I anticipate the day in not-too-distant future when we will want to refresh regions even when they are not ready to be retired. Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation. > > So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way. (I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate an object that is 1/2 the heap regio... I will update the PR to not use the atomic version here, and also in another place in refresh_alloc_regions. Having both volatile_top and nonvolatile_top seems unnecessary: it would make the code more complicated without much performance benefit. With the CAS allocator, most alloc requests will be handled by the atomic code path; only in a few cases do we need non-atomic allocation: * After reserving alloc regions from the free set and before storing them as alloc regions, it performs the object allocation if the alloc request has not been satisfied yet. * After atomic allocation has been tried and refreshing the alloc regions fails, it will try to find a region in the free set with enough space for the allocation request.
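The difference between the CAS path and the heap-locked path discussed in this thread can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code in the PR: `Region`, `atomic_allocate`, `locked_allocate`, and `kAllocFailed` are hypothetical names, offsets stand in for addresses, and `std::atomic` stands in for HotSpot's own atomic primitives.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sentinel returned when a region cannot satisfy a request.
constexpr std::size_t kAllocFailed = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a heap region; not the ShenandoahHeapRegion API.
struct Region {
  std::atomic<std::size_t> top{0};  // offset of the first free byte
  std::size_t end;                  // capacity of the region in bytes

  explicit Region(std::size_t capacity) : end(capacity) {}

  // Lock-free bump allocation. Returns the offset of the new block,
  // or kAllocFailed if the region does not have enough space left.
  std::size_t atomic_allocate(std::size_t size) {
    std::size_t old_top = top.load(std::memory_order_relaxed);
    while (true) {
      if (size > end - old_top) {
        return kAllocFailed;
      }
      // On failure, compare_exchange_weak reloads old_top with the
      // current value, so the loop retries against fresh state.
      if (top.compare_exchange_weak(old_top, old_top + size,
                                    std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }

  // Plain bump allocation. Only safe while no other thread can
  // allocate in this region, for example under an exclusive lock
  // after the region has left the directly allocatable set.
  std::size_t locked_allocate(std::size_t size) {
    std::size_t old_top = top.load(std::memory_order_relaxed);
    if (size > end - old_top) {
      return kAllocFailed;
    }
    top.store(old_top + size, std::memory_order_relaxed);
    return old_top;
  }
};
```

If one thread runs `locked_allocate` while another is still inside `atomic_allocate` on the same region, both can read the same `old_top`, and the plain store does not notice the concurrent update. That is the double-ownership scenario described in the numbered steps of this message.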
Yes, all the _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs are volatile now, out of these fields, I believe I can maybe remove volatile for _age and _youth(?), but the update of the rest must be atomic because mutators will increase the values in the CAS allocation code path w/o heap lock. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2669831418 From kdnilsen at openjdk.org Wed Jan 7 19:58:58 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 19:58:58 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v23] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. 
The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits: - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix whitespace and comment - Remove note to self - Slight expansion of promo reserve - Remove bad assertion - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - fix unsigned arithmetic underflow - Attempt fix for assertion failures - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - ... 
and 68 more: https://git.openjdk.org/jdk/compare/dd20e915...9aa4a3e2 ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=22 Stats: 1456 lines in 29 files changed: 776 ins; 286 del; 394 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 20:32:30 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:32:30 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 21:11:42 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1218: > >> 1216: } else { >> 1217: heap->heuristics()->start_idle_span(); >> 1218: } > > Suggestion: > > _generation->heuristics()->start_idle_span(); Very nice. Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2669982434 From kdnilsen at openjdk.org Wed Jan 7 20:38:39 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:38:39 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 22:24:44 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 179: > >> 177: " after adjusting for spike_headroom: %zu%s" >> 178: " and penalties: %zu%s", _is_generational? 
_space_info->name(): "Global", >> 179: byte_size_in_proper_unit(mutator_available), proper_unit_for_byte_size(mutator_available), > > Can we use the `PROPERFMT/PROPERFMTARGS` macros for these? I find they really improve readability. Agreed. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2669994974 From kdnilsen at openjdk.org Wed Jan 7 20:49:04 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:49:04 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 22:25:46 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 199: > >> 197: >> 198: // There is no headroom during evacuation and update refs. This information is not used to trigger the next GC. >> 199: // Rather, it is made available to support throttling of allocations during GC. > > Is that true? or is allocation throttling part of another change? Sorry. Not true. Fixing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2670019293 From kdnilsen at openjdk.org Wed Jan 7 20:56:51 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:56:51 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. 
In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: - Fix comment - Use PROPERFMT macros - Simplify code flow: reviewer suggestion - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Remove develop/debug instrumentation - add another override - Change type of command-line args - fix white space - Add override to virtual methods - Fix race between allocation reporting and querying - ... 
and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e ------------- Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=03 Stats: 1028 lines in 25 files changed: 921 ins; 35 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Wed Jan 7 20:56:54 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:56:54 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 22:27:41 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 275: > >> 273: } >> 274: >> 275: void ShenandoahAdaptiveHeuristics::add_gc_time(double timestamp, double gc_time) { > > Could we use `TruncatedSeq::predict_next` here? In this PR, we keep TruncatedSeq::predict_next() functionality as that has proven to be "right" most of the time. TruncatedSeq::predict_next() assumes the next GC time is most effectively predicted as an average over a noisy history of previously measured GC times. This new function adds a new prediction mechanism which kicks in when we observe a "linearly increasing trend in GC times". This has been observed to occur during initialization and startup of new phases of a service workload, where GC(N) takes 400 ms, GC(N+1) takes 425 ms, GC(N+2) takes 465 ms, etc. The typical reason is because the workload is building up data structures and thus requires increasing amounts of time to mark and evacuate and update the increasing amounts of live data. 
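To make the contrast between averaging and trend-following concrete, here is a small sketch using the GC times quoted above (400 ms, 425 ms, 465 ms). It is an assumption-laden illustration, not the mechanism implemented in the PR: it uses an ordinary least-squares line fit, and `Sample`, `average_time`, and `predict_linear` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One observation: x is the GC index (or a timestamp), y is the GC time in ms.
struct Sample {
  double x;
  double y;
};

// Plain average of the observed GC times; returns 0 for an empty history.
double average_time(const std::vector<Sample>& history) {
  if (history.empty()) {
    return 0.0;
  }
  double sum = 0.0;
  for (const Sample& s : history) {
    sum += s.y;
  }
  return sum / static_cast<double>(history.size());
}

// Least-squares line through the history, evaluated at x_next.
// Falls back to the average when no slope can be computed.
double predict_linear(const std::vector<Sample>& history, double x_next) {
  if (history.empty()) {
    return 0.0;
  }
  const double n = static_cast<double>(history.size());
  double mean_x = 0.0;
  double mean_y = 0.0;
  for (const Sample& s : history) {
    mean_x += s.x;
    mean_y += s.y;
  }
  mean_x /= n;
  mean_y /= n;

  double sxx = 0.0;
  double sxy = 0.0;
  for (const Sample& s : history) {
    sxx += (s.x - mean_x) * (s.x - mean_x);
    sxy += (s.x - mean_x) * (s.y - mean_y);
  }
  if (sxx == 0.0) {
    return mean_y;  // all samples share one x value: no trend available
  }
  const double slope = sxy / sxx;
  return mean_y + slope * (x_next - mean_x);
}
```

For the three samples at indices 0, 1, and 2, the average is 430 ms, while the fitted line has a slope of 32.5 ms per cycle and predicts 495 ms for the next cycle. A trigger that relies on the average alone would therefore underestimate the next GC time by 65 ms in this example.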
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2670036503 From xpeng at openjdk.org Wed Jan 7 21:13:08 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 21:13:08 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> References: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> Message-ID: <1MZQLDhJsqK5ZoPIVDYYRyVg0po67A6wVfIpsAl7Qa0=.d0bfa7e4-f448-4bb1-a386-b8226133e6a7@github.com> On Tue, 6 Jan 2026 21:00:03 GMT, Kelvin Nilsen wrote: >> Are you running any experiments (on different hardware configurations) to test your assumptions about this? > > Please document the results of any experiments as rationale for the final design. I did run some experiments and didn't see a significant difference. I will keep the current code using PaddedArray for now, keep this conversation open, and make a decision based on metrics later, after I address the other comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670084604 From kdnilsen at openjdk.org Wed Jan 7 21:26:52 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 21:26:52 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v24] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations.
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/9aa4a3e2..026e34df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=22-23 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From xpeng at openjdk.org Wed Jan 7 21:47:46 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 21:47:46 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <3CDzFBUPf-x7Cp4HiQdzwytCJk9kdpDHGB0SjEtD5Kg=.d9833d73-f579-4ddb-bb3e-e7a9ce0743d0@github.com> On Tue, 6 Jan 2026 17:35:56 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 110: > >> 108: } >> 109: >> 110: uint dummy = 0; > > Don't call this "dummy". Call it regions_ready_for_refresh. Remember the value and pass it in as a new argument to attempt_allocation_slow() so that we don't have to recompute it later. The value from the fast path won't be used anyway, which is why I called it dummy. attempt_allocation_slow has to recompute it after acquiring the heap lock. Imagine that we have two mutators and 8 shared alloc regions, and both try to allocate at the same time: 1. Both threads try the fast path (attempt_allocation_in_alloc_regions) and fail; both see that 8 alloc regions are ready to retire. 2. Both threads call into attempt_allocation_slow. 3. The first thread acquires the heap lock, refreshes all 8 alloc regions, allocates in one of the regions, and releases the heap lock. 4. The second thread acquires the heap lock after the first thread has released it; the regions_ready_for_refresh value it saw in the fast path is now stale and has to be recomputed.
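The four steps above follow a common pattern: a value read without the lock is only a hint, and the slow path re-reads the state once it holds the lock. The sketch below replays the two-mutator scenario under that assumption. It is not the PR's code: `SharedAllocRegions`, `observe_ready`, and `slow_path` are hypothetical names, and `std::mutex` stands in for the Shenandoah heap lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical model of a set of shared alloc regions.
struct SharedAllocRegions {
  std::mutex heap_lock;
  // Number of alloc regions that are ready to be retired and refreshed.
  std::atomic<unsigned> ready_for_refresh{8};
  // Number of refresh operations performed; guarded by heap_lock.
  unsigned refresh_count = 0;

  // Fast-path observation, taken without the lock. The result may be
  // stale by the time the caller acts on it.
  unsigned observe_ready() const {
    return ready_for_refresh.load(std::memory_order_acquire);
  }

  // Slow path. The hint from the fast path is deliberately ignored and
  // the count is recomputed under the lock. Returns how many regions
  // this call actually refreshed.
  unsigned slow_path(unsigned stale_hint) {
    (void)stale_hint;
    std::lock_guard<std::mutex> guard(heap_lock);
    unsigned ready = ready_for_refresh.load(std::memory_order_relaxed);
    if (ready > 0) {
      ready_for_refresh.store(0, std::memory_order_release);
      ++refresh_count;
    }
    return ready;
  }
};
```

Replaying the scenario sequentially: both callers observe 8 on the fast path, the first `slow_path` call refreshes 8 regions, and the second call finds 0 after recomputing under the lock. Had the second caller trusted its hint of 8, it would have tried to refresh regions that were already refreshed.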
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670164993 From kdnilsen at openjdk.org Wed Jan 7 21:54:47 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 21:54:47 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. 
By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix confusing comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/026e34df..b064ecc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=23-24 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 22:16:55 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:16:55 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. 
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - fix another typo - Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/b064ecc5..a8520190 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=24-25 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 22:16:59 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:16:59 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v23] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 19:58:58 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. 
In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... 
> > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits: > > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - Fix whitespace and comment > - Remove note to self > - Slight expansion of promo reserve > - Remove bad assertion > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - fix unsigned arithmetic underflow > - Attempt fix for assertion failures > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - ... and 68 more: https://git.openjdk.org/jdk/compare/dd20e915...9aa4a3e2 src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 183: > 181: // Call the subclasses to add young-gen regions into the collection set. > 182: choose_collection_set_from_regiondata(collection_set, candidates, cand_idx, immediate_garbage + free); > 183: The general idea here is to give young-gen first dibs on its reserves. But if young does not consume its reserves, we'll see if we can repurpose some of those reserves to expand our old-gen evacuation efforts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670067635 From kdnilsen at openjdk.org Wed Jan 7 22:17:01 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:17:01 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 22:14:21 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations.
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - fix another typo > - Fix typo src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 213: > 211: // Entire region will be promoted, This region does not impact young-gen or old-gen evacuation reserve. > 212: // This region has been pre-selected and its impact on promotion reserve is already accounted for. > 213: I think this comment is obsolete. The line of code that it describes was removed in a previous PR. IIRC, we used to increment cur_young_garbage by r->garbage() plus r->get_live_data_bytes(). src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 127: > 125: byte_size_in_proper_unit(old_evacuation_budget), proper_unit_for_byte_size(old_evacuation_budget), > 126: unprocessed_old_collection_candidates()); > 127: This code is now used twice for mixed evacuation cycle, so I bundled the code into add_old_regions_to_cset(). The first time is when we prime the collection set. This is called to place certain old-gen regions into the cset before we chose the young-gen regions that are going to be collected. The second time is when we top-off the old collection set. This happens after young-gen regions have been placed into the cset. If there is unused reserve from young generation, we consider repurposing those reserves for old and try to expand the old collection set. 
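The prime-then-top-off ordering described above can be illustrated with a small standalone sketch. This is not the code from the PR: `SketchCandidate`, `sketch_add_old_regions` and `sketch_choose_mixed_cset` are hypothetical names, the candidates are assumed to be pre-sorted, and the budget accounting (live bytes against a byte reserve) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a collection-set candidate region.
struct SketchCandidate {
  size_t live_bytes;  // bytes that would have to be evacuated
};

// Adds old candidates, in order, starting at `next`, while they fit in
// `budget`. Advances `next` and returns the bytes of budget consumed.
// (Assumed helper; loosely modeled on the role of add_old_regions_to_cset().)
size_t sketch_add_old_regions(const std::vector<SketchCandidate>& old_candidates,
                              size_t& next, size_t budget) {
  size_t used = 0;
  while (next < old_candidates.size() &&
         used + old_candidates[next].live_bytes <= budget) {
    used += old_candidates[next].live_bytes;
    next++;
  }
  return used;
}

struct SketchResult {
  size_t old_regions;    // old regions placed in the collection set
  size_t young_regions;  // young regions placed in the collection set
};

SketchResult sketch_choose_mixed_cset(const std::vector<SketchCandidate>& old_candidates,
                                      const std::vector<SketchCandidate>& young_candidates,
                                      size_t old_reserve, size_t young_reserve) {
  size_t next_old = 0;

  // Phase 1: prime the collection set with old regions that fit the old reserve.
  size_t old_used = sketch_add_old_regions(old_candidates, next_old, old_reserve);

  // Phase 2: young gets first dibs on its own reserve.
  size_t young_used = 0;
  size_t young_count = 0;
  for (const SketchCandidate& c : young_candidates) {
    if (young_used + c.live_bytes <= young_reserve) {
      young_used += c.live_bytes;
      young_count++;
    }
  }

  // Phase 3: top off old with whatever neither generation consumed.
  size_t leftover = (old_reserve - old_used) + (young_reserve - young_used);
  sketch_add_old_regions(old_candidates, next_old, leftover);

  return {next_old, young_count};
}
```

With old candidates of 4 bytes each, an old reserve of 5 and a young reserve of 10 that young only half uses, the top-off pass admits a second old region that the old reserve alone could not hold.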
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670076558 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670138961 From kdnilsen at openjdk.org Wed Jan 7 22:17:05 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:17:05 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v24] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 21:26:52 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. 
>> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 50: > 48: > 49: void ShenandoahGlobalHeuristics::choose_global_collection_set(ShenandoahCollectionSet* cset, > 50: const ShenandoahHeuristics::RegionData* data, The general idea here: For a global GC, our collection set is based on garbage-first heuristic across all of young and all of old. We combine our old and young reserves into a shared pool of reserves. We choose cset regions in garbage-first order. Our choices of which regions to evacuate cause us to dedicate reserves to either old or young. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 279: > 277: } > 278: > 279: // After an abbreviated cycle, we reclaim immediate garbage. 
Rebuild the freeset in order to establish With this PR, some apportionment of reserves is done before the idle span. And each idle span is preceded by a freeset rebuild. At the time of rebuild, we make use of information gleaned from recent GC activities to decide how to balance the old and young reserves, such as: 1. Are there candidates for mixed evacuation? 2. What is the potential for promotion? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1219: > 1217: } > 1218: > 1219: void ShenandoahFreeSet::move_unaffiliated_regions_from_collector_to_old_collector(ssize_t count) { This allows us to "share" from young reserve to old reserve. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2571: > 2569: } > 2570: > 2571: Before this PR, we only "have_evacuation_reserves" when we rebuild at start of evacuation. With this PR, we always have_evacuation_reserves. That's because at the start of idle span, we are already anticipating what sort of evacuation will take place during the next GC cycle ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670125957 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670155851 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670166464 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670192097 From xpeng at openjdk.org Wed Jan 7 22:17:34 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:17:34 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> 
Message-ID: On Tue, 6 Jan 2026 17:34:06 GMT, Kelvin Nilsen wrote: >> But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! >> >> The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). > > I'm not concerned that the count of regions_ready_for_refresh might be stale. If this count is getting incremented "during" our allocation, we will see this result soon enough. If multiple mutators fail fast-path allocation simultaneously, they will each acquire heap lock either way (existing implementation vs. new implementation that does not retry the allocation). Acquiring the heap lock is the "expensive" operation. If the first one refreshes allocation regions, then subsequent invocations will not find any regions to be refreshed. The concern is not that this count is getting incremented "during" our allocation; it is the case where it gets decremented because other mutators may have already refreshed all alloc regions before the current mutator gets the heap lock. Because of this, we have to call attempt_allocation_in_alloc_regions again after successfully acquiring the heap lock. The same design can also be found in the CAS allocators of G1, Parallel and Serial GC.
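The retry-after-locking pattern argued for above can be sketched in isolation. This is a minimal illustration, not the Shenandoah code: `SketchRegion`, `SketchAllocator` and all member names are hypothetical, a `std::mutex` stands in for the heap lock, and "addresses" are plain offsets.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical shared allocation region with a CAS-advanced bump pointer.
struct SketchRegion {
  size_t end = 0;
  std::atomic<size_t> top{0};
};

class SketchAllocator {
 public:
  static constexpr size_t kFailed = SIZE_MAX;

  SketchAllocator(size_t region_count, size_t region_size)
      : _regions(region_count) {
    assert(region_count > 0);
    for (size_t i = 0; i < region_count; i++) {
      _regions[i].top.store(i * region_size);
      _regions[i].end = (i + 1) * region_size;
    }
    _current.store(&_regions[0]);
  }

  size_t allocate(size_t size) {
    size_t result = allocate_fast(size);  // lock-free attempt
    return result != kFailed ? result : allocate_slow(size);
  }

  // Only meaningful when no other thread is allocating.
  size_t refresh_count() const { return _refreshes; }

 private:
  size_t allocate_fast(size_t size) {
    SketchRegion* r = _current.load(std::memory_order_acquire);
    size_t old_top = r->top.load(std::memory_order_relaxed);
    while (old_top + size <= r->end) {
      // On failure, compare_exchange_weak reloads old_top and we retry.
      if (r->top.compare_exchange_weak(old_top, old_top + size)) {
        return old_top;
      }
    }
    return kFailed;
  }

  size_t allocate_slow(size_t size) {
    std::lock_guard<std::mutex> guard(_lock);
    // Another thread may have installed a fresh region while we waited for
    // the lock, so retry the fast path before refreshing again.
    size_t result = allocate_fast(size);
    if (result != kFailed) {
      return result;
    }
    if (_next + 1 >= _regions.size()) {
      return kFailed;  // no regions left to install
    }
    _next++;
    _refreshes++;
    _current.store(&_regions[_next], std::memory_order_release);
    return allocate_fast(size);
  }

  std::vector<SketchRegion> _regions;
  std::atomic<SketchRegion*> _current{nullptr};
  std::mutex _lock;
  size_t _next = 0;
  size_t _refreshes = 0;
};
```

Without the retry inside `allocate_slow`, every thread that lost the fast-path race would install a new region of its own, even when the first thread through the lock had already made room.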
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670238256 From xpeng at openjdk.org Wed Jan 7 22:27:56 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:27:56 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 21:32:30 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 254: > >> 252: // Step 1: find out the alloc regions which are ready to refresh. >> 253: for (uint i = 0; i < _alloc_region_count; i++) { >> 254: ShenandoahAllocRegion* alloc_region = &_alloc_regions[i]; > > We've got the heap lock here. why does this need to be atomic? Comments in the code should make this clear. I believe AtomicAccess::load here is not needed, I'll remove it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670262454 From xpeng at openjdk.org Wed Jan 7 22:53:26 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:53:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v21] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. 
Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - Update code comments - Update assert message - Only use atomic allocation when allocate from shared alloc regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/cf13b7b5..61d86546 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=19-20 Stats: 33 lines in 5 files changed: 13 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 22:53:29 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:53:29 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> 
Message-ID: On Tue, 6 Jan 2026 00:32:24 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 102: > >> 100: for (size_t i = 0; i < num_regions; i++) { >> 101: ShenandoahHeapRegion* region = heap->get_region(i); >> 102: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); > > Same suggestion here as with shenandoahGenerationalHeuristics.cpp. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670313055 From xpeng at openjdk.org Wed Jan 7 22:56:19 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:56:19 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 7 Jan 2026 19:37:13 GMT, Xiaolong Peng wrote: >> Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with an template argument. >> >> I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. 
That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess. >> >> I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition. >> >> Here's the scenario that I'm concerned about: >> >> 1. A mutator obtains pointer to directly allocatable region R >> 2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason) >> 3. Region R is now eligible to satisfy allocations from behind the global heap lock >> 4. Some third mutator thread acquires the heap lock and fetches top for region R >> 5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict >> 6. This third mutator allocates from region R at top, without using CAS. So both mutators think they own the same object >> >> I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in region R. However, I anticipate the day in not-too-distant future when we will want to refresh regions even when they are not ready to be retired. Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation. >> >> So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way.
(I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate ... > > I will update the PR and not use atomic version here, and also another place in refresh_alloc_regions. > > Having volatile_top and nonvolatile_top seems necessary, it will make the code more complicated w/o much performance benefits, with CAS allocator, most of alloc request will be handled by the atomic code path, in only few > cases we need non-atomic allocation: > * After reserving alloc regions from free set before storing to alloc region, it performs obj allocation if the alloc request has not been satisfied yet. > * After trying atomic allocation, refresh alloc regions fails, it will try to find a region in free set with enough space for the allocation request. > > Yes, all the _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs are volatile now, out of these fields, I believe I can maybe remove volatile for _age and _youth(?), but the update of the rest must be atomic because mutators will increase the values in the CAS allocation code path w/o heap lock. I have updated the method `atomic_allocate_in` with a template parameter ATOMIC, now only when allocating from shared alloc regions the ATOMIC parameter is true to use atomic operations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670323493 From xpeng at openjdk.org Wed Jan 7 22:59:04 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:59:04 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v22] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. 
> > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: No need to use Atomic::load to read shared alloc region in refresh_alloc_regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/61d86546..f5038a3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 23:12:41 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:12:41 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Tue, 6 Jan 2026 17:37:46 GMT, Kelvin Nilsen wrote: >> I'll add comments on this, _alloc_region_count == 0 means we don't want to use any shared alloc 
region, it will always allocate with a heap lock, ideally the performance should be same as before, so it always simply find a region with enough space and allocate in the region. > > Put the comments describing functions in the .hpp file, where they are currently. But we need to enhance those comments. I have added comments on those functions, I'll keep adding more for those missing comments; meanwhile I am trying to avoid excessive comments, please point out if any of the comments is not clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670355339 From xpeng at openjdk.org Wed Jan 7 23:16:00 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:16:00 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v23] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: No need to use Atomic::load to read shared alloc region in release_alloc_regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/f5038a3a..917dd8a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=21-22 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 23:19:28 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:19:28 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <_vhMVxLnvCdIHO_CJ8kaI3cLNKJSSYvqK7n_wriVhDk=.2d1649e4-512d-4682-842b-29541423b458@github.com> On Tue, 6 Jan 2026 01:55:58 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 338: > >> 336: for (uint i = 0; i < _alloc_region_count; i++) { >> 337: ShenandoahAllocRegion& alloc_region = _alloc_regions[i]; >> 338: ShenandoahHeapRegion* r = AtomicAccess::load(&alloc_region.address); > > We've got heap lock and at safepoint. Do not need AtomicAccess here. That is more costly than necessary. I prefer to use regular fetch. If you prefer to keep AtomicAccess, please provide a comment in the code explaining why and we will revisit. The atomic load is not needed, I'll remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670368014 From kdnilsen at openjdk.org Thu Jan 8 00:14:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 00:14:17 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v3] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Mon, 5 Jan 2026 17:13:08 GMT, William Kemper wrote: >> This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Fix typo in assertion message > - Take regulator thread out of STS before requesting GC > > The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other.
> - Add comments > - Revert back to what should be on this branch > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Don't know how this file got deleted > - Carry over gc cancellation to gc request > - Do not let allocation failure requests be overwritten by other requests > - Fix degen point handling > - ... and 3 more: https://git.openjdk.org/jdk/compare/4458cab4...8f4f55db Thanks for talking us through this PR. Lots of subtle issues here. Looks good to me. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28932#pullrequestreview-3637243572 From xpeng at openjdk.org Thu Jan 8 00:26:03 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 00:26:03 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v24] In-Reply-To: References: Message-ID: <1qqqdCXoW9PWw_ERccC7zh6kMPBJyKHp9wprAEqbMgM=.24431e29-e46f-4a8b-ade6-d27506432169@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 271 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - No need to use Atomic::load to read shared alloc region in release_alloc_regions - No need to use Atomic::load to read shared alloc region in refresh_alloc_regions - Update code comments - Update assert message - Only use atomic allocation when allocate from shared alloc regions - Merge branch 'openjdk:master' into cas-alloc-1 - Fix build error after merging from tip - Merge branch 'master' into cas-alloc-1 - Merge branch 'master' into cas-alloc-1 - ... and 261 more: https://git.openjdk.org/jdk/compare/9a944e55...ef10341f ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=23 Stats: 1656 lines in 25 files changed: 1308 ins; 235 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Thu Jan 8 00:47:42 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 00:47:42 GMT Subject: RFR: Merge openjdk/jdk21u:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.10+6 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/231/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/231/files/2e594b6c..2e594b6c Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=231&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=231&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/231.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/231/head:pull/231 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/231 From wkemper at openjdk.org Thu Jan 8 00:48:58 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 00:48:58 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 21:54:47 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. 
This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix confusing comment Just posting my comments for today, more to follow. Also, this will conflict mightily with https://github.com/openjdk/jdk/pull/27632. Though I think using the age census to estimate promotion reserves is conceptually compatible with this PR. 
src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 178: > 176: bool need_to_finalize_mixed = false; > 177: if (_generation->is_young()) { > 178: need_to_finalize_mixed = heap->old_generation()->heuristics()->prime_collection_set(collection_set); We could push this logic for young collections down into `ShenandoahYoungHeuristics::choose_collection_set_from_regiondata` where `_generation` is always `ShenandoahYoungGeneration`. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 342: > 340: } > 341: > 342: bool ShenandoahOldHeuristics::top_off_collection_set(ssize_t &add_regions_to_old) { Does `add_regions_to_old` really need to be signed? Seems like it will always be non-negative here. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 282: > 280: // reserves for the next GC cycle. > 281: assert(_abbreviated, "Only rebuild free set for abbreviated and old-marking cycles"); > 282: heap->rebuild_free_set(true /*concurrent*/); Should we move this up in the sequence? If promote in place fails we'd go to a degenerated cycle. After a cursory review of the degenerated cycle, it looks like it only rebuilds the freeset when evacuations are performed. Seems like rebuilding the freeset earlier before checking for cancellation might reduce the chance of a degenerated cycle and also guarantee the freeset is rebuilt. Would it make more sense to do this in `early_cleanup`? ------------- Changes requested by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25357#pullrequestreview-3637002268 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670238461 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670502843 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670279391 From wkemper at openjdk.org Thu Jan 8 00:49:00 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 00:49:00 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 22:16:55 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. 
The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - fix another typo > - Fix typo src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 165: > 163: } > 164: > 165: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); Was there a reason to remove this `assert`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670259539 From wkemper at openjdk.org Thu Jan 8 00:50:23 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 00:50:23 GMT Subject: Integrated: Merge openjdk/jdk21u:master In-Reply-To: References: Message-ID: On Thu, 25 Dec 2025 14:24:27 GMT, William Kemper wrote: > Merges tag jdk-21.0.10+6 This pull request has now been integrated. 
Changeset: 02bb7604 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/02bb7604bb84b2aec47069148f0d64931b3f9743 Stats: 660 lines in 23 files changed: 297 ins; 224 del; 139 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/231 From duke at openjdk.org Thu Jan 8 04:46:41 2026 From: duke at openjdk.org (Harshit470250) Date: Thu, 8 Jan 2026 04:46:41 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v4] In-Reply-To: References: Message-ID: > This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. Harshit470250 has updated the pull request incrementally with three additional commits since the last revision: - move make_clone to barrierSetC2 - move make_clone to barrier_stubc2.hpp - move clone_type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27279/files - new: https://git.openjdk.org/jdk/pull/27279/files/4dfa36ca..630e4be0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=02-03 Stats: 52 lines in 4 files changed: 24 ins; 25 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27279/head:pull/27279 PR: https://git.openjdk.org/jdk/pull/27279 From kdnilsen at openjdk.org Thu Jan 8 15:54:04 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 15:54:04 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 22:16:55 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold 
evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. 
>> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - fix another typo > - Fix typo > Just posting my comments for today, more to follow. Also, this will conflict mightily with #27632. Though I think using the age census to estimate promotion reserves is conceptually compatible with this PR. Agree that these ideas are "conceptually compatible". We'll need to resolve some conflicts between the two efforts. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25357#issuecomment-3724495487 From kdnilsen at openjdk.org Thu Jan 8 16:02:47 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 16:02:47 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 22:14:15 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix confusing comment > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 178: > >> 176: bool need_to_finalize_mixed = false; >> 177: if (_generation->is_young()) { >> 178: need_to_finalize_mixed = heap->old_generation()->heuristics()->prime_collection_set(collection_set); > > We could push this logic for young collections down into `ShenandoahYoungHeuristics::choose_collection_set_from_regiondata` where `_generation` is always `ShenandoahYoungGeneration`. Good catch. Thanks for this suggestion. Much cleaner. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2672932077 From kdnilsen at openjdk.org Thu Jan 8 17:41:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 17:41:21 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v27] In-Reply-To: References: Message-ID: <3iF-Ny42_W-rxUDNFL7LVK4HtcRS8Hf3TiGbYWoWwOo=.55a4093f-c022-410c-8c9f-c0b270bdd194@github.com> > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). 
> > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. 
> > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/a8520190..1002fb56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=25-26 Stats: 100 lines in 22 files changed: 24 ins; 17 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Thu Jan 8 17:52:41 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 17:52:41 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v27] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 22:23:05 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 165: > >> 163: } >> 164: >> 165: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); > > Was there a reason to remove this `assert`? May have been an accident. I'll put it back in and see what happens. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2673304717 From kdnilsen at openjdk.org Thu Jan 8 17:55:26 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 17:55:26 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25] In-Reply-To: References: Message-ID: <2eDa-lXiPCUSmVvqX0WSUyuuo9rhH6R8JAzfZhefJeI=.68bb0618-2007-4832-937f-ff999917b941@github.com> On Thu, 8 Jan 2026 00:42:47 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix confusing comment > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 342: > >> 340: } >> 341: >> 342: bool ShenandoahOldHeuristics::top_off_collection_set(ssize_t &add_regions_to_old) { > > Does `add_regions_to_old` really need to be signed? Seems like it will always be non-negative here. Good point. I'm changing to unsigned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2673318439 From xpeng at openjdk.org Thu Jan 8 17:59:20 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 17:59:20 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v25] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. 
> * ShenandoahMutatorAllocator: allocator for mutator, inherits from ShenandoahAllocator, only overrides methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only a few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2...
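For readers unfamiliar with the technique named in the subject line, the following is a minimal sketch of CAS-based bump-pointer allocation, assuming a region is nothing more than a `[top, end)` address range. `SketchRegion` and `sketch_cas_allocate` are hypothetical names made up for illustration; they are not the PR's classes, and the sketch ignores alignment, region retirement and the locked slow path that the real allocator needs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an alloc region: memory is handed out by
// advancing `top` towards `end`. Not a HotSpot type.
struct SketchRegion {
  std::atomic<std::uintptr_t> top;
  std::uintptr_t end;
  SketchRegion(std::uintptr_t start, std::uintptr_t limit)
      : top(start), end(limit) {}
};

// Lock-free fast path: claim `bytes` from the region with a CAS loop.
// Returns the start address of the claimed block, or 0 when the region
// cannot satisfy the request (the caller would then fall back to a slow
// path, e.g. taking a lock to install a fresh region).
inline std::uintptr_t sketch_cas_allocate(SketchRegion& r, std::size_t bytes) {
  assert(bytes > 0 && "allocation size must be positive");
  std::uintptr_t old_top = r.top.load(std::memory_order_relaxed);
  while (true) {
    if (r.end - old_top < bytes) {
      return 0;  // not enough room left in this region
    }
    // On failure, compare_exchange_weak stores the current value of `top`
    // into old_top, so the next iteration re-checks the remaining space.
    if (r.top.compare_exchange_weak(old_top, old_top + bytes,
                                    std::memory_order_relaxed)) {
      return old_top;  // this thread owns [old_top, old_top + bytes)
    }
  }
}
```

Threads that lose the race simply retry against the updated `top`, so no lock is held on the common path; only the case where the region is exhausted needs coordination.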
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Invalid assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/ef10341f..47d850d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Thu Jan 8 19:42:13 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 19:42:13 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Wed, 7 Jan 2026 00:03:15 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 423: > >> 421: _yield_to_safepoint = false; >> 422: } >> 423: > > I suppose ShenandoahCollectorAllocator::randomize_start_index() might be a no-op. 
On the other hand, it would probably be better to use a random index for ShenandoahCollectorAllocator as well. We don't want to hobble one GC worker more than the others just because its preferred start index happens to hold a retire-ready region. The code you put comments on is ShenandoahOldCollectorAllocator, which delegates the allocation to ShenandoahFreeSet in this PR; the 0u passed to the ShenandoahAllocator constructor is the number of shared alloc regions, which should be 0. I'll work on ShenandoahOldCollectorAllocator to make it support CAS as well. The start index of ShenandoahCollectorAllocator is slightly different from ShenandoahMutatorAllocator: for GC workers we can always use `worker_id % _alloc_region_count`, but there are some corner cases we need to consider, e.g. a Java thread may call into ShenandoahCollectorAllocator from load barrier code. Currently, for non-worker threads the start index is always 0; I think this is the part we can change to random. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2673640072 From xpeng at openjdk.org Thu Jan 8 19:42:15 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 19:42:15 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <8R0cFKoikfdToCQl3attk5YODAjTt3IExowjM8YqnNk=.04470378-016f-474d-9529-7e35b976b154@github.com> On Wed, 7 Jan 2026 14:51:49 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 428: >> >>> 426: } >>> 427: >>> 428: HeapWord* ShenandoahOldCollectorAllocator::allocate(ShenandoahAllocRequest& req, bool& in_new_region) { >> >> Confer with William Kemper about this. He is working on a change that may simplify the handling of PLABs, in which case ShenandoahOldCollectorAllocator can behave the same as ShenandoahCollector.
> > Alternatively, I don't think it would be too terribly difficult to implement try_allocate_aligned() function to support fast (CAS) allocation of aligned PLABs. It is doable and not very difficult. I'll work on it in a following PR to make ShenandoahOldCollectorAllocator support CAS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2673645194 From xpeng at openjdk.org Thu Jan 8 20:01:29 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 20:01:29 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 01:21:21 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 79: > >> 77: int refresh_alloc_regions(ShenandoahAllocRequest* req = nullptr, bool* in_new_region = nullptr, HeapWord** obj = nullptr); >> 78: #ifdef ASSERT >> 79: virtual void verify(ShenandoahAllocRequest& req) { } > > Need a comment to explain what verify does. Is this simply checking to make sure the req is "properly formatted"? I think the intention is to enforce that req affiliation corresponds to ALLOC_PARTITION. 
Would be good to clarify this in the comment. > > Do we need this to be virtual? It seems like a single templated implementation would suffice. Yes, it can be a templated implementation; I'll update it to use a template. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2673695164 From kdnilsen at openjdk.org Thu Jan 8 20:02:08 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 20:02:08 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v28] In-Reply-To: References: Message-ID: <8sbofUK4dq_bIjHCI_XbpwPqJ8JePmr-b9REQnis5tA=.80b63d85-fe38-4c08-9c9c-e98ff262a557@github.com> > GenShen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC.
In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Move rebuild free set earlier in an abbreviated GC cycle - Restore deleted assert statement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/1002fb56..2013599d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=26-27 Stats: 12 lines in 1 file changed: 8 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From xpeng at openjdk.org Thu Jan 8 20:41:47 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 20:41:47 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 02:04:04 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 364: > >> 362: _free_set->partitions()->decrease_used(ALLOC_PARTITION, total_free_bytes); >> 363: _free_set->partitions()->increase_region_counts(ALLOC_PARTITION, total_regions_to_unretire); >> 364: accounting_updater._need_update = true; > > Here is where you know which tallies have been affected by this operation. This is where you should specialize the calls to freeset recompute_total_used() and recompute_total_affiliated(). Either call those from here, or add parameters to your accounting_updater object so that you do not have to overcompute each operation. This is not the only place; there are many other places in the allocation code path as well. I can specialize the calls to recompute_total_used and recompute_total_affiliated in those places, but that would make the code unnecessarily rambling. I'd rather solve this issue and simplify it in https://bugs.openjdk.org/browse/JDK-8373371. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2673820942 From kdnilsen at openjdk.org Thu Jan 8 20:49:13 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 20:49:13 GMT Subject: Integrated: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 21:53:02 GMT, Kelvin Nilsen wrote: > Add a triggering penalty when we execute a degenerated GC cycle. This pull request has now been integrated.
Changeset: 385c4f81 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/385c4f8180d30c0e41b848eb4b2c1c8788211422 Stats: 12 lines in 7 files changed: 4 ins; 0 del; 8 mod 8373714: Shenandoah: Register heuristic penalties following a degenerated GC Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28834 From kdnilsen at openjdk.org Thu Jan 8 22:53:43 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 22:53:43 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v16] In-Reply-To: References: Message-ID: <16mCKOBudyw5oGH_yFiwBPCBwsJrjugRyTWDmuA8Q2g=.ee646318-ba2b-4599-8a46-acdc41e5aa78@github.com> > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 62 commits: - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates - touch file to force retest - Finish merge - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates - Fix mistaken merge resolution - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates The resulting fastdebug build has 64 failures. I need to debug these. Probably introduced by improper resolution of merge conflicts - fix error in merge conflict resolution - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates - rework CompressedClassSpaceSizeinJmapHeap.java - fix errors in CompressedClassSpaceSizeInJmapHeap.java - ... and 52 more: https://git.openjdk.org/jdk/compare/385c4f81...6d10ae5a ------------- Changes: https://git.openjdk.org/jdk/pull/24319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=15 Stats: 282 lines in 32 files changed: 109 ins; 31 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From wkemper at openjdk.org Thu Jan 8 22:56:32 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 22:56:32 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 20:56:51 GMT, Kelvin Nilsen wrote: >> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: >> >> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. >> 2. 
Sample allocation rates more frequently than once every 100 ms. >> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. >> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: > > - Fix comment > - Use PROPERFMT macros > - Simplify code flow: reviewer suggestion > - Merge remote-tracking branch 'jdk/master' into accelerated-triggers > - Remove develop/debug instrumentation > - add another override > - Change type of command-line args > - fix white space > - Add override to virtual methods > - Fix race between allocation reporting and querying > - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 134: > 132: if (_is_generational) { > 133: _regulator_thread = ShenandoahGenerationalHeap::heap()->regulator_thread(); > 134: size_t young_available = ShenandoahGenerationalHeap::heap()->young_generation()->max_capacity() - Consider pushing this down into `ShenandoahGenerationalHeuristics` src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 145: > 143: } > 144: > 145: double ShenandoahAdaptiveHeuristics::get_most_recent_wake_time() const { This introduces a cyclic dependency between control/regulator threads and the heuristics. Since control/regulator threads already _know_ about heuristics, could we instead have the threads invoke setters on the heuristics to provide these values? 
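To illustrate the setter-based alternative raised in the review comment just above: a minimal sketch, assuming the only value the heuristics need from the thread is its most recent wake time. The class and method names (`HeuristicsSketch`, `RegulatorThreadSketch`, `set_most_recent_wake_time`, `on_wake`) are hypothetical and are not taken from the PR; `std::atomic` stands in for whatever HotSpot primitive the real code would use.

```cpp
#include <atomic>

// Hypothetical stand-in for the heuristics class. It stores the value pushed
// to it and never reaches back into the control/regulator thread.
class HeuristicsSketch {
  std::atomic<double> _most_recent_wake_time{0.0};

public:
  // Invoked by the thread that already holds a pointer to the heuristics.
  void set_most_recent_wake_time(double seconds) {
    _most_recent_wake_time.store(seconds, std::memory_order_relaxed);
  }

  double most_recent_wake_time() const {
    return _most_recent_wake_time.load(std::memory_order_relaxed);
  }
};

// Hypothetical stand-in for the regulator thread. The dependency points one
// way only: thread -> heuristics.
class RegulatorThreadSketch {
  HeuristicsSketch* _heuristics;

public:
  explicit RegulatorThreadSketch(HeuristicsSketch* heuristics)
    : _heuristics(heuristics) {}

  // Called each time the thread wakes up; publishes the wake time.
  void on_wake(double now_seconds) {
    _heuristics->set_most_recent_wake_time(now_seconds);
  }
};
```

With this shape the heuristics header would not need to include or name the thread classes at all, which is the point of the suggestion; whether the relaxed ordering is sufficient depends on how the value is consumed and is an assumption here.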
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 192: > 190: > 191: void ShenandoahAdaptiveHeuristics::resume_idle_span() { > 192: size_t mutator_available = _free_set->capacity() - _free_set->used(); This is a little confusing to me. Isn't `available` defined as `capacity - used`? Why do we not use `available` here? src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 715: > 713: > 714: if (ShenandoahHeuristics::should_start_gc()) { > 715: // ShenandoahHeuristics::should_start_gc() has accepted trigger, or declined it. return ShenandoahHeuristics::should_start_gc(); src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.hpp line 73: > 71: bool is_spiking(double rate, double threshold) const; > 72: > 73: double interval() const { Not seeing where these new methods are used. src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 316: > 314: if (progress) { > 315: heap->notify_gc_progress(); > 316: heap->shenandoah_policy()->record_success_degenerated(_generation->is_young(), _abbreviated); On line 313 above here, we call `policy->record_degenerated` which does everything (and more) that `record_success_degenerated` does. Calling both of them here will increment the various counters twice and is probably not what we want. I think after https://github.com/openjdk/jdk/pull/28834, we shouldn't need `record_success_degenerated` for `ShenandoahCollectorPolicy` at all. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 493: > 491: ShenandoahCodeRoots::initialize(); > 492: > 493: // Initialization of controller markes use of varaibles esstablished by initialize_heuristics. Suggestion: // Initialization of controller makes use of variables established by initialize_heuristics. 
------------- PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3641395012 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2673966020 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2673969760 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2674066185 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2674110503 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2674122093 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2674156144 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2674168920 From xpeng at openjdk.org Thu Jan 8 23:54:32 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 23:54:32 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v26] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 273 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Invalid assert - Merge branch 'openjdk:master' into cas-alloc-1 - No need to use Atomic::load to read shared alloc region in release_alloc_regions - No need to use Atomic::load to read shared alloc region in refresh_alloc_regions - Update code comments - Update assert message - Only use atomic allocation when allocate from shared alloc regions - Merge branch 'openjdk:master' into cas-alloc-1 - Fix build error after merging from tip - ... and 263 more: https://git.openjdk.org/jdk/compare/385c4f81...0b04841f ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=25 Stats: 1656 lines in 25 files changed: 1308 ins; 235 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 02:03:23 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 02:03:23 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v27] In-Reply-To: References: Message-ID: <70wd2_KAjCsqW4zFSUowSWL6ajQoRTntUcDkk5NqJmA=.90dfd7dc-3021-4200-b046-26a4f138d712@github.com>
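For readers following the 8361099 thread: the PR description quoted earlier proposes taking the heap lock off the common allocation path by using CAS on shared allocation regions. A minimal sketch of that general technique follows, assuming a simple bump-pointer region and a small array of shared regions probed from a start index. `RegionSketch`, `try_allocate` and `allocate_from_shared` are hypothetical names, `std::atomic` is a stand-in for HotSpot's own atomic primitives, and nothing here is copied from the PR.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical bump-pointer region; not the real ShenandoahHeapRegion.
struct RegionSketch {
  std::atomic<uintptr_t> top;  // next free address
  uintptr_t end;               // one past the last usable address

  RegionSketch(uintptr_t start, size_t size_in_bytes)
    : top(start), end(start + size_in_bytes) {}

  // Lock-free allocation: returns the start of the new block, or 0 when the
  // region cannot satisfy the request.
  uintptr_t try_allocate(size_t bytes) {
    uintptr_t old_top = top.load(std::memory_order_relaxed);
    while (true) {
      if (end - old_top < bytes) {
        return 0;  // not enough room left in this region
      }
      uintptr_t new_top = old_top + bytes;
      // On failure, compare_exchange_weak reloads old_top, so the loop
      // retries against the value another thread just installed.
      if (top.compare_exchange_weak(old_top, new_top,
                                    std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }
};

// Probe the shared regions round-robin, beginning at start_index (for a GC
// worker this could be worker_id % count, as mentioned in the thread).
// Returning 0 means the caller would fall back to a slow path under the lock.
inline uintptr_t allocate_from_shared(RegionSketch* regions, size_t count,
                                      size_t start_index, size_t bytes) {
  for (size_t i = 0; i < count; i++) {
    uintptr_t result = regions[(start_index + i) % count].try_allocate(bytes);
    if (result != 0) {
      return result;
    }
  }
  return 0;
}
```

Spreading threads across different start indices is what keeps them from contending on the same `top` word; the real allocator also has to refresh retired regions and maintain accounting, which this sketch leaves out.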
Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - Merge branch 'cas-alloc-1' of https://github.com/pengxiaolong/jdk into cas-alloc-1 - Use template parameter for ShenandoahAllocator::verify - Avoid AtomicAccess::load in attempt_allocation_in_alloc_regions when called from attempt_allocation_slow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/0b04841f..4d5722a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=25-26 Stats: 49 lines in 2 files changed: 18 ins; 25 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 05:42:49 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 05:42:49 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v28] In-Reply-To: References: Message-ID: <9GEqwfh0vWkOCxSOcMcslcvsFG-LVeD9bDaEyWqS7j8=.fc40f588-0df9-4928-b8c3-a3dea1346217@github.com>
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: add include header shenandoahAllocRequest.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/4d5722a3..f299e165 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=26-27 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 06:02:02 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 06:02:02 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v29] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix header file order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/f299e165..dbee5568 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=27-28 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 06:36:29 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 06:36:29 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v30] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix include error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/dbee5568..6a8297ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=28-29 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 08:09:25 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 08:09:25 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v31] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: 1. Allow to reserve alloc region with free space >= PLAB::min_size() 2. 
Other minor fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/6a8297ac..7e0a1244 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=29-30 Stats: 11 lines in 1 file changed: 1 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From sjohanss at openjdk.org Fri Jan 9 08:47:01 2026 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 Jan 2026 08:47:01 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v3] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:58:43 GMT, Leo Korinth wrote: >> This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. >> >> It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. >> >> This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. >> >> I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. > > Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 564 commits: > > - Merge branch '8373253' into 8367993 > - Merge branch 'master' into _8373253 > - Merge branch 'master' into _8367993 > - 8366058: Outdated comment in WinCAPISeedGenerator > > Reviewed-by: mullan > - 8357258: x86: Improve receiver type profiling reliability > > Reviewed-by: kvn, vlivanov > - 8373704: Improve "SocketException: Protocol family unavailable" message > > Reviewed-by: lucy, jpai > - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently > > Reviewed-by: jiefu, jbhateja, erfang, qamai > - 8343809: Add requires tag to mark tests that are incompatible with exploded image > > Reviewed-by: alanb, dholmes > - 8374465: Spurious dot in documentation for JVMTI ClassLoad > > Reviewed-by: kbarrett > - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket > > Reviewed-by: djelinski, mpowers, ascarpino > - ... and 554 more: https://git.openjdk.org/jdk/compare/2aa8aa4b...28ccbb68 Thanks for looking into this Leo. Overall I think it looks good, just some small questions and suggestions. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1637: > 1635: > 1636: bool G1CollectedHeap::concurrent_mark_is_terminating() const { > 1637: assert(_cm != nullptr, "thread must exist in order to check if mark is terminating"); I think it would make sense to add `&& _cm->is_fully_initialized()` to really make sure the thread has been created. 
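For readers following the review, the guard suggested above can be pictured with a minimal sketch. It assumes hypothetical stand-in types (`LazyMarker`, `MarkThread`) and is not the actual `G1ConcurrentMark` code; it only shows an idempotent `fully_initialize()` paired with an `is_fully_initialized()` check that an assert can use before touching the lazily created thread.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the concurrent mark thread; not a HotSpot class.
struct MarkThread {
  bool in_progress = false;
};

// Sketch of a component whose expensive parts are created lazily.
class LazyMarker {
  std::unique_ptr<MarkThread> _thread;  // created by fully_initialize()

public:
  bool is_fully_initialized() const { return _thread != nullptr; }

  // Idempotent: calling it again after initialization does nothing.
  void fully_initialize() {
    if (!is_fully_initialized()) {
      _thread.reset(new MarkThread());
    }
  }

  // Never dereferences the thread unless it exists.
  bool in_progress() const {
    return is_fully_initialized() && _thread->in_progress;
  }

  // The assert documents the precondition, as in the suggested change.
  void start() {
    assert(is_fully_initialized() && "thread must exist before starting");
    _thread->in_progress = true;
  }
};
```

In this sketch `in_progress()` returns false for an uninitialized object, so callers that only ask about progress need no separate initialization check.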
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2427: > 2425: if (_cm->is_fully_initialized()) { > 2426: tc->do_thread(_cm->cm_thread()); > 2427: } Since the _cm_thread is now in `G1ConcurrentMark`, this should be handled in `G1ConcurrentMark::threads_do()`. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2549: > 2547: void G1CollectedHeap::start_concurrent_cycle(bool concurrent_operation_is_full_mark) { > 2548: assert(!_cm->in_progress(), "Can not start concurrent operation while in progress"); > 2549: assert(_cm->is_fully_initialized(), "sanity"); Not sure this sanity assert is needed. `_cm->in_progress()` will always return `false` if not fully initialized, so the above assert will cover this. If we still want it, I think it should be moved above the `in_progress()` assert. src/hotspot/share/gc/g1/g1PeriodicGCTask.cpp line 46: > 44: return false; > 45: } > 46: Why is this needed? The initial young collection will make sure concurrent marking gets initialized, right? src/hotspot/share/gc/g1/g1Policy.cpp line 744: > 742: if (!_g1h->concurrent_mark()->is_fully_initialized()) { > 743: return false; > 744: } Is this needed? The `in_progress()` check below makes sure to only check the cm_thread when fully initialized. src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1127: > 1125: > 1126: void G1YoungCollector::collect() { > 1127: _g1h->_cm->fully_initialize(); I think it would make more sense to do this in `G1CollectedHeap::do_collection_pause_at_safepoint()`. There we check if we should start concurrent mark, so maybe the initialization could be done only if we are about to start concurrent mark. If we can do the initialization after the actual young collection, then we could maybe even move the initialization into `G1CollectedHeap::start_concurrent_cycle(...)`. ------------- Changes requested by sjohanss (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28723#pullrequestreview-3639436840 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2672366755 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675276733 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675291347 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675313622 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675328503 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675249630 From stefank at openjdk.org Fri Jan 9 12:09:22 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Jan 2026 12:09:22 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:33:41 GMT, Leo Korinth wrote: >> Leo Korinth has updated the pull request incrementally with 561 additional commits since the last revision: >> >> - Merge branch 'master' into _8367993 >> - 8366058: Outdated comment in WinCAPISeedGenerator >> >> Reviewed-by: mullan >> - 8357258: x86: Improve receiver type profiling reliability >> >> Reviewed-by: kvn, vlivanov >> - 8373704: Improve "SocketException: Protocol family unavailable" message >> >> Reviewed-by: lucy, jpai >> - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently >> >> Reviewed-by: jiefu, jbhateja, erfang, qamai >> - 8343809: Add requires tag to mark tests that are incompatible with exploded image >> >> Reviewed-by: alanb, dholmes >> - 8374465: Spurious dot in documentation for JVMTI ClassLoad >> >> Reviewed-by: kbarrett >> - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket >> >> Reviewed-by: djelinski, mpowers, ascarpino >> - 8374444: Fix simple -Wzero-as-null-pointer-constant warnings >> >> Reviewed-by: aboldtch >> - 8373847: Test javax/swing/JMenuItem/MenuItemTest/bug6197830.java failed because The test case 
automatically fails when clicking any items in the "Nothing" menu in all four windows (Left-to-right)-Menu Item Test and (Right-to-left)-Menu Item Test >> >> Reviewed-by: serb, aivanov, dnguyen >> - ... and 551 more: https://git.openjdk.org/jdk/compare/b907b295...0ece3767 > > I will redo the merge, I have done something strange. @lkorinth Something went wrong with your merge and now there's a bunch of unrelated labels, which results in updates being sent to misc mailing lists that have no interest in this PR. Could you remove all those labels? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28723#issuecomment-3728642315 From wkemper at openjdk.org Fri Jan 9 17:52:35 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Jan 2026 17:52:35 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. 
Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Heal discovered lists for any young collection coincides with old marking - Configure thread local mark closure on delegated old reference processor - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson - Sort includes - Heal old discovered lists in parallel - Fix comment - Factor duplicate code into shared method - ... 
and 13 more: https://git.openjdk.org/jdk/compare/f5fa9e40...abccb8b6 ------------- Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=07 Stats: 669 lines in 20 files changed: 537 ins; 84 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From wkemper at openjdk.org Fri Jan 9 17:52:53 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Jan 2026 17:52:53 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: > This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Fix typo in assertion message - Take regulator thread out of STS before requesting GC The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. 
- Add comments - Revert back to what should be on this branch - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Don't know how this file got deleted - Carry over gc cancellation to gc request - Do not let allocation failure requests be overwritten by other requests - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac ------------- Changes: https://git.openjdk.org/jdk/pull/28932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=03 Stats: 95 lines in 4 files changed: 45 ins; 17 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28932/head:pull/28932 PR: https://git.openjdk.org/jdk/pull/28932 From xpeng at openjdk.org Fri Jan 9 19:06:12 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 19:06:12 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v32] In-Reply-To: References: Message-ID: <91iJNkRmHyoVJi79n80-lc7Im_iYvLohWaK1ZnRPZy8=.65e43e65-ee59-46d0-8af4-3fb47951a3ef@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. 
> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Remove necessary atomic load - Add _epoch_id to ShenandoahAllocator to trace the update of shared alloc regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/7e0a1244..5e8f5998 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=30-31 Stats: 43 lines in 3 files changed: 19 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 19:15:59 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 19:15:59 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v33] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add comments for public methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/5e8f5998..5646d5b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=31-32 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 9 19:16:02 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 19:16:02 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Thu, 8 Jan 2026 19:58:25 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 79: >> >>> 77: int refresh_alloc_regions(ShenandoahAllocRequest* req = nullptr, bool* in_new_region = nullptr, HeapWord** obj = nullptr); >>> 78: #ifdef ASSERT >>> 79: virtual void verify(ShenandoahAllocRequest& req) { } >> >> Need a comment to explain what verify does. Is this simply checking to make sure the req is "properly formatted"? I think the intention is to enforce that req affiliation corresponds to ALLOC_PARTITION. Would be good to clarify this in the comment. >> >> Do we need this to be virtual? It seems like a single templated implementation would suffice. > > Yes, it can be a templated implementation; I'll update the implementation to use a template. The `verify` method uses a template parameter now. 
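As an illustration of the "templated instead of virtual" idea discussed above, here is a small sketch. The types `Partition`, `AllocRequest` and `Allocator` are hypothetical stand-ins invented for this example, not the Shenandoah classes. The only point is that when the partition is a template parameter, `verify` can compare the request's affiliation against it without any virtual dispatch.

```cpp
#include <cassert>

// Hypothetical partition and request types; the real Shenandoah types differ.
enum class Partition { Mutator, Collector, OldCollector };

struct AllocRequest {
  Partition affiliation;  // which partition the request is meant for
};

// The partition is fixed at compile time, so one implementation serves
// every allocator and no virtual call is needed.
template <Partition ALLOC_PARTITION>
class Allocator {
public:
  // Confirms that the request's affiliation corresponds to ALLOC_PARTITION.
  bool verify(const AllocRequest& req) const {
    return req.affiliation == ALLOC_PARTITION;
  }
};
```

A debug build could wrap the result in an assert at the start of the allocation path, for example `assert(verify(req));`.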
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2677338346 From wkemper at openjdk.org Fri Jan 9 19:22:26 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Jan 2026 19:22:26 GMT Subject: RFR: 8351892: GenShen: Remove vestigial young generation sizing options Message-ID: GenShen generally tries to keep the young generation as large as possible. The options `ShenandoahMinYoungPercentage` and `ShenandoahMaxYoungPercentage` are no longer used. ------------- Commit messages: - Remove vestigial young gen sizing options Changes: https://git.openjdk.org/jdk/pull/29144/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29144&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351892 Stats: 23 lines in 2 files changed: 0 ins; 23 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29144/head:pull/29144 PR: https://git.openjdk.org/jdk/pull/29144 From xpeng at openjdk.org Fri Jan 9 19:28:21 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 19:28:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 01:28:42 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 376: > >> 374: } >> 375: >> 376: THREAD_LOCAL uint ShenandoahMutatorAllocator::_alloc_start_index = UINT_MAX; > > I raised questions about this in a previous review. Have I overlooked your response? What is the tradeoff between declaring this THREAD_LOCAL vs. creating a new field in ShenandoahThreadLocal? I believe we need to use fields of ShenandoahThreadLocal so that we do not incur an overhead on all threads when JVM is not configured for Shenandoah GC. I have no concerns about moving it to ShenandoahThreadLocalData. The only benefit of using THREAD_LOCAL is more cohesive code, because _alloc_start_index is defined in the same namespace where it is used. Performance-wise, I don't think there is much benefit. I do see that ZGC also uses THREAD_LOCAL directly, so I guess the overhead on all threads is not a huge concern. But given that Shenandoah has ShenandoahThreadLocalData to manage all the thread locals, it makes sense not to use THREAD_LOCAL directly. I'll update the PR to move _alloc_start_index to ShenandoahThreadLocalData. > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 436: > >> 434: // Make sure the old generation has room for either evacuations or promotions before trying to allocate. >> 435: auto old_gen = ShenandoahHeap::heap()->old_generation(); >> 436: if (req.is_old() && !old_gen->can_allocate(req)) { > > This test for req.is_old() appears to be unnecessary. The verify(req) assert above requires that req.is_old(). > > Perhaps the verify() method is too abstract. Add a comment there that says: "Confirm that req.is_old()" Thanks for pointing this out. It is not necessary; I have removed it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2677369816 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2677373200 From xpeng at openjdk.org Fri Jan 9 20:01:38 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 9 Jan 2026 20:01:38 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Fri, 9 Jan 2026 19:24:44 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 376: >> >>> 374: } >>> 375: >>> 376: THREAD_LOCAL uint ShenandoahMutatorAllocator::_alloc_start_index = UINT_MAX; >> >> I raised questions about this in a previous review. Have I overlooked your response? What is the tradeoff between declaring this THREAD_LOCAL vs. creating a new field in ShenandoahThreadLocal? I believe we need to use fields of ShenandoahThreadLocal so that we do not incur an overhead on all threads when JVM is not configured for Shenandoah GC. > > I have no concerns about moving it to ShenandoahThreadLocalData. The only benefit of using THREAD_LOCAL is more cohesive code, because _alloc_start_index is defined in the same namespace where it is used. Performance-wise, I don't think there is much benefit. > > I do see that ZGC also uses THREAD_LOCAL directly, so I guess the overhead on all threads is not a huge concern. > But given that Shenandoah has ShenandoahThreadLocalData to manage all the thread locals, it makes sense not to use THREAD_LOCAL directly. I'll update the PR to move _alloc_start_index to ShenandoahThreadLocalData. While trying to use ShenandoahThreadLocalData, I realized that I need to add two alloc start indices: one for ShenandoahMutatorAllocator and one for ShenandoahCollectorAllocator. Later, when I update ShenandoahOldCollectorAllocator to use CAS allocation, one more will be added. 
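For context on why a per-thread start index matters, the following is a rough sketch of lock-free bump allocation over a small set of shared regions. It is an assumption-laden illustration, not the code in this PR: `Region`, `cas_allocate`, `allocate` and `kNoSpace` are hypothetical names, and offsets are used in place of heap addresses. Each thread begins probing at its own start index so that threads tend to hit different regions, and a compare-and-swap on the region's `top` claims the space without taking a lock.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical shared allocation region with a bump pointer (offsets only).
struct Region {
  std::atomic<std::size_t> top{0};  // next free offset
  std::size_t end = 0;              // capacity of the region
};

const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Try to claim `size` units from one region without a lock.
// Returns the start offset of the claimed block, or kNoSpace.
inline std::size_t cas_allocate(Region& r, std::size_t size) {
  std::size_t old_top = r.top.load(std::memory_order_relaxed);
  while (size <= r.end - old_top) {
    // On failure, compare_exchange_weak stores the current value in old_top,
    // so the loop re-checks the remaining space before retrying.
    if (r.top.compare_exchange_weak(old_top, old_top + size)) {
      return old_top;
    }
  }
  return kNoSpace;
}

// Probe `count` regions, starting at a per-thread index to spread contention.
inline std::size_t allocate(Region* regions, std::size_t count,
                            std::size_t start_index, std::size_t size,
                            std::size_t* region_out) {
  for (std::size_t i = 0; i < count; i++) {
    std::size_t idx = (start_index + i) % count;
    std::size_t offset = cas_allocate(regions[idx], size);
    if (offset != kNoSpace) {
      *region_out = idx;
      return offset;
    }
  }
  return kNoSpace;  // a real allocator would fall back to a slow path here
}
```

In this sketch the start index is just a function argument; whether it lives in a `THREAD_LOCAL` variable or in a per-thread data structure, as debated above, does not change the allocation logic itself.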
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2677462219 From duke at openjdk.org Fri Jan 9 21:04:01 2026 From: duke at openjdk.org (duke) Date: Fri, 9 Jan 2026 21:04:01 GMT Subject: Withdrawn: 8367320: Sort cpu/x86 includes In-Reply-To: References: Message-ID: On Wed, 10 Sep 2025 09:17:07 GMT, Francesco Andreuzzi wrote: > Sort includes in `cpu/x86` using `SortIncludes.java`. I'm also removing a couple unnecessary ones. > > Passes `tier1` and `tier2`. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/27189 From kdnilsen at openjdk.org Fri Jan 9 21:09:58 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 9 Jan 2026 21:09:58 GMT Subject: RFR: 8351892: GenShen: Remove vestigial young generation sizing options In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 19:13:31 GMT, William Kemper wrote: > GenShen generally tries to keep the young generation as large as possible. The options `ShenandoahMinYoungPercentage` and `ShenandoahMaxYoungPercentage` are no longer used. LGTM ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/29144#pullrequestreview-3645575675 From wkemper at openjdk.org Fri Jan 9 22:24:46 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Jan 2026 22:24:46 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 20:56:51 GMT, Kelvin Nilsen wrote: >> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: >> >> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. >> 2. 
Sample allocation rates more frequently than once every 100 ms. >> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. >> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: > > - Fix comment > - Use PROPERFMT macros > - Simplify code flow: reviewer suggestion > - Merge remote-tracking branch 'jdk/master' into accelerated-triggers > - Remove develop/debug instrumentation > - add another override > - Change type of command-line args > - fix white space > - Add override to virtual methods > - Fix race between allocation reporting and querying > - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 643: > 641: future_accelerated_planned_gc_time * 1000); > 642: } else { > 643: log_trigger("Momentary spike consumption (%zu%s) exceeds free headroom (%zu%s) at " Should the 'Momentary spike' trigger replace the 'instantaneous spike' trigger? It seems like we now have two spike detecting triggers? 
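For readers following item 4 of the quoted description, the difference between the two predictions can be sketched as below. This is only an illustration under a simple constant-acceleration assumption; the helper names are hypothetical and this is not the code in shenandoahAdaptiveHeuristics.cpp.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not HotSpot code.

// Bytes expected to be allocated over `seconds` if the allocation rate
// (bytes per second) stays constant.
inline double predict_constant(double rate, double seconds) {
  return rate * seconds;
}

// Bytes expected to be allocated over `seconds` if the rate keeps growing
// by `accel` bytes per second each second (assumed model).
inline double predict_accelerated(double rate, double accel, double seconds) {
  return rate * seconds + 0.5 * accel * seconds * seconds;
}

// A trigger fires when the predicted consumption exceeds the free headroom.
inline bool should_trigger(double predicted_bytes, double free_headroom_bytes) {
  return predicted_bytes > free_headroom_bytes;
}
```

With the same current rate and planning horizon, the accelerated prediction is never smaller than the constant-rate one when the acceleration is non-negative, so it can fire a trigger earlier when the rate is trending upward.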
------------- PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3645745671 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2677779429 From wkemper at openjdk.org Fri Jan 9 23:51:48 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Jan 2026 23:51:48 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v4] In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: <6z8QE0tr_8b3brwl6tfIjQC7J458m9zDIcblpcjs_gc=.ecc3c716-e082-45c1-a4a4-4723ab4bcbda@github.com> On Thu, 11 Dec 2025 23:18:18 GMT, Kelvin Nilsen wrote: >> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. >> >> This addresses a problem that results if available memory is probed while we are rebuilding the freeset. >> >> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add rebuild synchronization to capacity() and used() Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27612#pullrequestreview-3645962594 From ysr at openjdk.org Sat Jan 10 00:17:42 2026 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Sat, 10 Jan 2026 00:17:42 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Fri, 9 Jan 2026 17:52:53 GMT, William Kemper wrote: >> This PR simplifies the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Fix typo in assertion message > - Take regulator thread out of STS before requesting GC > > The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. > - Add comments > - Revert back to what should be on this branch > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Don't know how this file got deleted > - Carry over gc cancellation to gc request > - Do not let allocation failure requests be overwritten by other requests > - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac I left a few comments and my uneasiness with the code as structured, mainly because I am not sure I understand the interaction protocol between the interacting threads clearly enough or the state being protected by the locks to be certain that this all works correctly. 
Piecemeal, these all make sense, but I don't have a sufficient overall understanding to say confidently that this code looks good. However, since this has been shown to fix an existing production issue, I'll go ahead and approve it. I would really like this interaction protocol clearly written down and reasoned through to ensure that it's right, and to structure it in a manner that makes it easier to reason about and maintain. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 95: > 93: notify_gc_waiters(); > 94: notify_alloc_failure_waiters(); > 95: set_gc_mode(stopped); Will they "observe the shutdown" if mode isn't "stopped"? Should line 95 move before line 93? Also, are `gc_mode`'s state transitions protected in any manner, so that they are consistent with any other observable state from the standpoint of threads that may interact with the controller? I see, for example, that the RegulatorThread looks at the controller thread's gc_mode to make certain GC triggering decisions, but it doesn't look like the reading of the mode and the requesting of a GC are protected by a lock that prevents races/glitches. In general, such interactions lead to an increase in the state-space of interactions that the parties need to deal with correctly. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 708: > 706: > 707: void ShenandoahGenerationalControlThread::notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation) { > 708: assert(_control_lock.is_locked(), "Request lock must be held here"); What is MonitorLocker, and what is `_control_lock`? Why are we checking `_control_lock` here? If ml is being passed here for the purposes of notification on it, it must be the case that it's locked, and the assert at line 708 is a leakage of abstraction? I see that your problem here is that along one path you do nothing and don't post a notification, and along another you do some work and post a notification.
It sounds like what you want instead is an API of the following shape: bool should_notify_control_thread(cause, generation) { ... }; and callers might do: MonitorLocker ml(...); if (should_notify_control_thread(cause, generation)) { ml.notify(); } ml.wait(); src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 721: > 719: } > 720: > 721: void ShenandoahGenerationalControlThread::notify_control_thread(GCCause::Cause cause) { Apropos my comment above, this code has a bad smell that some versions of the method expect the lock to be held, and other methods acquire the lock. It makes reasoning about the code at an abstract level very error-prone, and potentially difficult to maintain correctly over time. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28932#pullrequestreview-3645834579 PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677970564 PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677861000 PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677882435 From ysr at openjdk.org Sat Jan 10 00:17:43 2026 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 10 Jan 2026 00:17:43 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Fri, 9 Jan 2026 23:57:10 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Fix typo in assertion message >> - Take regulator thread out of STS before requesting GC >> >> The request may block while it waits for control thread to stop old marking. 
If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. >> - Add comments >> - Revert back to what should be on this branch >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Don't know how this file got deleted >> - Carry over gc cancellation to gc request >> - Do not let allocation failure requests be overwritten by other requests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 95: > >> 93: notify_gc_waiters(); >> 94: notify_alloc_failure_waiters(); >> 95: set_gc_mode(stopped); > > Will they "observe the shutdown" if mode isn't "stopped"? Should line 95 move before line 93? Also is `gc_mode`'s state transitions protected in any manner, so they are consistent with any other observable state from the standpoint of threads that may interact with the controller? I see for example that the RegulatorThread looks at the controller thread's gc_mode to make certain GC triggering decisions, but it doesn't look like the reading of the mode and the requesting of a GC are protected by a lock that prevents races/glitches. In general such interactions lead to an increase in the state-space of interactions that the parties need to deal with correctly. The reason I am leaving this comment here is that further above (lines 85-88), and in many of the other mode changes, we seem to take care to use the control lock to protect these transitions so that other parties may observe the right state. Perhaps that is not needed in some cases, but the plurality of different forms of update of state that is modified and may be read by other threads in conjunction with other state leaves me feeling queasy about the possible cracks in the coordination surface that may open us up to trouble.
> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 708: > >> 706: >> 707: void ShenandoahGenerationalControlThread::notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation) { >> 708: assert(_control_lock.is_locked(), "Request lock must be held here"); > > What is MonitorLocker, and what is `_control_lock`? Why are we checking `_control_lock` here? If ml is being passed here for the purposes of notification on it, it must be the case that it's locked and the assert at line 708 is a leakage of abstraction? I see that your problem here is that along one path you do nothing and don't post a notification and along another you do some work and post a notification. It sounds like what you want instead is an API of the following shape: > > > bool should_notify_control_thread(cause, generation) { ... }; > > > and callers might do: > > > MonitorLocker ml(...); > if (should_notify_control_thread(cause, generation)) { > ml.notify(); > } > ml.wait(); In addition, all of the various flavours of `notify_control_thread()` with optional parameters should ideally call down into the version that has all of the parameters specified. In the cases where these parameters aren't specified, the corresponding version of the method fills in default values. The fully-parameterized version of the method then contains and consolidates the entire logic and any invariant/assertion checking of the various parameters that make sense, and executes the relevant logic to return a boolean result to allow the caller to notify and/or wait on the monitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677974288 PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677875997 From ysr at openjdk.org Sat Jan 10 00:17:44 2026 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Sat, 10 Jan 2026 00:17:44 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Sat, 10 Jan 2026 00:00:46 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 95: >> >>> 93: notify_gc_waiters(); >>> 94: notify_alloc_failure_waiters(); >>> 95: set_gc_mode(stopped); >> >> Will they "observe the shutdown" if mode isn't "stopped"? Should line 95 move before line 93? Also is `gc_mode`'s state transitions protected in any manner, so they are consistent with any other observable state from the standpoint of threads that may interact with the controller? I see for example that the RegulatorThread looks at the controller thread's gc_mode to make certain GC triggering decisions, but it doesn't look like the reading of the mode and the requesting of a GC are protected by a lock that prevents races/glitches. In general such interactions lead to an increase in the state-space of interactions that the parties need to deal with correctly. > > The reason I am leaving this comment here is that further above (lines 85-88), and in many of the other mode changes we seem to take care to use the control lock to protect these transitions so that other parties may observe the right state. Perhaps that is not needed in some cases, but the plurality of different forms of update of state that is modified and may be read by other threads in conjunction with other state leaves me feeling queasy about the possible cracks in the coordination surface that may open us up to trouble. (Note that the "mode" parameter used in the non-generational control thread is completely internal to the control thread and, unlike in the case of the generational control thread here, never leaks out of that thread to be consulted by another thread asynchronously.)
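As a rough illustration of the concerns raised in this thread, here is a minimal sketch in standard C++ (std::mutex and std::condition_variable rather than HotSpot's Monitor and MonitorLocker). All names are hypothetical and the policy shown is an assumption, not the PR's code. It shows one possible shape: a single decision function that is called with the lock held, one locking entry point that defers to it, and a mode that only changes under the same lock before waiters are woken.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins for illustration; these are not HotSpot types.
enum class Mode { idle, stopped };
enum class Cause { none, explicit_gc, allocation_failure };

class ControlState {
 public:
  // Decision step only: records the request if it is acceptable and tells
  // the caller whether a notification is warranted. The caller passes the
  // lock it already holds, so the precondition is checked on that object.
  bool should_notify(const std::unique_lock<std::mutex>& held, Cause cause) {
    assert(held.owns_lock() && held.mutex() == &_lock);
    if (_mode == Mode::stopped) return false;                   // shutting down
    if (_requested == Cause::allocation_failure) return false;  // keep the failure
    _requested = cause;
    return true;
  }

  // The only request entry point that takes the lock; it defers to
  // should_notify() for the decision and notifies on the same lock.
  bool request_gc(Cause cause) {
    std::unique_lock<std::mutex> held(_lock);
    const bool notify = should_notify(held, cause);
    if (notify) _cv.notify_all();
    return notify;
  }

  // The mode changes under the same lock, before waiters are woken.
  void stop() {
    std::unique_lock<std::mutex> held(_lock);
    _mode = Mode::stopped;
    _cv.notify_all();
  }

  // Readers take the lock too, so mode and request are observed consistently.
  Mode mode() { std::lock_guard<std::mutex> g(_lock); return _mode; }
  Cause requested() { std::lock_guard<std::mutex> g(_lock); return _requested; }

 private:
  std::mutex _lock;
  std::condition_variable _cv;
  Mode _mode = Mode::idle;
  Cause _requested = Cause::none;
};
```

In this sketch a pending allocation failure is never overwritten by a later request, and no request is accepted once the mode is stopped. Whether those are the right rules for the generational control thread is exactly the kind of protocol question that would need to be written down.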
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2677985424 From ysr at openjdk.org Sat Jan 10 01:00:22 2026 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 10 Jan 2026 01:00:22 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 17:52:35 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 23 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Heal discovered lists for any young collection coincides with old marking > - Configure thread local mark closure on delegated old reference processor > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson > - Sort includes > - Heal old discovered lists in parallel > - Fix comment > - Factor duplicate code into shared method > - ... and 13 more: https://git.openjdk.org/jdk/compare/f5fa9e40...abccb8b6 > When the young mark encounters a young reference with an old referent, it cannot discover it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). Naive question: The basic issue appears to be that the marking state of the referent may not be visible to the gc that is processing the reference when they are in different generations. If we wait for them to both be in the same generation, the same marking will both discover the reference and know the reachability of the referent whence it can be collected. Why can't we just wait for the reference and referent to both be tenured into the old generation before they are processed? I realize this delays processing until such time that we have either a global marking, a full collection, or the reference and referent both end up in the old generation. It is possible I misunderstood the original problem. Can you explain what is causing the leak here? i.e. what causes us not to eventually discover and process the reference when both it and its referent are in the old generation? 
Why does the `should_discover()` on the reference in the old generation return false when its referent is also in the same generation? Many years ago the concept of this visibility across generations was handled by means of a "span" (of marking visibility, if you will) that the reference processor carried, so that it would leave alone and not discover references whose referent was in a different generation. The same issue existed in CMS, where it was dealt with by allowing the reference and referent to both migrate into the same (old) generation, at which point reference processing would deal with it because it had full reachability visibility from that point on. Here we seem to be considering a leak where we are left in a state where we never discover the reference even after it and its referent are both in the old generation. How does that happen? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3731179370 From xpeng at openjdk.org Sat Jan 10 06:11:47 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 10 Jan 2026 06:11:47 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v34] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock, and we have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, a better OOD is also applied to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherits from ShenandoahAllocator, only overrides methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator.
> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Move thread locals to ShenandoahThreadLocalData ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/5646d5b2..c5824564 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=32-33 Stats: 91 lines in 4 files changed: 60 ins; 29 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Sat Jan 10 06:19:21 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 10 Jan 2026 06:19:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Fri, 9 Jan 2026 19:57:23 GMT, Xiaolong Peng wrote: >> I don't have concern to move it to ShenandoahThreadLocalData, the benefit of using THREAD_LOCAL is just better cohesive code because _alloc_start_index is defined in the same namespace where it is used. Performance wise, I don't think there is much benefits. >> >> I do see ZGC also use THREAD_LOCAL directly, I guess the overhead on all threads is not a huge concern. >> But given Shenandoah has ShenandoahThreadLocalData to manage all the thread locals, it make sense to not use THREAD_LOCAL directly, I'll update the PR to move _alloc_start_index to ShenandoahThreadLocalData. > > While I am trying to use ShenandoahThreadLocalData, I realized that I need to add 2 alloc start index, one for ShenandoahMutatorAllocator, one for ShenandoahCollectorAllocator. Later when I update ShenandoahOldCollectorAllocator to use CAS allocation, one more will be added. 
I have moved all the THREAD_LOCAL to ShenandoahThreadLocalData, also rewrote `alloc_start_index` function in templated impl to avoid virtual table lookup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2678364578 From kdnilsen at openjdk.org Sun Jan 11 03:04:44 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 11 Jan 2026 03:04:44 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v5] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 66 commits: - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Fix comment - Use PROPERFMT macros - Simplify code flow: reviewer suggestion - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Remove develop/debug instrumentation - add another override - Change type of command-line args - fix white space - Add override to virtual methods - ... 
and 56 more: https://git.openjdk.org/jdk/compare/659b53fe...ac0e8c57 ------------- Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=04 Stats: 1027 lines in 25 files changed: 920 ins; 35 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Sun Jan 11 03:08:48 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 11 Jan 2026 03:08:48 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v29] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. 
In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits: - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Move rebuild free set earlier in an abbreviated GC cycle - Restore deleted assert statement - Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() - fix another typo - Fix typo - Fix confusing comment - Add comment - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix whitespace and comment - ... 
and 76 more: https://git.openjdk.org/jdk/compare/659b53fe...27ece3e8 ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=28 Stats: 1520 lines in 41 files changed: 789 ins; 289 del; 442 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From stefank at openjdk.org Mon Jan 12 14:27:42 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 14:27:42 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:15:57 GMT, Stefan Karlsson wrote: > During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about if they visited the metadata or not. > > I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visits the elements. > > So, we know how the following ObjArrayKlass oop iterators: > > Iterators that also visit the metadata: > > oop_oop_iterate > oop_oop_iterate_reverse > oop_oop_iterate_bounded > > > Iterators that are not visiting the metadata: > > oop_oop_iterate_elements > oop_oop_iterate_elements_range > oop_oop_iterate_elements_bounded > > > The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that does not visit the metadata: > > oop_iterate_elements_range > > > Two extra things to check in the patch: > > 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. 
> > 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. Ping #29116 ------------- PR Comment: https://git.openjdk.org/jdk/pull/29170#issuecomment-3738773609 From stefank at openjdk.org Mon Jan 12 14:27:41 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 14:27:41 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass Message-ID: During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about whether they visited the metadata or not. I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visit the elements. So, we now have the following ObjArrayKlass oop iterators: Iterators that also visit the metadata: oop_oop_iterate oop_oop_iterate_reverse oop_oop_iterate_bounded Iterators that are not visiting the metadata: oop_oop_iterate_elements oop_oop_iterate_elements_range oop_oop_iterate_elements_bounded The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that do not visit the metadata: oop_iterate_elements_range Two extra things to check in the patch: 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore.
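The naming split described above can be illustrated with a small, self-contained sketch. This is not the HotSpot code: `ToyObjArray`, `RecordingClosure`, and the three free functions are hypothetical stand-ins, and the only things taken from the description are the idea that `_elements` variants skip the metadata and that the all-elements variant is implemented with the range variant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object array: one metadata slot (think: klass)
// followed by the element slots. Not the real objArrayOopDesc layout.
struct ToyObjArray {
  int klass_slot;
  std::vector<int> elements;
};

// Hypothetical closure that records what it was asked to visit.
struct RecordingClosure {
  int metadata_visits = 0;
  std::vector<int> visited;
  void do_metadata(int) { ++metadata_visits; }
  void do_element(int value) { visited.push_back(value); }
};

// "_elements_range" flavour: visits only elements in [start, end), never metadata.
inline void iterate_elements_range(const ToyObjArray& a, RecordingClosure& cl,
                                   std::size_t start, std::size_t end) {
  if (end > a.elements.size()) end = a.elements.size();  // clamp to the array
  for (std::size_t i = start; i < end; ++i) {
    cl.do_element(a.elements[i]);
  }
}

// "_elements" flavour: all elements, implemented with the range variant.
inline void iterate_elements(const ToyObjArray& a, RecordingClosure& cl) {
  iterate_elements_range(a, cl, 0, a.elements.size());
}

// Plain flavour: visits the metadata first, then every element.
inline void iterate(const ToyObjArray& a, RecordingClosure& cl) {
  cl.do_metadata(a.klass_slot);
  iterate_elements(a, cl);
}
```

With this shape, a caller that must not touch metadata can tell from the name alone which function is safe to use.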
------------- Commit messages: - 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass Changes: https://git.openjdk.org/jdk/pull/29170/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29170&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8375040 Stats: 62 lines in 11 files changed: 18 ins; 21 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/29170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29170/head:pull/29170 PR: https://git.openjdk.org/jdk/pull/29170 From tschatzl at openjdk.org Mon Jan 12 15:17:10 2026 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 12 Jan 2026 15:17:10 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:15:57 GMT, Stefan Karlsson wrote: > During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about if they visited the metadata or not. > > I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visits the elements. 
> > So, we know how the following ObjArrayKlass oop iterators: > > Iterators that also visit the metadata: > > oop_oop_iterate > oop_oop_iterate_reverse > oop_oop_iterate_bounded > > > Iterators that are not visiting the metadata: > > oop_oop_iterate_elements > oop_oop_iterate_elements_range > oop_oop_iterate_elements_bounded > > > The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that does not visit the metadata: > > oop_iterate_elements_range > > > Two extra things to check in the patch: > > 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. > > 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/oops/objArrayOop.hpp line 86: > 84: > 85: public: > 86: // Special iterators for an element index range Suggestion: // Special iterators for an element index range. ------------- PR Review: https://git.openjdk.org/jdk/pull/29170#pullrequestreview-3651284570 PR Review Comment: https://git.openjdk.org/jdk/pull/29170#discussion_r2682709497 From stefank at openjdk.org Mon Jan 12 15:38:08 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Jan 2026 15:38:08 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass [v2] In-Reply-To: References: Message-ID: > During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about if they visited the metadata or not. 
> > I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visits the elements. > > So, we know how the following ObjArrayKlass oop iterators: > > Iterators that also visit the metadata: > > oop_oop_iterate > oop_oop_iterate_reverse > oop_oop_iterate_bounded > > > Iterators that are not visiting the metadata: > > oop_oop_iterate_elements > oop_oop_iterate_elements_range > oop_oop_iterate_elements_bounded > > > The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that does not visit the metadata: > > oop_iterate_elements_range > > > Two extra things to check in the patch: > > 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. > > 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/oops/objArrayOop.hpp Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29170/files - new: https://git.openjdk.org/jdk/pull/29170/files/1c9a8869..ad81a47e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29170&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29170&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29170/head:pull/29170 PR: https://git.openjdk.org/jdk/pull/29170 From shade at openjdk.org Mon Jan 12 16:23:41 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Jan 2026 16:23:41 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Fri, 9 Jan 2026 17:52:53 GMT, William Kemper wrote: >> This PR simplifies the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Fix typo in assertion message > - Take regulator thread out of STS before requesting GC > > The request may block while it waits for control thread to stop old marking. 
If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. > - Add comments > - Revert back to what should be on this branch > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Don't know how this file got deleted > - Carry over gc cancellation to gc request > - Do not let allocation failure requests be overwritten by other requests > - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac > /issue add JDK-8373100 OK, why? This issue is already resolved. Why are you linking the new PR to it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28932#issuecomment-3739387975 From eastigeevich at openjdk.org Mon Jan 12 16:50:30 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 12 Jan 2026 16:50:30 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v22] In-Reply-To: References: Message-ID: <0es2jUteLFJzSAHjGwnkoW4SDTZ7-7yQXsyloBNMs6E=.3bd217be-e5c3-4d1a-88c0-08e43acdc92b@github.com> > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. 
> > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic AArch64 JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Added a new diagnostic JVM flag `UseDeferredICacheInvalidation` to enable or disable defered icache invalidation. The flag is automatically enabled for AArch64 if CPU supports hardware cache coherence. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. > * Provided a default (no-op) implementation for `DefaultICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JDK-8341518 > - `com/sun/nio/sctp/SctpChannel/CloseDescriptors.java`: JDK-8298466 > - `java/awt/print/PrinterJob/Prin... 
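The batching idea in the quoted description can be sketched roughly as follows. This is only an illustration under assumptions: `ToyICache` and `ToyDeferredInvalidation` are hypothetical names, the counters stand in for real cache maintenance, and nothing here reflects the actual `ICacheInvalidationContext` interface in the PR.

```cpp
#include <cassert>

// Hypothetical model of the instruction cache: counters replace hardware effects.
struct ToyICache {
  int flushes = 0;        // number of expensive invalidations actually issued
  int open_contexts = 0;  // nesting depth of active deferral scopes
  bool pending = false;   // an invalidation was requested while deferred

  // Request an invalidation: immediate when no scope is open, recorded otherwise.
  void invalidate() {
    if (open_contexts > 0) {
      pending = true;
    } else {
      ++flushes;
    }
  }
};

// Hypothetical RAII scope: requests made while it is alive collapse into a
// single invalidation issued when the outermost scope is destroyed.
class ToyDeferredInvalidation {
 public:
  explicit ToyDeferredInvalidation(ToyICache& cache) : _cache(cache) {
    ++_cache.open_contexts;
  }
  ~ToyDeferredInvalidation() {
    if (--_cache.open_contexts == 0 && _cache.pending) {
      _cache.pending = false;
      ++_cache.flushes;
    }
  }
  ToyDeferredInvalidation(const ToyDeferredInvalidation&) = delete;
  ToyDeferredInvalidation& operator=(const ToyDeferredInvalidation&) = delete;

 private:
  ToyICache& _cache;
};

// Patches `sites` code locations and returns how many invalidations were issued.
inline int patch_many(ToyICache& cache, int sites, bool deferred) {
  const int before = cache.flushes;
  if (deferred) {
    ToyDeferredInvalidation scope(cache);
    for (int i = 0; i < sites; ++i) cache.invalidate();
  } else {
    for (int i = 0; i < sites; ++i) cache.invalidate();
  }
  return cache.flushes - before;
}
```

On hardware where each trapped invalidation is costly, the point of such a scope is that the number of expensive operations no longer grows with the number of patched sites.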
Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits: - Remove redundant code - Merge branch 'master' into JDK-8370947 - Fix linux-cross-compile riscv64 build - Restore deleted comment - Remove redundant blank line - Remove redundant include - Merge branch 'master' into JDK-8370947 - Fix SpecJVM2008 regressions - Merge branch 'master' into JDK-8370947 - Fix macos and windows aarch64 builds - ... and 23 more: https://git.openjdk.org/jdk/compare/fb13abef...7153eb5c ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=21 Stats: 819 lines in 32 files changed: 755 ins; 22 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From wkemper at openjdk.org Mon Jan 12 17:25:17 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 17:25:17 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: <4qJRAn8iSmTI7TPH0Eo_herCgysyVNxyAt7EbHkRwQk=.bbc92e5b-f727-4cc3-9eda-54addc4c83c8@github.com> On Fri, 9 Jan 2026 17:52:53 GMT, William Kemper wrote: >> This PR simplifies the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 14 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Fix typo in assertion message > - Take regulator thread out of STS before requesting GC > > The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. > - Add comments > - Revert back to what should be on this branch > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Don't know how this file got deleted > - Carry over gc cancellation to gc request > - Do not let allocation failure requests be overwritten by other requests > - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac We backed out the original "fix" for JDK-8373100 here: https://bugs.openjdk.org/browse/JDK-8374048. I'll remove the issue from the PR and just leave the connections in JBS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28932#issuecomment-3739644285 From wkemper at openjdk.org Mon Jan 12 18:44:32 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 18:44:32 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Fri, 9 Jan 2026 23:05:52 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 708: >> >>> 706: >>> 707: void ShenandoahGenerationalControlThread::notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation) { >>> 708: assert(_control_lock.is_locked(), "Request lock must be held here"); >> >> What is MonitorLocker, and what is `_control_lock`?
Why are we checking `_control_lock` here? If ml is being passed here for the purposes of notification on it, it must be the case that it's locked and the assert at line 708 is a leakage of abstraction? I see that your problem here is that along one path you do nothing and don't post a notification and along another you do some work and post a notification. It sounds like what you want instead is an API of the following shape: >> >> >> bool should_notify_control_thread(cause, generation) { ... }; >> >> >> and callers might do: >> >> >> MonitorLocker ml(...); >> if (should_notify_control_thread(cause, generation)) { >> ml.notify(); >> } >> ml.wait(); > > In addition, all of the various flavours of `notify_control_thread()` with optional parameters should ideally call down into the version that has all of the parameters specified. In the cases where these parameters aren't specified, the version of the method fills default parameters. > > The fully-parameterized version of the method then contains and consolidates the entire logic and any invariant/assertion checking of the various parameters that make sense, and executes the relevant logic to return a boolean result to allow the caller to notify and/or wait on the monitor. > What is MonitorLocker, and what is _control_lock? From the declaration of `_control_lock`: // This lock is used to coordinate setting the _requested_gc_cause, _requested generation // and _gc_mode. It is important that these be changed together and have a consistent view. Monitor _control_lock; There are a few different paths into `notify_control_thread`. Rather than expose the locking protocol to callers and duplicate the logic to notify or not, we have an entry point that acquires the lock for the caller before calling down into the method that does the actual work. On some paths, the lock is already held for other reasons, so the overload is provided for these cases (the lock is not reentrant).
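The two-overload arrangement described above can be sketched with standard C++ primitives. This is a hypothetical illustration, not the Shenandoah code: `ToyControlThread`, its `Cause` enum, and the use of `std::mutex` in place of HotSpot's `Monitor`/`MonitorLocker` are all assumptions. The rule that an allocation-failure request is not overwritten mirrors one of the commit messages in this PR, but the real logic is more involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical control-thread request state guarded by a non-reentrant lock.
class ToyControlThread {
 public:
  enum Cause { kNone, kAllocFailure, kHeuristic };

  // Convenience entry point: takes the lock for the caller, then delegates
  // to the overload that does the actual work.
  bool notify(Cause cause) {
    std::unique_lock<std::mutex> ml(_lock);
    return notify(ml, cause);
  }

  // Does the actual work. The caller must already hold the lock; passing the
  // locker in makes that requirement visible in the signature.
  bool notify(std::unique_lock<std::mutex>& ml, Cause cause) {
    assert(ml.owns_lock() && ml.mutex() == &_lock);
    if (_requested_cause == kAllocFailure) {
      return false;  // assumption: never overwrite a pending allocation failure
    }
    _requested_cause = cause;
    _cv.notify_all();
    return true;
  }

  Cause requested_cause() {
    std::lock_guard<std::mutex> guard(_lock);
    return _requested_cause;
  }

  // Exposed only so that a caller that needs the lock for other reasons can
  // hold it and then use the lock-held overload.
  std::mutex& lock() { return _lock; }

 private:
  std::mutex _lock;
  std::condition_variable _cv;
  Cause _requested_cause = kNone;
};
```

Because the mutex is not reentrant, a caller that already holds it must use the two-argument overload; calling the one-argument form from such a path would self-deadlock, which is the reason both forms exist in this sketch.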
> optional parameters should ideally call down into the version that has all of the parameters specified. This is what is happening, except in the case of allocation failures where the caller does not "know" the generation being collected (assuming a collection is even running). The implementation here intentionally hides as much information as possible. Some callers do not need to provide the generation. In some cases, only the control thread itself can provide the correct generation. We could expose this piece of data to other threads so that they could always include a generation in their request, but the control thread would still end up ignoring it. This would also be confusing on a first read of the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2683457097 From wkemper at openjdk.org Mon Jan 12 18:53:28 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 18:53:28 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: <0fM9tDaIfMBZvRsmcCzIduQL83vhSyV6QiXPlaIm1NA=.687fe3bb-c154-4c77-b988-349bb937b61a@github.com> On Fri, 9 Jan 2026 23:08:11 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Fix typo in assertion message >> - Take regulator thread out of STS before requesting GC >> >> The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other.
>> - Add comments >> - Revert back to what should be on this branch >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Don't know how this file got deleted >> - Carry over gc cancellation to gc request >> - Do not let allocation failure requests be overwritten by other requests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 721: > >> 719: } >> 720: >> 721: void ShenandoahGenerationalControlThread::notify_control_thread(GCCause::Cause cause) { > > Apropos my comment above, this code has a bad smell that some versions of the method expect the lock to be held, and other methods acquire the lock. > > It makes reasoning about the code at an abstract level very error-prone, and potentially difficult to maintain correctly over time. The overload that does the actual work always requires the lock to be held. The other overload is a convenience method that takes the lock and provides the locker to the actual method that does the work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2683485267 From wkemper at openjdk.org Mon Jan 12 19:09:24 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 19:09:24 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v4] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Sat, 10 Jan 2026 00:07:54 GMT, Y. Srinivas Ramakrishna wrote: >> The reason I am leaving this comment here is that further above (lines 85-88), and in many of the other mode changes we seem to take care to use the control lock to protect these transitions so that other parties may observe the right state. 
Perhaps that is not needed in some cases, but the plurality of different forms of update of state that is modified and may be read by other threads in conjunction with other state leaves me feeling queasy about the possible cracks in the coordination surface that may open us up to trouble. > > (Note that the "mode" parameter used in the non-generational control thread is completely internal to the control thread and unlike in the case of the generational control thread here, never leaks out of that thread to be consulted by another thread asynchronously.) > Will they "observe the shutdown" if mode isn't "stopped"? Yes, gc waiters are waiting to observe an increment in the GC ID. Alloc failure waiters are waiting for `ShHeap::_cancelled_cause` to stop being an allocation failure. Both groups of waiters will exit their loop if the control thread itself should terminate. Only the regulator thread waits on `gc_mode`. Changing the `gc_mode` is always done when the `_control_lock` is held and will notify any waiters. The regulator thread does read the `gc_mode` without a lock, but only for deciding when to evaluate heuristic triggers. The regulator thread does take the lock and reads `gc_mode` again before making a request to start a GC cycle. > the "mode" parameter used in the non-generational control thread is completely internal to the control thread This is one of the major differences with the non-generational mode. In the non-generational mode, the control thread is responsible for running the collection cycle _and deciding when to start a cycle_. In the non-generational mode, the control thread cannot evaluate heuristics when it is running a cycle (indeed, it would make no sense to because a cycle is already running). However, in the generational mode, these responsibilities are decoupled because we want to continue evaluating heuristics while the control thread is running an old mark. This is how we decide to interrupt an old cycle and start a young cycle.
The `gc_mode` is exposed to the regulator thread so that it does not futilely evaluate heuristics when the control thread is running a young or global cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28932#discussion_r2683533455 From wkemper at openjdk.org Mon Jan 12 19:41:47 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 19:41:47 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 17:52:35 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. 
> > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Heal discovered lists for any young collection coincides with old marking > - Configure thread local mark closure on delegated old reference processor > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson > - Sort includes > - Heal old discovered lists in parallel > - Fix comment > - Factor duplicate code into shared method > - ... and 13 more: https://git.openjdk.org/jdk/compare/f5fa9e40...abccb8b6 Consider a situation where an old and young reference both point to an old referent: young_reference old_reference \ / \ / old_referent Genshen "bootstraps" old marking with a young collection. During young generation marking, oop iteration will "encounter" `young_reference`, and ask our reference processor to "discover" it. If the reference processor _discovers_ `old_referent`, then the reference processor is responsible for it and the mark thread will _not_ mark through the referent. Conversely, if the reference processor does _not_ discover the referent, then oop iteration will mark the referent and trace through it. This behavior is baked pretty deep into oop iteration and any changes in this code would affect all collectors. The young reference processor cannot _discover_ old referents because it will not have complete mark information when it comes time to process the discovered list. By not discovering the old referent, the mark thread will mark and trace through the old referent. Following the bootstrap, when old marking encounters `old_reference`, it sees that `old_referent` is already strongly marked and so similarly does not _discover_ it. 
We could continue to wait for `young_reference` to die or be promoted, but you can see that a single `young_reference` to any `old_referent` can prevent the reference from being cleared. This PR has the young reference processor defer discovery to the old reference processor. This is simple enough, except that the young references now live in the old reference processor's discovered list. This list runs through the heap so these young references can be evacuated. This requires us to update references for the old processor's discovered list. Note that the list already updates the card table when young references are placed on the discovered list, so they become roots for subsequent young collections by virtue of being in the remembered set. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3740189396 From xpeng at openjdk.org Mon Jan 12 20:55:18 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 12 Jan 2026 20:55:18 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 21:11:20 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 167: > >> 165: }; >> 166: >> 167: HeapWord* ShenandoahHeapRegion::allocate_atomic(size_t size, const ShenandoahAllocRequest& req, bool &ready_for_retire) { > > Suggest we add a fourth arg: int &contended > We initialize contended to zero What do we want to track with the `contended` arg? If we want to track the number of threads allocating simultaneously from the same region or from the same shared alloc region index, we can't achieve either of them with the arg `&contended`. Let's discuss this later; I need to understand the intention/idea before making this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2683872357 From xpeng at openjdk.org Mon Jan 12 21:14:16 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 12 Jan 2026 21:14:16 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 21:17:57 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ...
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 194: > >> 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) { >> 193: bool ready_for_retire = false; >> 194: HeapWord* obj = atomic_allocate_in(r, true, req, in_new_region, ready_for_retire); > > Insert before atomic_allocate_in: int contended > Pass this as 6th arg to atomic_allocate_in() > Add this code after atomic_allocate_in(): > if ((i == alloc_start_index) && (contended > 1)) { > randomize_start_index(); // I think this is realized by setting _alloc_start_index to UINT_MAX > } template template HeapWord* ShenandoahAllocator::attempt_allocation_in_alloc_regions(ShenandoahAllocRequest &req, bool &in_new_region, uint const alloc_start_index, uint &regions_ready_for_refresh) The API `attempt_allocation_in_alloc_regions` is designed as above: it tries to allocate from the shared alloc regions, iterating from `alloc_start_index` in one run, so it doesn't make sense to change the starting index inside this method during the loop. Also, the `contended` you are suggesting is actually the number of regions it has attempted in `attempt_allocation_in_alloc_regions`; it doesn't reflect the contention. There might be cases where we truly want to shuffle/re-randomize the `alloc_start_index` for mutator threads, e.g. when most of the mutator threads have the same `alloc_start_index`, causing congestion/contention in one shared alloc region, but that is a bit tricky to trace and detect; we could do it in future improvements.
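For readers following the discussion, the general shape of "try each shared region once, starting at a per-thread index, using CAS to bump the top pointer" might look like the sketch below. It is a toy model under assumptions: `ToyRegion`, `ToyAllocator`, the fixed region count, and the wrap-around iteration order are all hypothetical, and the real `attempt_allocation_in_alloc_regions` handles retirement, refresh, and request types that are omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical region: offsets replace real heap addresses.
struct ToyRegion {
  static constexpr std::size_t kFail = ~std::size_t(0);

  std::atomic<std::size_t> top{0};  // next free offset, bumped with CAS
  std::size_t end = 0;              // capacity of the region

  // Lock-free bump allocation: returns the start offset, or kFail when the
  // request does not fit. On CAS failure `cur` is reloaded and the fit is
  // re-checked, so a concurrent allocation is never overwritten.
  std::size_t allocate_atomic(std::size_t size) {
    std::size_t cur = top.load(std::memory_order_relaxed);
    while (size <= end - cur) {
      if (top.compare_exchange_weak(cur, cur + size,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        return cur;
      }
    }
    return kFail;
  }
};

constexpr unsigned kRegionCount = 4;  // assumption: small fixed set of shared regions

struct ToyAllocator {
  ToyRegion regions[kRegionCount];

  // Tries every shared region exactly once, beginning at start_index and
  // wrapping around. The start index is fixed for the whole pass.
  std::size_t attempt_allocation(std::size_t size, unsigned start_index,
                                 unsigned& region_out) {
    for (unsigned n = 0; n < kRegionCount; ++n) {
      const unsigned i = (start_index + n) % kRegionCount;
      const std::size_t offset = regions[i].allocate_atomic(size);
      if (offset != ToyRegion::kFail) {
        region_out = i;
        return offset;
      }
    }
    return ToyRegion::kFail;
  }
};
```

In this model the number of regions visited in one pass says how full the regions are, not how many threads are competing, which is the distinction drawn in the reply above.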
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2683919611 From xpeng at openjdk.org Mon Jan 12 21:19:49 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 12 Jan 2026 21:19:49 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 21:30:09 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 233: > >> 231: // evacuation are not updated during evacuation. For both young and old regions r, it is essential that all >> 232: // PLABs be made parsable at the end of evacuation. This is enabled by retiring all plabs at end of evacuation. >> 233: region->concurrent_set_update_watermark(region->top()); > > There's a race here. Multiple mutators may be updating watermark in parallel. It may be that the mutator who most recently allocated is not the mutator who makes the "most recent" overwrite of set_update_watermark(). > > I think the better fix is to remove this code. Update refs should just assume that update watermark equals top for any region in the Old gen, and for any region that was in the Collector partition. 
It may not be easy to know which regions were "in the Collector partition". Maybe we use a Sentinel value for update_watermark on all such regions. Just overwrite update_watermark(nullptr)? And check for this in update-refs? Needs a solution, and solution needs to be documented in code comments. I don't really know how the watermark is used, but based on your comments and explanations I think it might be possible to use top instead of the watermark; if it is only used in the old gen and not in the young gen, we could remove concurrent_set_update_watermark. I'll try to make the change and test it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2683935723 From xpeng at openjdk.org Mon Jan 12 21:19:51 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 12 Jan 2026 21:19:51 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Wed, 7 Jan 2026 22:24:23 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 254: >> >>> 252: // Step 1: find out the alloc regions which are ready to refresh. >>> 253: for (uint i = 0; i < _alloc_region_count; i++) { >>> 254: ShenandoahAllocRegion* alloc_region = &_alloc_regions[i]; >> >> We've got the heap lock here. why does this need to be atomic? Comments in the code should make this clear. > > I believe AtomicAccess::load here is not needed, I'll remove it. I have removed the AtomicAccess::load here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2683938237 From ysr at openjdk.org Mon Jan 12 21:41:27 2026 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Mon, 12 Jan 2026 21:41:27 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 17:52:35 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 23 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Heal discovered lists for any young collection coincides with old marking > - Configure thread local mark closure on delegated old reference processor > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson > - Sort includes > - Heal old discovered lists in parallel > - Fix comment > - Factor duplicate code into shared method > - ... and 13 more: https://git.openjdk.org/jdk/compare/f5fa9e40...abccb8b6 Thank you for explaining clearly here. I have some related questions below: > Consider a situation where an old and young reference both point to an old referent: > > ``` > young_reference old_reference > \ / > \ / > old_referent > ``` ... > > The young reference processor cannot _discover_ old referents because it will not have complete mark information when it comes time to process the discovered list. Correct. > By not discovering the old referent, the mark thread will mark and trace through the old referent. Why? It should find the old referent to be in the old generation and leave it alone? Are you specifically talking about the case of bootstrap young that is seeding work for the old collection? In that case, it should mark old referent and leave it alone, but go no further. The old marking will find the referent marked and not clear it. > Following the bootstrap, when old marking encounters `old_reference`, it sees that `old_referent` is already strongly marked and so similarly does not _discover_ it. I would think that this is indeed the correct behaviour. > We could continue to wait for `young_reference` to die or be promoted, but you can see that a single `young_reference` to any `old_referent` can prevent the reference from being cleared. 
Yes, and that should be ok. Once reference and referent are both in old, it'll be cleared. I realize this is a consequence of the split local marking that each generation does. You are concerned that this will keep the referent perpetually around because of such "hand-over-hand" reference to it from each generation. One question is whether this ends up violating anything in the spec of java.lang.Reference. If so, which spec? Or is it a quality of implementation issue? Is there an example of that from an application/service where we see this? > > This PR has the young reference processor defer discovery to the old reference processor. This is simple enough, except that the young references now live in the old reference processor's discovered list. This list runs through the heap so these young references can be evacuated. This requires us to update references for the old processor's discovered list. Note that the list already updates the card table when young references are placed on the discovered list, so they become roots for subsequent young collections by virtue of being in the remembered set. I am concerned about creating this extra coordination complexity unless there is a good practical reason to fix it in this manner. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3740590447 From kdnilsen at openjdk.org Mon Jan 12 21:43:57 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 Jan 2026 21:43:57 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v5] In-Reply-To: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: > This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. 
> > This addresses a problem that results if available memory is probed while we are rebuilding the freeset. > > Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Use appropriate locks for ShenFreeSet used() and capacity() - Revert "Add rebuild synchronization to capacity() and used()" This reverts commit 3c29dc10b17f1856203135a31b75c3afea16ba50. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27612/files - new: https://git.openjdk.org/jdk/pull/27612/files/3c29dc10..300a256a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=03-04 Stats: 16 lines in 4 files changed: 7 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/27612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612 PR: https://git.openjdk.org/jdk/pull/27612 From xpeng at openjdk.org Mon Jan 12 22:00:26 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 12 Jan 2026 22:00:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 18:09:05 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 133: > >> 131: ShenandoahHeapAccountingUpdater accounting_updater(_free_set, ALLOC_PARTITION); >> 132: >> 133: if (regions_ready_for_refresh > 0u) { > > Since we've already taken the heap lock because we failed to allocate "fast", I'm ok to go ahead and refresh any regions that are ready right now, even if it's only 1 region. > > I'm wondering if we can avoid thrashing in the case that there are no more regions available. We might want to keep a state variable that represents whether there exist free-set regions with which to refresh our cache. This could be updated whenever we "add to" or "rebuild" the free set, and whenever refresh_alloc_regions() find there is insufficient supply to demand. We would want to avoid repeated calls to refresh_alloc_regions() if there are no "refresh_regions_available". To track the accurate number of `refresh_regions_available`, we would need to add three counters, one for each partition. Since we always retire a region if its free space is less than PLAB::min_size(), we may already have it in the free set partitions (_region_counts?). There is also complexity here: for Collector and OldCollector, we allow them to transfer regions from the Mutator partition, so we have to consider FREE regions in the Mutator partition to calculate "refresh_regions_available" for Collector/OldCollector.
It should be a rare case, arising only when the entire heap is almost filled up. I am wondering how much benefit we could get from doing that; if you are ok with it, I'd suggest we consider this optimization later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2684045131 From ysr at openjdk.org Mon Jan 12 23:03:26 2026 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Jan 2026 23:03:26 GMT Subject: RFR: 8373819: Genshen: Control thread can miss allocation failure notification (redux) [v4] In-Reply-To: <4qJRAn8iSmTI7TPH0Eo_herCgysyVNxyAt7EbHkRwQk=.bbc92e5b-f727-4cc3-9eda-54addc4c83c8@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> <4qJRAn8iSmTI7TPH0Eo_herCgysyVNxyAt7EbHkRwQk=.bbc92e5b-f727-4cc3-9eda-54addc4c83c8@github.com> Message-ID: On Mon, 12 Jan 2026 17:22:29 GMT, William Kemper wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Fix typo in assertion message >> - Take regulator thread out of STS before requesting GC >> >> The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. >> - Add comments >> - Revert back to what should be on this branch >> - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash >> - Don't know how this file got deleted >> - Carry over gc cancellation to gc request >> - Do not let allocation failure requests be overwritten by other requests >> - ...
and 4 more: https://git.openjdk.org/jdk/compare/f5fa9e40...2e57f0ac > > We backed out the original "fix" for JDK-8373100 here: https://bugs.openjdk.org/browse/JDK-8374048. I'll remove the issue from the PR and just leave the connections in JBS. Thanks for your responses/explanation, @earthling-amzn. They all make sense; we can defer the documentation and any refactor/clean-ups for a future PR, since these changes are needed to fix the existing behaviour and it makes sense to land this change sooner rather than later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28932#issuecomment-3740894838 From ysr at openjdk.org Mon Jan 12 23:06:23 2026 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Jan 2026 23:06:23 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 21:37:32 GMT, Y. Srinivas Ramakrishna wrote: > ... unless there is a good practical reason to fix it in this manner. Got more info from William offline; I am going to take a closer look at the changes in light of that, the explanation he provided above, and the example provided by the submitter. Thanks William! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3740900759 From wkemper at openjdk.org Mon Jan 12 23:39:23 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Jan 2026 23:39:23 GMT Subject: Integrated: 8373819: Genshen: Control thread can miss allocation failure notification (redux) In-Reply-To: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Fri, 19 Dec 2025 19:09:01 GMT, William Kemper wrote: > This PR simplifies the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. 
This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. This pull request has now been integrated. Changeset: 15b7a425 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/15b7a4252b8d3595b7bc409e20d4c617e89240e8 Stats: 95 lines in 4 files changed: 45 ins; 17 del; 33 mod 8373819: Genshen: Control thread can miss allocation failure notification (redux) Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/28932 From wkemper at openjdk.org Tue Jan 13 01:11:55 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Jan 2026 01:11:55 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v9] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. 
The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Heal discovered lists for any young collection coincides with old marking - Configure thread local mark closure on delegated old reference processor - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson - Sort includes - Heal old discovered lists in parallel - Fix comment - ... 
and 14 more: https://git.openjdk.org/jdk/compare/15b7a425...88101211 ------------- Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=08 Stats: 669 lines in 20 files changed: 537 ins; 84 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From kbarrett at openjdk.org Tue Jan 13 01:43:35 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Jan 2026 01:43:35 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 15:38:08 GMT, Stefan Karlsson wrote: >> During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about if they visited the metadata or not. >> >> I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visits the elements. >> >> So, we know how the following ObjArrayKlass oop iterators: >> >> Iterators that also visit the metadata: >> >> oop_oop_iterate >> oop_oop_iterate_reverse >> oop_oop_iterate_bounded >> >> >> Iterators that are not visiting the metadata: >> >> oop_oop_iterate_elements >> oop_oop_iterate_elements_range >> oop_oop_iterate_elements_bounded >> >> >> The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that does not visit the metadata: >> >> oop_iterate_elements_range >> >> >> Two extra things to check in the patch: >> >> 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. 
>> >> 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/oops/objArrayOop.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29170#pullrequestreview-3653423624 From aboldtch at openjdk.org Tue Jan 13 08:16:39 2026 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 Jan 2026 08:16:39 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 15:38:08 GMT, Stefan Karlsson wrote: >> During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about if they visited the metadata or not. >> >> I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visits the elements. 
>> >> So, we know how the following ObjArrayKlass oop iterators: >> >> Iterators that also visit the metadata: >> >> oop_oop_iterate >> oop_oop_iterate_reverse >> oop_oop_iterate_bounded >> >> >> Iterators that are not visiting the metadata: >> >> oop_oop_iterate_elements >> oop_oop_iterate_elements_range >> oop_oop_iterate_elements_bounded >> >> >> The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme to add the `_elements` to functions that does not visit the metadata: >> >> oop_iterate_elements_range >> >> >> Two extra things to check in the patch: >> >> 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. >> >> 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/oops/objArrayOop.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29170#pullrequestreview-3654436558 From duke at openjdk.org Tue Jan 13 09:36:46 2026 From: duke at openjdk.org (Harshit470250) Date: Tue, 13 Jan 2026 09:36:46 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v3] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 21:28:05 GMT, Dean Long wrote: >> Harshit470250 has updated the pull request incrementally with five additional commits since the last revision: >> >> - add guard to the include >> - add load_reference_barrier_Type >> - add clone_barrier_Type >> - add write_barrier_pre_Type >> - revert shenandoah changes > > Why are you trying to #include a .cpp file? Just let the linker handle it. 
You didn't need that for shenandoahBarrierSetC2.cpp, so what makes barrierSetC2.cpp special? @dean-long I have moved make_clone_type to barrierSetC2.cpp. Can you take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3743180187 From eastigeevich at openjdk.org Tue Jan 13 12:36:16 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 13 Jan 2026 12:36:16 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v23] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. 
> > Changes include: > > * Added a new diagnostic AArch64 JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Added a new diagnostic JVM flag `UseDeferredICacheInvalidation` to enable or disable defered icache invalidation. The flag is automatically enabled for AArch64 if CPU supports hardware cache coherence. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. > * Provided a default (no-op) implementation for `DefaultICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > > Testing results: linux fastdebug build > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JDK-8341518 > - `com/sun/nio/sctp/SctpChannel/CloseDescriptors.java`: JDK-8298466 > - `java/awt/print/PrinterJob/Prin... 
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Fix macos and windows aarch64 debug builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/7153eb5c..3abb6de4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=21-22 Stats: 10 lines in 3 files changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From kbarrett at openjdk.org Tue Jan 13 14:25:58 2026 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 13 Jan 2026 14:25:58 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 04:46:41 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. 
> > Harshit470250 has updated the pull request incrementally with three additional commits since the last revision: > > - move make_clone to barrierSetC2 > - move make_clone to barrier_stubc2.hpp > - move clone_type src/hotspot/share/opto/type.cpp line 33: > 31: #if INCLUDE_SHENANDOAHGC > 32: #include "gc/shenandoah/c2/shenandoahBarrierSetC2.hpp" > 33: #endif // INCLUDE_SHENANDOAHGC Conditional includes go at the end: https://github.com/openjdk/jdk/blame/49f7265894652ea243f3a531cf3f9d0b06e53565/doc/hotspot-style.md#L159-L161 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27279#discussion_r2686645613 From xpeng at openjdk.org Tue Jan 13 19:42:35 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 13 Jan 2026 19:42:35 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v35] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with five additional commits since the last revision: - Add virtual back for release_alloc_regions and reserve_alloc_regions to fix link error - Eagerly refresh alloc region if there are 1/2 or more of alloc regions ready for retire - Update comments for function alloc_start_index - UUpdate the comments on method allocate - Make release_alloc_regions and reserve_alloc_regions non-virtual methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/c5824564..73e6e8da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=33-34 Stats: 32 lines in 2 files changed: 24 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Tue Jan 13 20:10:10 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 13 Jan 2026 20:10:10 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v36] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. 
> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 290 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Add virtual back for release_alloc_regions and reserve_alloc_regions to fix link error - Eagerly refresh alloc region if there are 1/2 or more of alloc regions ready for retire - Update comments for function alloc_start_index - UUpdate the comments on method allocate - Make release_alloc_regions and reserve_alloc_regions non-virtual methods - Move thread locals to ShenandoahThreadLocalData - Add comments for public methods - Remove necessary atomic load - Add _epoch_id to ShenandoahAllocator to trace the update of shared alloc regions - ... and 280 more: https://git.openjdk.org/jdk/compare/4d0ad0a4...9e0520ba ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=35 Stats: 1724 lines in 28 files changed: 1375 ins; 235 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Tue Jan 13 22:13:51 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 13 Jan 2026 22:13:51 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v37] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. 
> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Fix assert when after eagerly refresh alloc regions after fast allocation - Remove the support of 0 for flags ShenandoahMutatorAllocRegions and ShenandoahCollectorAllocRegions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/9e0520ba..8879ec52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=35-36 Stats: 15 lines in 2 files changed: 0 ins; 11 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Tue Jan 13 22:21:02 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Jan 2026 22:21:02 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v5] In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Mon, 12 Jan 2026 21:43:57 GMT, Kelvin Nilsen wrote: >> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. >> >> This addresses a problem that results if available memory is probed while we are rebuilding the freeset. >> >> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. 
> > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Use appropriate locks for ShenFreeSet used() and capacity() > - Revert "Add rebuild synchronization to capacity() and used()" > > This reverts commit 3c29dc10b17f1856203135a31b75c3afea16ba50. GHA failure looks unrelated to these changes. Internal tests are succeeding. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27612#pullrequestreview-3658154746 From xpeng at openjdk.org Tue Jan 13 22:56:00 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 13 Jan 2026 22:56:00 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <3u3UesPyV_o1bI2aQFwwMOW3zOh2ES5K8OSUszdhxuo=.07a4dcff-885a-4b0e-8cb2-d236cdd5ed75@github.com> On Wed, 7 Jan 2026 14:56:15 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 114: >> >>> 112: HeapWord* obj = attempt_allocation_in_alloc_regions(req, in_new_region, alloc_start_index(), dummy); >>> 113: if (obj != nullptr) { >>> 114: return obj; >> >> Even in the case that we successfully fill our allocation request, if regions_ready_for_refresh is greater than some percentage of _alloc_region_count (e.g. > _alloc_region_count / 4), then we should grab the heap lock and refresh_alloc_regions() here. Otherwise, we will gradually degrade the number of directly_allocatable_regions until we are down to one before we refresh any of them. > > After further thought, am thinking the threshold for refresh_alloc_regions() might be if (regions_ready_for_refresh >= _alloc_region_count / 2). That would reduce the number of slow paths through the allocator. 
If we can re-randomize the thread-local start indexes when their original start index hits a retire-able region, this might work ok.

I have added support for this; it works as follows:

    obj = fast-path-alloc();
    if (obj != nullptr && regions_ready_for_refresh < _alloc_region_count / 2) {
      return obj;
    }
    if (obj == nullptr) {
      obj = slow-path-alloc();
    } else {
      ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), false);
      refresh_alloc_regions();
    }

If fast-path-alloc succeeds but finds that at least 50% of the alloc regions are ready to retire, it takes the heap lock to refresh the alloc regions. While doing so it CANNOT yield to a safepoint, because the thread is holding an uninitialized obj.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2688380076

From xpeng at openjdk.org Tue Jan 13 22:59:31 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Tue, 13 Jan 2026 22:59:31 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v38]
In-Reply-To: References: Message-ID:

> Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code:
>
> * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator.
> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: While eagerly refresh alloc regions, thread should not yield to safepoint because it is holding uninitialized new object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/8879ec52..475bdac7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=36-37 Stats: 21 lines in 2 files changed: 9 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Tue Jan 13 23:33:13 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Jan 2026 23:33:13 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v29] In-Reply-To: References: Message-ID: On Sun, 11 Jan 2026 03:08:48 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. 
In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 86 commits: > > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - Move rebuild free set earlier in an abbreviated GC cycle > - Restore deleted assert statement > - Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() > - fix another typo > - Fix typo > - Fix confusing comment > - Add comment > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - Fix whitespace and comment > - ... and 76 more: https://git.openjdk.org/jdk/compare/659b53fe...27ece3e8 src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 71: > 69: ShenandoahAdaptiveHeuristics::~ShenandoahAdaptiveHeuristics() {} > 70: > 71: size_t ShenandoahAdaptiveHeuristics::choose_collection_set_from_regiondata(ShenandoahCollectionSet* cset, It would be nice if we didn't need to change this API for every heuristic just to support the mixed evacuation case. It is perhaps not in scope for an already huge PR, but I think we should move `ShenandoahGeneration::compute_evacuation_budgets` and `ShenandoahGeneration::adjust_evacuation_budgets` into `ShenandoahGenerationalHeuristic`. Logically, both these methods are involved in choosing the collection set and both are only used in the generational mode. I think it's fine to defer this refactoring to another PR. It's hard for me to accept that a change such as this would touch 41 files. It seems we do not have the right abstractions or encapsulations here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2688445439 From wkemper at openjdk.org Tue Jan 13 23:43:12 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Jan 2026 23:43:12 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v29] In-Reply-To: References: Message-ID: <9Hggsj2zW9VafwI8DdJbN_v0yTmbEpUyYE8QRFMNU5E=.71bc2233-2812-4482-94b1-9796a5c24594@github.com> On Sun, 11 Jan 2026 03:08:48 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. 
>> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits: > > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - Move rebuild free set earlier in an abbreviated GC cycle > - Restore deleted assert statement > - Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() > - fix another typo > - Fix typo > - Fix confusing comment > - Add comment > - Merge remote-tracking branch 'jdk/master' into share-collector-reserves > - Fix whitespace and comment > - ... 
and 76 more: https://git.openjdk.org/jdk/compare/659b53fe...27ece3e8 src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 688: > 686: void move_unaffiliated_regions_from_collector_to_old_collector(ssize_t regions); > 687: > 688: inline size_t global_unaffiliated_regions() { A nit, but all functions defined in the class declaration are implicitly `inline` and the keyword is unnecessary here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2688460138 From xpeng at openjdk.org Tue Jan 13 23:46:03 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 13 Jan 2026 23:46:03 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v39] In-Reply-To: References: Message-ID: <5eJEsoXB-qciYhfKa9zd-Qmty2wbphLo0-CDvasIZYk=.97281066-bb8f-4137-b9cc-f2a1890fe39a@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: More eagerly to refresh alloc regions in attempt_allocation_slow since it is holding heap lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/475bdac7..6cc1834b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=37-38 Stats: 19 lines in 2 files changed: 10 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From kdnilsen at openjdk.org Tue Jan 13 23:52:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 Jan 2026 23:52:21 GMT Subject: Integrated: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild In-Reply-To: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Thu, 2 Oct 2025 17:58:48 GMT, Kelvin Nilsen wrote: > This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. > > This addresses a problem that results if available memory is probed while we are rebuilding the freeset. > > Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. 
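[Editor's sketch] The description quoted above introduces a narrowly scoped lock so that `available()` can be answered without waiting on the global heap lock. One way such a scheme could look is sketched below, under stated assumptions: the class `FreeSetSketch`, its members, and the choice of an atomic counter are all hypothetical and are not taken from the integrated changeset.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical illustration only; NOT ShenandoahFreeSet.
class FreeSetSketch {
  std::mutex _heap_lock;                  // stand-in for the global heap lock
  std::mutex _rebuild_lock;               // narrow lock: rebuild vs. available()
  std::atomic<std::size_t> _available{0};

public:
  // Rebuild holds both locks: the heap lock excludes allocators, the
  // rebuild lock keeps readers from seeing the transient state.
  void rebuild(std::size_t new_available) {
    std::lock_guard<std::mutex> heap(_heap_lock);
    std::lock_guard<std::mutex> rebuild(_rebuild_lock);
    _available.store(0);                  // transient value during rebuild
    _available.store(new_available);
  }

  // Readers wait only for a rebuild in progress, never for allocators
  // that hold the heap lock.
  std::size_t available() {
    std::lock_guard<std::mutex> rebuild(_rebuild_lock);
    return _available.load();
  }

  // Allocation takes only the heap lock, so it does not block available().
  bool allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> heap(_heap_lock);
    std::size_t cur = _available.load();
    if (cur < bytes) return false;
    _available.store(cur - bytes);
    return true;
  }
};
```

In this sketch the lock order is always heap lock first, then rebuild lock, and `available()` never acquires the heap lock, so the query adds no contention on it.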
This pull request has now been integrated. Changeset: 0d19d91b Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/0d19d91b44e5232dbd99d34dcdf6500f892e3048 Stats: 113 lines in 7 files changed: 60 ins; 29 del; 24 mod 8369048: GenShen: Defer ShenFreeSet::available() during rebuild Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/27612 From kdnilsen at openjdk.org Wed Jan 14 00:14:44 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 00:14:44 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v29] In-Reply-To: References: Message-ID: On Tue, 13 Jan 2026 23:31:03 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits: >> >> - Merge remote-tracking branch 'jdk/master' into share-collector-reserves >> - Move rebuild free set earlier in an abbreviated GC cycle >> - Restore deleted assert statement >> - Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() >> - fix another typo >> - Fix typo >> - Fix confusing comment >> - Add comment >> - Merge remote-tracking branch 'jdk/master' into share-collector-reserves >> - Fix whitespace and comment >> - ... and 76 more: https://git.openjdk.org/jdk/compare/659b53fe...27ece3e8 > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 71: > >> 69: ShenandoahAdaptiveHeuristics::~ShenandoahAdaptiveHeuristics() {} >> 70: >> 71: size_t ShenandoahAdaptiveHeuristics::choose_collection_set_from_regiondata(ShenandoahCollectionSet* cset, > > It would be nice if we didn't need to change this API for every heuristic just to support the mixed evacuation case. It is perhaps not in scope for an already huge PR, but I think we should move `ShenandoahGeneration::compute_evacuation_budgets` and `ShenandoahGeneration::adjust_evacuation_budgets` into `ShenandoahGenerationalHeuristic`. 
Logically, both these methods are involved in choosing the collection set and both are only used in the generational mode. I think it's fine to defer this refactoring to another PR. It's hard for me to accept that a change such as this would touch 41 files. It seems we do not have the right abstractions or encapsulations here. > > I have made this refactoring on a branch based off https://github.com/openjdk/jdk/pull/27632, I will rebase it on this PR once it is integrated. Thanks for sorting through the bigger picture here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2688509896 From kdnilsen at openjdk.org Wed Jan 14 00:20:58 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 00:20:58 GMT Subject: [jdk26] RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC Message-ID: <1omJK6dwl0H8dVgBnjjqsZS0KdT1ToM9ntCPh_tRNSs=.9d2f99bb-ed9f-48f7-98be-465b90782157@github.com> Hi all, This pull request contains a backport of commit [385c4f81](https://github.com/openjdk/jdk/commit/385c4f8180d30c0e41b848eb4b2c1c8788211422) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Kelvin Nilsen on 8 Jan 2026 and was reviewed by William Kemper. Thanks! 
------------- Commit messages: - Backport 385c4f8180d30c0e41b848eb4b2c1c8788211422 Changes: https://git.openjdk.org/jdk/pull/29213/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29213&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373714 Stats: 12 lines in 7 files changed: 4 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/29213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29213/head:pull/29213 PR: https://git.openjdk.org/jdk/pull/29213 From wkemper at openjdk.org Wed Jan 14 00:28:56 2026 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Jan 2026 00:28:56 GMT Subject: [jdk26] RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: <1omJK6dwl0H8dVgBnjjqsZS0KdT1ToM9ntCPh_tRNSs=.9d2f99bb-ed9f-48f7-98be-465b90782157@github.com> References: <1omJK6dwl0H8dVgBnjjqsZS0KdT1ToM9ntCPh_tRNSs=.9d2f99bb-ed9f-48f7-98be-465b90782157@github.com> Message-ID: On Wed, 14 Jan 2026 00:14:46 GMT, Kelvin Nilsen wrote: > Hi all, > > This pull request contains a backport of commit [385c4f81](https://github.com/openjdk/jdk/commit/385c4f8180d30c0e41b848eb4b2c1c8788211422) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Kelvin Nilsen on 8 Jan 2026 and was reviewed by William Kemper. > > Thanks! Marked as reviewed by wkemper (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29213#pullrequestreview-3658400623 From kdnilsen at openjdk.org Wed Jan 14 00:47:19 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 00:47:19 GMT Subject: [jdk26] Integrated: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC In-Reply-To: <1omJK6dwl0H8dVgBnjjqsZS0KdT1ToM9ntCPh_tRNSs=.9d2f99bb-ed9f-48f7-98be-465b90782157@github.com> References: <1omJK6dwl0H8dVgBnjjqsZS0KdT1ToM9ntCPh_tRNSs=.9d2f99bb-ed9f-48f7-98be-465b90782157@github.com> Message-ID: On Wed, 14 Jan 2026 00:14:46 GMT, Kelvin Nilsen wrote: > Hi all, > > This pull request contains a backport of commit [385c4f81](https://github.com/openjdk/jdk/commit/385c4f8180d30c0e41b848eb4b2c1c8788211422) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Kelvin Nilsen on 8 Jan 2026 and was reviewed by William Kemper. > > Thanks! This pull request has now been integrated. Changeset: aae9f926 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/aae9f9269a3755684dd6ee292ff9e2f223b62b34 Stats: 12 lines in 7 files changed: 4 ins; 0 del; 8 mod 8373714: Shenandoah: Register heuristic penalties following a degenerated GC Reviewed-by: wkemper Backport-of: 385c4f8180d30c0e41b848eb4b2c1c8788211422 ------------- PR: https://git.openjdk.org/jdk/pull/29213 From xpeng at openjdk.org Wed Jan 14 01:49:41 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 14 Jan 2026 01:49:41 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v40] In-Reply-To: References: Message-ID: <5NYvQkLEavrwqeS9kK_ez2AeZghuM7VZjpWL3LGtWk0=.bc92fb79-f29d-44f0-8d66-a760ce8ed745@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, the OOD of Shenandoah memory allocation is also improved so that the majority of the code is reused: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for the mutator; it inherits from ShenandoahAllocator and only overrides the methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, it needs only a few lines of code to customize the allocator for the Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in the OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improves from 500+us to less than 150us, and p99 from 1000+us to ~200us.
> > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - More accurate census noise - Code format - typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/6cc1834b..a26849bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=38-39 Stats: 18 lines in 3 files changed: 10 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 14 07:38:20 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 14 Jan 2026 07:38:20 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v41] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, the OOD of Shenandoah memory allocation is also improved so that the majority of the code is reused: > > * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for the mutator; it inherits from ShenandoahAllocator and only overrides the methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for the mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, it needs only a few lines of code to customize the allocator for the Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in the OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`. > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improves from 500+us to less than 150us, and p99 from 1000+us to ~200us.
> > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 298 commits: - Merge branch 'master' into cas-alloc-1 - More accurate census noise - Code format - typo - More eagerly to refresh alloc regions in attempt_allocation_slow since it is holding heap lock - While eagerly refresh alloc regions, thread should not yield to safepoint because it is holding uninitialized new object - Fix assert when after eagerly refresh alloc regions after fast allocation - Remove the support of 0 for flags ShenandoahMutatorAllocRegions and ShenandoahCollectorAllocRegions - Merge branch 'openjdk:master' into cas-alloc-1 - Add virtual back for release_alloc_regions and reserve_alloc_regions to fix link error - ... 
and 288 more: https://git.openjdk.org/jdk/compare/624d7144...6de6789f ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=40 Stats: 1739 lines in 28 files changed: 1390 ins; 236 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From tschatzl at openjdk.org Wed Jan 14 11:43:47 2026 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Jan 2026 11:43:47 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 15:38:08 GMT, Stefan Karlsson wrote: >> During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about whether they visited the metadata or not. >> >> I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visit the elements. >> >> So, we now have the following ObjArrayKlass oop iterators: >> >> Iterators that also visit the metadata: >> >> oop_oop_iterate >> oop_oop_iterate_reverse >> oop_oop_iterate_bounded >> >> >> Iterators that are not visiting the metadata: >> >> oop_oop_iterate_elements >> oop_oop_iterate_elements_range >> oop_oop_iterate_elements_bounded >> >> >> The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme, adding `_elements` to functions that do not visit the metadata: >> >> oop_iterate_elements_range >> >> >> Two extra things to check in the patch: >> >> 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`.
>> >> 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/oops/objArrayOop.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29170#pullrequestreview-3660302101 From kdnilsen at openjdk.org Wed Jan 14 20:25:47 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 20:25:47 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v30] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. 
This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Use PROPERFMTARGS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/27ece3e8..b8d9cca9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=28-29 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 14 21:59:13 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 21:59:13 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v6] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 68 commits: - move some post_initialize() work into subclass ShenandoahGenerationalHeuristics - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Fix comment - Use PROPERFMT macros - Simplify code flow: reviewer suggestion - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Remove develop/debug instrumentation - add another override - Change type of command-line args - ... and 58 more: https://git.openjdk.org/jdk/compare/49f72658...ea38ec15 ------------- Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=05 Stats: 1034 lines in 27 files changed: 927 ins; 35 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Wed Jan 14 22:36:53 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 22:36:53 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 21:27:57 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... 
and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 134: > >> 132: if (_is_generational) { >> 133: _regulator_thread = ShenandoahGenerationalHeap::heap()->regulator_thread(); >> 134: size_t young_available = ShenandoahGenerationalHeap::heap()->young_generation()->max_capacity() - > > Consider pushing this down into `ShenandoahGenerationalHeuristics` Most recent commit has this change. It's a bit clumsy. Feel free to guide further. (I'm not real happy with making direct call to super of super class from ShenandoahGenerationalHeuristics. What do you think?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2692303227 From kdnilsen at openjdk.org Wed Jan 14 22:48:09 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 14 Jan 2026 22:48:09 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: <_Z1mAtwznDVN-HPcTTr_kN0snC4i2Yrxw-e5bPdiCno=.351f037b-10c3-4391-876e-31b2db2900a6@github.com> On Thu, 8 Jan 2026 22:06:01 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... 
and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 192: > >> 190: >> 191: void ShenandoahAdaptiveHeuristics::resume_idle_span() { >> 192: size_t mutator_available = _free_set->capacity() - _free_set->used(); > > This is a little confusing to me. Isn't `available` defined as `capacity - used`? Why do we not use `available` here? Agree. Making this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2692329526 From kdnilsen at openjdk.org Thu Jan 15 00:05:55 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 00:05:55 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 22:23:31 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 715: > >> 713: >> 714: if (ShenandoahHeuristics::should_start_gc()) { >> 715: // ShenandoahHeuristics::should_start_gc() has accepted trigger, or declined it. > > return ShenandoahHeuristics::should_start_gc(); Thanks. Changing. 
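To make item 4 of the PR description in this thread more concrete (predicting memory consumption from an accelerating allocation rate rather than a constant one), a minimal C++ sketch might look like the following. The function and parameter names are hypothetical and are not taken from the Shenandoah sources, and the sketch assumes the acceleration stays constant over the prediction window; the actual heuristic in the PR may differ.

```cpp
#include <cassert>

// Hypothetical helper, not HotSpot code: bytes expected to be allocated over
// the next `seconds`, given the current allocation rate (bytes/s) and its
// acceleration (bytes/s^2). With zero acceleration this reduces to the
// constant-rate estimate rate * seconds.
double predict_consumption(double rate, double acceleration, double seconds) {
  return rate * seconds + 0.5 * acceleration * seconds * seconds;
}

// Hypothetical trigger check: request a GC when the memory expected to be
// consumed while the GC runs would meet or exceed what is still available.
bool should_trigger(double available_bytes, double rate, double acceleration,
                    double expected_gc_seconds) {
  return predict_consumption(rate, acceleration, expected_gc_seconds) >= available_bytes;
}
```

With 250 bytes available, a rate of 100 bytes/s and a 2 s GC, the constant-rate estimate (200 bytes) would not trigger, while an acceleration of 50 bytes/s^2 raises the estimate to 300 bytes and would.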
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2692488816 From stefank at openjdk.org Thu Jan 15 09:23:07 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 15 Jan 2026 09:23:07 GMT Subject: RFR: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass [v2] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 15:38:08 GMT, Stefan Karlsson wrote: >> During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about whether they visited the metadata or not. >> >> I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visit the elements. >> >> So, we now have the following ObjArrayKlass oop iterators: >> >> Iterators that also visit the metadata: >> >> oop_oop_iterate >> oop_oop_iterate_reverse >> oop_oop_iterate_bounded >> >> >> Iterators that are not visiting the metadata: >> >> oop_oop_iterate_elements >> oop_oop_iterate_elements_range >> oop_oop_iterate_elements_bounded >> >> >> The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme, adding `_elements` to functions that do not visit the metadata: >> >> oop_iterate_elements_range >> >> >> Two extra things to check in the patch: >> >> 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. >> >> 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp. The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore.
> > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/oops/objArrayOop.hpp > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Thanks for reviewing! I ran this through our tier1 testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29170#issuecomment-3753717107 From stefank at openjdk.org Thu Jan 15 09:27:03 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 15 Jan 2026 09:27:03 GMT Subject: Integrated: 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 14:15:57 GMT, Stefan Karlsson wrote: > During the review of [JDK-8374780](https://bugs.openjdk.org/browse/JDK-8374780) it was clear that the naming of the various oop iterators, and also their comments, were somewhat misleading about whether they visited the metadata or not. > > I propose that we make a clear separation and have it so that all iterators that only visit the array elements are named to contain the word "elements" to make it slightly clearer that these iterators only visit the elements. > > So, we now have the following ObjArrayKlass oop iterators: > > Iterators that also visit the metadata: > > oop_oop_iterate > oop_oop_iterate_reverse > oop_oop_iterate_bounded > > > Iterators that are not visiting the metadata: > > oop_oop_iterate_elements > oop_oop_iterate_elements_range > oop_oop_iterate_elements_bounded > > > The objArrayOopDesc class also exposes an oop iterator and that function has been renamed to mimic the above scheme, adding `_elements` to functions that do not visit the metadata: > > oop_iterate_elements_range > > > Two extra things to check in the patch: > > 1) I did some slight tweaks to the code so that `oop_oop_iterate_elements` is implemented with `oop_oop_iterate_elements_range`. > > 2) I moved the objArrayOopDesc function back to the oopArrayOop.inline.hpp.
The reason is that we have now solved the problems with circular dependencies between .inline.hpp files, so this workaround isn't needed anymore. This pull request has now been integrated. Changeset: bf0da3dd Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/bf0da3dd5c20410aceab8e6f7a7a31432d17b96d Stats: 62 lines in 11 files changed: 18 ins; 21 del; 23 mod 8375040: Clearer names for non-metadata oop iterators in ObjArrayKlass Reviewed-by: tschatzl, kbarrett, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/29170 From eastigeevich at openjdk.org Thu Jan 15 13:48:32 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 15 Jan 2026 13:48:32 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v24] In-Reply-To: References: Message-ID: <4hKdtUvRkeh2Y4slayxr8RXerBau9DSqjsBP6FPpWdU=.30d43b6b-e99c-4bc9-92a7-af9fef12ce01@github.com> > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." 
> > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic AArch64 JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. > * Added a new diagnostic JVM flag `UseDeferredICacheInvalidation` to enable or disable deferred icache invalidation. The flag is automatically enabled for AArch64 if the CPU supports hardware cache coherence. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. > * Provided a default (no-op) implementation for `DefaultICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > > **Testing results: linux fastdebug build** > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JDK-8341518 > - `com/sun/nio/sctp/SctpChannel/CloseDescriptors.java`: JDK-8298466 > - `java/awt/print/PrinterJob/...
Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Use SingleShotTime mode with multiple iterations for GCPatchingNmethodCost ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28328/files - new: https://git.openjdk.org/jdk/pull/28328/files/3abb6de4..086b1bf4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=22-23 Stats: 16 lines in 1 file changed: 7 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Thu Jan 15 13:51:04 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 15 Jan 2026 13:51:04 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 18:50:44 GMT, Aleksey Shipilev wrote: >> The current algorithm: >> - Create an object used in Java methods. >> - Run the methods in the interpreter. >> - Compile the methods. >> - Make the object garbage collectable. >> - Run GC (we measure this). >> >> There are not many things to warm-up. And setting up everything for multiple iterations of GC runs might be expensive. Instead we use forks. >> >> IMO, Yes it is `@BenchmarkMode(OneShot)`. > > Yeah, but first GC would likely be slower, because it would have more real work to do. So you probably want OneShot with the default number of iterations. It will warmup by doing a few GCs, and then do a few other GCs for measurement. @shipilev I updated the microbenchmark to use `Mode.SingleShotTime` and to have multiple iterations. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2694464012 From eastigeevich at openjdk.org Thu Jan 15 13:54:01 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 15 Jan 2026 13:54:01 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v25] In-Reply-To: References: Message-ID: > Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1. > > Neoverse-N1 implementations mitigate erratum 1542419 with a workaround: > - Disable coherent icache. > - Trap IC IVAU instructions. > - Execute: > - `tlbi vae3is, xzr` > - `dsb sy` > > `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete. > > As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests: > > "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround." > > This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions. > > Changes include: > > * Added a new diagnostic AArch64 JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization. 
> * Added a new diagnostic JVM flag `UseDeferredICacheInvalidation` to enable or disable deferred ICache invalidation. The flag is automatically enabled for AArch64 if the CPU supports hardware cache coherence. > * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. > * Provided a default (no-op) implementation for `DefaultICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures. > > **Testing results: linux fastdebug build** > - Neoverse-N1 (Graviton 2) > - [x] tier1: passed > - [x] tier2: passed > - [x] tier3: passed > - [x] tier4: 3 failures > - `containers/docker/TestJcmdWithSideCar.java`: JDK-8341518 > - `com/sun/nio/sctp/SctpChannel/CloseDescriptors.java`: JDK-8298466 > - `java/awt/print/PrinterJob/... Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into JDK-8370947 - Use SingleShotTime mode with multiple iterations for GCPatchingNmethodCost - Fix macos and windows aarch64 debug builds - Remove redundant code - Merge branch 'master' into JDK-8370947 - Fix linux-cross-compile riscv64 build - Restore deleted comment - Remove redundant blank line - Remove redundant include - Merge branch 'master' into JDK-8370947 - ...
and 26 more: https://git.openjdk.org/jdk/compare/78a106ff...b0ede0a8 ------------- Changes: https://git.openjdk.org/jdk/pull/28328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28328&range=24 Stats: 826 lines in 32 files changed: 762 ins; 22 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/28328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28328/head:pull/28328 PR: https://git.openjdk.org/jdk/pull/28328 From eastigeevich at openjdk.org Thu Jan 15 13:57:47 2026 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 15 Jan 2026 13:57:47 GMT Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v13] In-Reply-To: References: Message-ID: <4x96d2l_emyQYWXaRWJq5lKPo5-fa1i9Ps1RysUmVDM=.7999c815-7452-4e3e-aa89-7195fe3c060f@github.com> On Wed, 3 Dec 2025 16:11:14 GMT, Aleksey Shipilev wrote: >> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Fix linux-cross-compile build aarch64 >> - Merge branch 'master' into JDK-8370947 >> - Remove trailing whitespaces >> - Add support of deferred icache invalidation to other GCs and JIT >> - Add UseDeferredICacheInvalidation to defer invalidation on CPU with hardware cache coherence >> - Add jtreg test >> - Fix linux-cross-compile aarch64 build >> - Fix regressions for Java methods without field accesses >> - Fix code style >> - Correct ifdef; Add dsb after ic >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3d54a802...4b04496f > > Interesting work! I was able to look through it very briefly: @shipilev @theRealAph @fisk Could you please review the PR? 
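For readers following the review, the batching idea from the PR summary quoted above (collect patches, then invalidate once) can be sketched in a few lines. This is a minimal illustration and not the PR's code: `FakeICache`, `DeferredFlushScope` and `patch_all_deferred` are hypothetical names, and the real instruction cache invalidation is replaced by a counter so the effect of batching is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the platform's ICache invalidation.
// It only counts calls; no real cache maintenance is performed.
struct FakeICache {
  int invalidations = 0;
  void invalidate_all() { ++invalidations; }
};

// RAII scope: while it is alive, patches are only recorded.
// A single invalidation is issued when the scope ends.
class DeferredFlushScope {
 public:
  explicit DeferredFlushScope(FakeICache& icache) : _icache(icache) {}
  ~DeferredFlushScope() {
    if (_patched > 0) {
      _icache.invalidate_all();
    }
  }
  DeferredFlushScope(const DeferredFlushScope&) = delete;
  DeferredFlushScope& operator=(const DeferredFlushScope&) = delete;

  // Overwrites one instruction slot and remembers that a flush is owed.
  void patch(std::uint32_t* insn, std::uint32_t value) {
    *insn = value;
    ++_patched;
  }

 private:
  FakeICache& _icache;
  std::size_t _patched = 0;
};

// Patches code[0..n) inside one scope and returns the number of
// invalidations that were issued for the whole batch.
inline int patch_all_deferred(std::uint32_t* code, std::size_t n,
                              std::uint32_t value) {
  FakeICache icache;
  {
    DeferredFlushScope scope(icache);
    for (std::size_t i = 0; i < n; ++i) {
      scope.patch(&code[i], value);
    }
  }
  return icache.invalidations;
}
```

With eight patched slots the sketch issues one invalidation instead of eight, which is the property that matters when every invalidation is trapped and expensive.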
------------- PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3755001247 From kdnilsen at openjdk.org Thu Jan 15 17:24:12 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 17:24:12 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v7] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Changes requested by reviewers 1. Change coordination between ShenandoahAdaptiveHeuristics and ShenandoahController/ShenandoahRegulator. Before, ShenandoahAdaptiveHeuristics would poll ShenandoahController or ShenandoahRegulator to obtain the most recent wake time and planned sleep time. Now ShenandoahController and ShenandoahRegulator notify ShenandoahAdaptiveHeuristics each time the values of these variables change. 2. Use available() instead of capacity() - used() when recalculating trigger threshold from within ShenandoahAdaptiveHeuristics::resume_idle_span().
------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/ea38ec15..70b7ebf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=05-06 Stats: 81 lines in 9 files changed: 25 ins; 49 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Thu Jan 15 17:34:49 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 17:34:49 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 21:29:23 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 145: > >> 143: } >> 144: >> 145: double ShenandoahAdaptiveHeuristics::get_most_recent_wake_time() const { > > This introduces a cyclic dependency between control/regulator threads and the heuristics. Since control/regulator threads already _know_ about heuristics, could we instead have the threads invoke setters on the heuristics to provide these values? I've refactored this code now according to your suggestion. Thanks. 
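The refactoring agreed on above (the control threads push values into the heuristics instead of the heuristics polling the threads) can be sketched as follows. This is only an illustration of the dependency direction: `Heuristics`, `Controller`, `on_wake` and the setter names are hypothetical and are not the identifiers used in the PR.

```cpp
#include <cassert>

// Hypothetical heuristics object. It never asks the control thread for
// timing data; it only stores whatever it was last told.
class Heuristics {
 public:
  void set_most_recent_wake_time(double t) { _most_recent_wake_time = t; }
  void set_planned_sleep_interval(double s) { _planned_sleep_interval = s; }

  // Time at which the control thread is next expected to evaluate triggers.
  double next_evaluation_time() const {
    return _most_recent_wake_time + _planned_sleep_interval;
  }

 private:
  double _most_recent_wake_time = 0.0;
  double _planned_sleep_interval = 0.0;
};

// Hypothetical control thread. It knows about Heuristics, not vice versa,
// so there is no cyclic dependency between the two classes.
class Controller {
 public:
  explicit Controller(Heuristics* heuristics) : _heuristics(heuristics) {}

  // Called each time the thread wakes up and decides how long to sleep next.
  void on_wake(double now, double planned_sleep) {
    _heuristics->set_most_recent_wake_time(now);
    _heuristics->set_planned_sleep_interval(planned_sleep);
  }

 private:
  Heuristics* _heuristics;
};
```

The design choice is that only `Controller` includes and calls `Heuristics`; the heuristics class compiles without any knowledge of the thread classes.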
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2695322992 From kdnilsen at openjdk.org Thu Jan 15 17:40:57 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 17:40:57 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: <7gJpIsl8hdcq9FAXD_ar0TQPC6QvX7R0jrsq_Jj9tZ4=.6605e23e-c200-4266-b96b-7f05a3fa5ef3@github.com> On Thu, 8 Jan 2026 22:28:41 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.hpp line 73: > >> 71: bool is_spiking(double rate, double threshold) const; >> 72: >> 73: double interval() const { > > Not seeing where these new methods are used. You are correct. Removing interval() and last_sample_time(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2695341304 From kdnilsen at openjdk.org Thu Jan 15 17:55:33 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 17:55:33 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jan 2026 22:46:20 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 316: > >> 314: if (progress) { >> 315: heap->notify_gc_progress(); >> 316: heap->shenandoah_policy()->record_success_degenerated(_generation->is_young(), _abbreviated); > > On line 313 above here, we call `policy->record_degenerated` which does everything (and more) that `record_success_degenerated` does. Calling both of them here will increment the various counters twice and is probably not what we want. I think after https://github.com/openjdk/jdk/pull/28834, we shouldn't need `record_success_degenerated` for `ShenandoahCollectorPolicy` at all. Good catch. Thanks. Remove record_success_degenerated(). > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 493: > >> 491: ShenandoahCodeRoots::initialize(); >> 492: >> 493: // Initialization of controller markes use of varaibles esstablished by initialize_heuristics. > > Suggestion: > > // Initialization of controller makes use of variables established by initialize_heuristics. Thanks. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2695380538 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2695384440 From kdnilsen at openjdk.org Thu Jan 15 18:31:51 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 18:31:51 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v8] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
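Item 4 of the list above contrasts a constant-rate prediction with one that accounts for acceleration. A minimal sketch of that difference, assuming simple constant acceleration (consumption = rate * t + 0.5 * accel * t^2): the function names and the crude first-to-last acceleration estimate are assumptions made for illustration, not the formulas used by ShenandoahAdaptiveHeuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes expected to be allocated over `horizon` seconds if the rate stays flat.
inline double predict_constant(double rate, double horizon) {
  return rate * horizon;
}

// Same prediction, assuming the rate keeps growing by `accel` (bytes/s per s).
inline double predict_accelerated(double rate, double accel, double horizon) {
  return rate * horizon + 0.5 * accel * horizon * horizon;
}

// Crude acceleration estimate from equally spaced rate samples: the slope
// between the first and last sample. Returns 0 when there are fewer than two
// samples or when the rate is not rising.
inline double estimate_acceleration(const std::vector<double>& rates,
                                    double interval) {
  if (rates.size() < 2 || interval <= 0.0) {
    return 0.0;
  }
  double span = interval * static_cast<double>(rates.size() - 1);
  double accel = (rates.back() - rates.front()) / span;
  return accel > 0.0 ? accel : 0.0;
}

// A trigger fires when predicted consumption exceeds what is still available.
inline bool should_trigger(double predicted_bytes, double available_bytes) {
  return predicted_bytes > available_bytes;
}
```

For rate samples of 100, 200 and 300 bytes/s taken one second apart, a two-second horizon and 700 bytes available, the constant-rate prediction (600) does not trigger while the accelerated prediction (800) does.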
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Respond to reviewer feedback - Remove unneeded functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/70b7ebf8..1ed1ba66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=06-07 Stats: 21 lines in 6 files changed: 0 ins; 19 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From xpeng at openjdk.org Thu Jan 15 18:37:22 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 18:37:22 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v41] In-Reply-To: References: Message-ID: On Tue, 8 Jul 2025 18:26:37 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 298 commits: >> >> - Merge branch 'master' into cas-alloc-1 >> - More accurate census noise >> - Code format >> - typo >> - More eagerly to refresh alloc regions in attempt_allocation_slow since it is holding heap lock >> - While eagerly refresh alloc regions, thread should not yield to safepoint because it is holding uninitialized new object >> - Fix assert when after eagerly refresh alloc regions after fast allocation >> - Remove the support of 0 for flags ShenandoahMutatorAllocRegions and ShenandoahCollectorAllocRegions >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Add virtual back for release_alloc_regions and reserve_alloc_regions to fix link error >> - ... 
and 288 more: https://git.openjdk.org/jdk/compare/624d7144...6de6789f > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 615: > >> 613: size_t capacity = _free_set->alloc_capacity(i); >> 614: bool is_empty = (capacity == _region_size_bytes); >> 615: // TODO remove assert, not possible to pass when allow mutator to allocate w/o lock. > > Probably the preferred approach here is to "pre-retire" regions when they are made directly allocatable. When the region is pre-retired, it is taken out of the partition, so assert_bounds no longer applies to this region. The new implementation always pre-retires a region when it is reserved from the free set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2695516917 From kdnilsen at openjdk.org Thu Jan 15 18:43:01 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 18:43:01 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: <3DyscJVMFCspRM4nseoxI0KASi1uWFAnxWZ6n9dFw0k=.72ddc436-3b2b-489b-ade9-1efbf6ee4a52@github.com> On Fri, 9 Jan 2026 22:20:51 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Fix comment >> - Use PROPERFMT macros >> - Simplify code flow: reviewer suggestion >> - Merge remote-tracking branch 'jdk/master' into accelerated-triggers >> - Remove develop/debug instrumentation >> - add another override >> - Change type of command-line args >> - fix white space >> - Add override to virtual methods >> - Fix race between allocation reporting and querying >> - ...
and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 643: > >> 641: future_accelerated_planned_gc_time * 1000); >> 642: } else { >> 643: log_trigger("Momentary spike consumption (%zu%s) exceeds free headroom (%zu%s) at " > > Should the 'Momentary spike' trigger replace the 'instantaneous spike' trigger? It seems like we now have two spike detecting triggers? We could, maybe. The conditions under which each one triggers are slightly different. I have seen situations where the new "Momentary spike consumption" trigger fires when the old "instantaneous spike" trigger did not. Potentially, the reverse could also happen, though I have not observed it, where the old "instantaneous spike" triggers but momentary spike does not. (Momentary spike is evaluated every 15 ms, whereas instantaneous spike is evaluated every 100 ms. At the time we evaluate a momentary spike, there might be an abundance of runway, so we do not trigger. A less extreme instantaneous spike might be observed at a later time, when runway is less plentiful, and that would trigger.) Out of an abundance of caution (belt and suspenders), I was thinking to keep both triggers in place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2695536563 From xpeng at openjdk.org Thu Jan 15 18:43:11 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 18:43:11 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: <3VD5OaZ_TbI3IXCLwJms4bhGAqeH90YCX5E4-b-4kew=.f6e1e8be-5bf9-4151-b3b5-b4614b210408@github.com> On Fri, 7 Nov 2025 19:45:46 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2204: >> >>> 2202: i++; >>> 2203: } >>> 2204: return obj; >> >> I think obj always equals nullptr at this point.
Seems the code would be easier to understand (and would depend less on effective compiler optimization) if we just made that explicit. Can we just say: >> >> return nullptr? > > Yes, it is always `nullptr`; `return nullptr` will make the code more readable. Resolving this: the code has been refactored, and all allocation-related code is now in ShenandoahAllocator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2695539186 From kdnilsen at openjdk.org Thu Jan 15 18:43:19 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 15 Jan 2026 18:43:19 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v29] In-Reply-To: <9Hggsj2zW9VafwI8DdJbN_v0yTmbEpUyYE8QRFMNU5E=.71bc2233-2812-4482-94b1-9796a5c24594@github.com> References: <9Hggsj2zW9VafwI8DdJbN_v0yTmbEpUyYE8QRFMNU5E=.71bc2233-2812-4482-94b1-9796a5c24594@github.com> Message-ID: On Tue, 13 Jan 2026 23:39:46 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits: >> >> - Merge remote-tracking branch 'jdk/master' into share-collector-reserves >> - Move rebuild free set earlier in an abbreviated GC cycle >> - Restore deleted assert statement >> - Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata() >> - fix another typo >> - Fix typo >> - Fix confusing comment >> - Add comment >> - Merge remote-tracking branch 'jdk/master' into share-collector-reserves >> - Fix whitespace and comment >> - ... and 76 more: https://git.openjdk.org/jdk/compare/659b53fe...27ece3e8 > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 688: > >> 686: void move_unaffiliated_regions_from_collector_to_old_collector(ssize_t regions); >> 687: >> 688: inline size_t global_unaffiliated_regions() { > > A nit, but all functions defined in the class declaration are implicitly `inline` and the keyword is unnecessary here.
Should I remove the inline keyword from all such functions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2695532295 From xpeng at openjdk.org Thu Jan 15 19:01:09 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 19:01:09 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <1MZQLDhJsqK5ZoPIVDYYRyVg0po67A6wVfIpsAl7Qa0=.d0bfa7e4-f448-4bb1-a386-b8226133e6a7@github.com> References: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> <1MZQLDhJsqK5ZoPIVDYYRyVg0po67A6wVfIpsAl7Qa0=.d0bfa7e4-f448-4bb1-a386-b8226133e6a7@github.com> Message-ID: On Wed, 7 Jan 2026 21:09:55 GMT, Xiaolong Peng wrote: >> Please document the results of any experiments as rationale for the final design. > > I did run some experiments and didn't see a significant difference. I will keep the current code using PaddedArray, meanwhile keep this conversation open and make a decision based on metrics later, after I address the other comments. Here are the SPECjbb results I got; heap size is 8G, 8 cores:

| Run                | Metric        | PaddedArray | Raw array | Change |
| ------------------ | ------------- | ----------- | --------- | ------ |
| 1st                | Max jOPS      | 17893       | 18092     | 1.11%  |
| 1st                | Critical jOPS | 15490       | 15765     | 1.78%  |
| 2nd                | Max jOPS      | 18219       | 18092     | -0.70% |
| 2nd                | Critical jOPS | 15625       | 15498     | -0.81% |
| Average            | Max jOPS      | 18056       | 18092     | 0.20%  |
| Average            | Critical jOPS | 15557.5     | 15631.5   | 0.48%  |
| Standard deviation | Max jOPS      | 163         | 0         |        |
| Standard deviation | Critical jOPS | 67.5        | 133.5     |        |

I didn't see a huge difference; using a raw array tends to be slightly better.
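The summary rows of the SPECjbb figures above can be re-derived from the two runs. The message does not say how they were computed, so the formulas below are an inference: mean of the two runs, population standard deviation (half the distance between two samples), and relative change of the raw array against PaddedArray. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Mean of two runs.
inline double mean2(double a, double b) { return (a + b) / 2.0; }

// Population standard deviation of two samples, which reduces to half
// the distance between them.
inline double stdev2(double a, double b) { return std::fabs(a - b) / 2.0; }

// Relative change of `other` against `base`, in percent.
inline double percent_change(double base, double other) {
  return (other - base) / base * 100.0;
}
```

For example, `mean2(17893, 18219)` gives 18056 and `stdev2(17893, 18219)` gives 163, matching the PaddedArray Max jOPS entries; with only two runs per configuration, differences of around 1% sit within the run-to-run spread.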
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2695586099 From ysr at openjdk.org Thu Jan 15 22:27:11 2026 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Jan 2026 22:27:11 GMT Subject: RFR: 8351892: GenShen: Remove vestigial young generation sizing options In-Reply-To: References: Message-ID: <2LFLZ5iFczKaAQtZHaD31768YrQaM0rwPrK0j-XmlnI=.b10ebb78-3672-4512-8124-c51a2115f8c3@github.com> On Fri, 9 Jan 2026 19:13:31 GMT, William Kemper wrote: > GenShen generally tries to keep the young generation as large as possible. The options `ShenandoahMinYoungPercentage` and `ShenandoahMaxYoungPercentage` are no longer used. Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29144#pullrequestreview-3667884276 From xpeng at openjdk.org Thu Jan 15 22:29:55 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 22:29:55 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v42] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. 
> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
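The core technique named in the PR title, allocating with compare-and-swap instead of under the heap lock, can be sketched as a lock-free bump-pointer allocator over a single region. This is a simplified illustration, not the PR's implementation: `Region`, `allocate` and `free_bytes` are hypothetical names, and the slow path that takes the heap lock to refresh allocation regions is omitted entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical region with an atomically updated allocation pointer ("top").
class Region {
 public:
  Region(char* bottom, std::size_t capacity)
      : _end(bottom + capacity), _top(bottom) {}

  // Lock-free bump allocation. Returns nullptr when `size` bytes do not fit,
  // in which case a real allocator would fall back to a locked slow path.
  char* allocate(std::size_t size) {
    char* old_top = _top.load(std::memory_order_relaxed);
    for (;;) {
      if (static_cast<std::size_t>(_end - old_top) < size) {
        return nullptr;
      }
      char* new_top = old_top + size;
      // On failure, compare_exchange_weak stores the current value of _top
      // into old_top, so the loop retries against fresh state.
      if (_top.compare_exchange_weak(old_top, new_top,
                                     std::memory_order_acq_rel,
                                     std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }

  std::size_t free_bytes() const {
    return static_cast<std::size_t>(_end - _top.load(std::memory_order_relaxed));
  }

 private:
  char* const _end;
  std::atomic<char*> _top;
};
```

Threads that lose the race simply retry with the updated `top`, so no thread ever blocks on another; contention shows up as retries rather than as lock waits.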
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Update code as suggested during the review meeting - Use simple array instead of PaddedArray to store alloc regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/6de6789f..3e8ab717 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=40-41 Stats: 59 lines in 3 files changed: 16 ins; 26 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Thu Jan 15 22:38:05 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Jan 2026 22:38:05 GMT Subject: Integrated: 8351892: GenShen: Remove vestigial young generation sizing options In-Reply-To: References: Message-ID: On Fri, 9 Jan 2026 19:13:31 GMT, William Kemper wrote: > GenShen generally tries to keep the young generation as large as possible. The options `ShenandoahMinYoungPercentage` and `ShenandoahMaxYoungPercentage` are no longer used. This pull request has now been integrated. 
Changeset: 87cbcada Author: William Kemper URL: https://git.openjdk.org/jdk/commit/87cbcadacfa20b24e9ba0bf8374ecbcd331d2b35 Stats: 23 lines in 2 files changed: 0 ins; 23 del; 0 mod 8351892: GenShen: Remove vestigial young generation sizing options Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/29144 From xpeng at openjdk.org Thu Jan 15 22:42:45 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 22:42:45 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 7 Jan 2026 23:09:16 GMT, Xiaolong Peng wrote: >> Put the comments describing functions in the .hpp file, where they are currently. But we need to enhance those comments. > > I have added comments on those functions, and I'll keep adding more where comments are missing; meanwhile I am trying to avoid excessive comments, so please point out if any of the comments is not clear. I have removed the branch handling _alloc_region_count == 0, and also updated the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696214206 From xpeng at openjdk.org Thu Jan 15 22:58:53 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 15 Jan 2026 22:58:53 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v43] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah.
This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. 
> > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Only to set update_watermark when allocate from collector/old collector partition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/3e8ab717..42770e08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=41-42 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Fri Jan 16 00:50:45 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Jan 2026 00:50:45 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v8] In-Reply-To: References: Message-ID: On Mon, 12 Jan 2026 21:37:32 GMT, Y. Srinivas Ramakrishna wrote: > It should find the old referent to be in the old generation and leave it alone? It cannot though. 
Once the reference is encountered it must either be _discovered_ (i.e., added to a discovered list for later processing), or it must be marked strongly.

> Is there an example of that from an application/service where we see this?

The [original bug report](https://mail.openjdk.org/pipermail/shenandoah-dev/2025-December/028724.html) referenced an issue with the [Undertow](https://undertow.io/) web server. Not being able to clear these weak references prevented the finalizers from running and resulted in a memory leak. This prevented them from evaluating generational Shenandoah. The change here does add some complexity, but it's not egregious and I feel it does address a real-world problem.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3757550024

From xpeng at openjdk.org Fri Jan 16 00:56:03 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 16 Jan 2026 00:56:03 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v44]
In-Reply-To:
References:
Message-ID:

> Shenandoah always allocates memory under the heap lock. We have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the memory allocation code path to reduce heap lock contention. Along with the optimization, the Shenandoah memory allocation code gets a better object-oriented design (OOD) so that most of the code is reused:
>
> * ShenandoahAllocator: base class of the allocators; most of the allocation code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutators; inherits from ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutators.
> * ShenandoahCollectorAllocator: allocator for collector allocation in the Collector partition; similar to ShenandoahMutatorAllocator, with only a few lines of code to customize the allocator for the Collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in the OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`.
>
> I don't expect a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve latency/performance:
>
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 500+ us to less than 150 us, and p99 from 1000+ us to ~200 us.
>
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing"
>
>
> OpenJDK TIP:
>
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 2...
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Code format - Add _volatile_top for atomic allocation in heap region w/o heap lock to address potential race condition when refresh alloc regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/42770e08..18aa3a38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=42-43 Stats: 65 lines in 6 files changed: 35 ins; 7 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 16 00:59:04 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 00:59:04 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v45] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:

  Remove refreshed_regions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26171/files
  - new: https://git.openjdk.org/jdk/pull/26171/files/18aa3a38..65770cfd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=44
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=43-44

  Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171

From xpeng at openjdk.org Fri Jan 16 01:20:36 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 16 Jan 2026 01:20:36 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17]
In-Reply-To:
References: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> <1MZQLDhJsqK5ZoPIVDYYRyVg0po67A6wVfIpsAl7Qa0=.d0bfa7e4-f448-4bb1-a386-b8226133e6a7@github.com>
Message-ID:

On Thu, 15 Jan 2026 18:57:28 GMT, Xiaolong Peng wrote:

>> I ran some experiments and didn't see a significant difference. I will keep the current code using PaddedArray, keep this conversation open, and make a decision based on metrics later, after I address the other comments.
>
> Here are the SPECjbb results I got (heap size 8G, 8 cores):
>
> | Run                | Metric        | PaddedArray | Raw array | Difference |
> | ------------------ | ------------- | ----------- | --------- | ---------- |
> | 1st                | Max jOPS      | 17893       | 18092     | 1.11%      |
> |                    | Critical jOPS | 15490       | 15765     | 1.78%      |
> | 2nd                | Max jOPS      | 18219       | 18092     | -0.70%     |
> |                    | Critical jOPS | 15625       | 15498     | -0.81%     |
> | Average            | Max jOPS      | 18056       | 18092     | 0.20%      |
> |                    | Critical jOPS | 15557.5     | 15631.5   | 0.48%      |
> | Standard deviation | Max jOPS      | 163         | 0         |            |
> |                    | Critical jOPS | 67.5        | 133.5     |            |
>
>
> I didn't see a huge difference; the raw array tends to be slightly better.

Based on the test results, I have updated the PR to use a raw array type instead of PaddedArray.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696511915

From xpeng at openjdk.org Fri Jan 16 01:20:37 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 16 Jan 2026 01:20:37 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14]
In-Reply-To:
References: <_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>
Message-ID:

On Wed, 7 Jan 2026 22:54:13 GMT, Xiaolong Peng wrote:

>> I will update the PR and not use the atomic version here, and also in another place in refresh_alloc_regions.
>>
>> Having volatile_top and nonvolatile_top seems unnecessary: it would make the code more complicated without much performance benefit. With the CAS allocator, most alloc requests are handled by the atomic code path; only in a few cases do we need non-atomic allocation:
>> * After reserving alloc regions from the free set and before storing them as alloc regions, it performs the object allocation if the alloc request has not been satisfied yet.
>> * After trying atomic allocation, if refreshing alloc regions fails, it tries to find a region in the free set with enough space for the allocation request.
>> >> Yes, all the _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs are volatile now, out of these fields, I believe I can maybe remove volatile for _age and _youth(?), but the update of the rest must be atomic because mutators will increase the values in the CAS allocation code path w/o heap lock. > > I have updated the method `atomic_allocate_in` with a template parameter IS_SHARED_ALLOC_REGION, now only when allocating from shared alloc regions the IS_SHARED_ALLOC_REGION parameter is set to true to use atomic operations. I have updated the code to use _volatile_top for CAS allocation, and keep _top nonvolatile, this addresses your concern about the race condition between threads doing CAS allocation and the thread trying to refresh alloc regions, thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696510211 From xpeng at openjdk.org Fri Jan 16 01:20:42 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 01:20:42 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 01:19:29 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 91: > >> 89: virtual HeapWord* allocate(ShenandoahAllocRequest& req, bool& in_new_region); >> 90: virtual void release_alloc_regions(); >> 91: virtual void reserve_alloc_regions(); > > Need comments on these functions. Clarify pre-conditions and post-conditions. I think the intention is: > > 1. allocate(): Caller does not hold the heap lock. All allocations by mutator or GC are fulfilled by this function. This function tries to perform a CAS allocation without obtaining the global heap lock. If that fails, it will obtain the global heap lock and do a free-set allocation. As a side effect of doing a free-set allocation, some number of directly allocatable regions may be retired and replaced with new directly allocatable regions. > 2. release_alloc_regions(): Caller must hold the heap lock. This causes all directly allocatable regions to be placed into the appropriate ShenandoahFreeSet partition. We do this in preparation for choosing a collection set and/or rebuilding the freeset. > 3. reserve_alloc_regions(): Caller must hold the heap lock. This causes us to set aside N regions as directly allocatable by removing these regions from the relevant ShenandoahFreeSet partitions. Explain what happens if there are not N regions available. > > Clarify: these three function represent the entirety of the "public mutation API" that is exercised by mutators and GC workers as they interact with the free set? (There is another set of functions that could be characterized as the read-only API for obtaining state information about the free set. This provides information such as available memory, allocated bytes since GC start, etc.) Thanks, I have updated the comments on these public APIs. 
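
The `allocate()` contract described in the quoted comment (try a CAS allocation without the heap lock, then fall back to an allocation under the heap lock that may install a new directly allocatable region) can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: `SketchRegion`, `SketchAllocator`, the single shared region, and the single spare region standing in for the free set are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a heap region (not HotSpot's ShenandoahHeapRegion).
struct SketchRegion {
  char* const end;           // one past the last usable byte
  std::atomic<char*> top;    // bump pointer, advanced with CAS

  SketchRegion(char* bottom, std::size_t bytes) : end(bottom + bytes), top(bottom) {}

  // Lock-free bump allocation; returns nullptr if `bytes` does not fit.
  char* cas_allocate(std::size_t bytes) {
    char* cur = top.load(std::memory_order_relaxed);
    while (static_cast<std::size_t>(end - cur) >= bytes) {
      // On failure, compare_exchange_weak reloads `cur` with the current top.
      if (top.compare_exchange_weak(cur, cur + bytes, std::memory_order_relaxed)) {
        return cur;
      }
    }
    return nullptr;
  }
};

// Hypothetical allocator with one shared region and one spare region.
class SketchAllocator {
  std::mutex _heap_lock;               // stand-in for the global heap lock
  std::atomic<SketchRegion*> _shared;  // the one directly allocatable region
  SketchRegion* _spare;                // stand-in for the free set, guarded by _heap_lock

public:
  SketchAllocator(SketchRegion* shared, SketchRegion* spare)
    : _shared(shared), _spare(spare) {}

  char* allocate(std::size_t bytes, bool& in_new_region) {
    in_new_region = false;
    // Fast path: CAS allocation without taking the heap lock.
    SketchRegion* r = _shared.load(std::memory_order_acquire);
    if (r != nullptr) {
      if (char* p = r->cas_allocate(bytes)) return p;
    }
    // Slow path: take the heap lock and retry, because another thread
    // may have installed a fresh shared region in the meantime.
    std::lock_guard<std::mutex> guard(_heap_lock);
    r = _shared.load(std::memory_order_acquire);
    if (r != nullptr) {
      if (char* p = r->cas_allocate(bytes)) return p;
    }
    if (_spare == nullptr) return nullptr;
    // Satisfy the request from the spare region, then publish it as the
    // new shared region (the old one is effectively retired).
    char* p = _spare->cas_allocate(bytes);
    if (p != nullptr) {
      _shared.store(_spare, std::memory_order_release);
      _spare = nullptr;
      in_new_region = true;
    }
    return p;
  }
};
```

In this sketch the slow path retries the shared region after taking the lock so that two threads that both missed the fast path do not each consume a spare region.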
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696512842 From xpeng at openjdk.org Fri Jan 16 01:25:51 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 01:25:51 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 22:37:02 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 192: > >> 190: uint i = alloc_start_index; >> 191: do { >> 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) { > > Note that there is a race (and performance overhead) with checking r->is_active_alloc_region(). Though a region might be active when we check it here, it may be inactive by the time we attempt to atomic_allocate_in(). > > This is one reason I prefer to use "volatile_top == end" to denote !is_active_alloc_region. This way, you only have to check once (rather than checking is_active() and then checking has_available()). And there is no race between when you check and when you attempt to allocate. 
I have updated the code to use volatile_top in the fast path to allocate from shared alloc regions. It no longer tests is_active_alloc_region; as you suggested, volatile_top is set to the end of the region if it is not an active alloc region.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696520518

From xpeng at openjdk.org Fri Jan 16 01:39:18 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 16 Jan 2026 01:39:18 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20]
In-Reply-To:
References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com>
Message-ID: <0GOCAyyQx8bolR-axUuJIUCGqVPqVfYDwvFvQbDBnJg=.d52a66c0-b8ca-42a6-8539-c4886e391b0a@github.com>

On Tue, 6 Jan 2026 23:11:19 GMT, Kelvin Nilsen wrote:

>> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits:
>>
>> - Merge branch 'openjdk:master' into cas-alloc-1
>> - Fix build error after merging from tip
>> - Merge branch 'master' into cas-alloc-1
>> - Merge branch 'master' into cas-alloc-1
>> - Some comments updates as suggested in PR review
>> - Fix build failure after merge
>> - Expend promoted from ShenandoahOldCollectorAllocator
>> - Merge branch 'master' into cas-alloc-1
>> - Address PR comments
>> - Merge branch 'openjdk:master' into cas-alloc-1
>> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5
>
> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 280:
>
>> 278: // Step 2: allocate region from FreeSets to fill the alloc regions or satisfy the alloc request.
>> 279: ShenandoahHeapRegion* reserved[MAX_ALLOC_REGION_COUNT];
>> 280: int reserved_regions = _free_set->reserve_alloc_regions(ALLOC_PARTITION, refreshable_alloc_regions,
>
> I request we get rid of the min_free_words argument to free_set->reserve_alloc_regions(). See comments in the called function.

As I replied in previous comments, we keep it but pass PLAB::min_size() for now.

> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 304:
>
>> 302: log_debug(gc, alloc)("%sAllocator: Storing heap region %li to alloc region %i",
>> 303: _alloc_partition_name, reserved[i]->index(), refreshable[i]->alloc_region_index);
>> 304: AtomicAccess::store(&refreshable[i]->address, reserved[i]);
>
> Should not need to perform AtomicAccess because we hold the heap lock here.

When we store a region into the alloc region array, AtomicAccess::store ensures that other threads can see it immediately, but the read performed while holding the heap lock has been updated to not use AtomicAccess::load.

> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3045:
>
>> 3043: }
>> 3044:
>> 3045: int ShenandoahFreeSet::reserve_alloc_regions(ShenandoahFreeSetPartitionId partition, int regions_to_reserve, size_t min_free_words, ShenandoahHeapRegion** reserved_regions) {
>
> I request that we not enforce min_free_words when reserving allocation regions. This defeats the purpose of allocation bias. The objective is to consume fragmented memory early in the GC cycle (when we have more mitigation options if an allocation request ever fails). Note that every region that is in any partition has at least PLAB::min_size() available memory.
>
> By requiring that MUTATOR regions have PLAB::max_size() words, we are forcing ourselves to never consume the fragmented memory regions. (Towards the end of GC, when memory is in short supply, we will be unable to find directly allocatable MUTATOR regions. This will force ourselves to obtain the heap lock for every allocation. And these allocations will be inefficient because the remaining memory is highly fragmented.)
Thanks, I have updated the PR to pass PLAB::min_size() when reserving allocation regions. Basically, any region with more than PLAB::min_size() available can be reserved as a shared alloc region, which is the same behavior as you are suggesting. But I will keep the `min_free_words` argument in case we want to change the behavior in the future: if a region has so little memory that it can barely fit one smallest TLAB/GCLAB, most allocations in the region may not succeed, and we may want to avoid putting regions that are almost ready to retire into the CAS allocator.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696536658
PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696541519
PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2696534165

From xpeng at openjdk.org Fri Jan 16 08:19:31 2026
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Fri, 16 Jan 2026 08:19:31 GMT
Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v46]
In-Reply-To:
References:
Message-ID:
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Fix assert in recycle_internal - Set _volatile_top to nullptr(instead of end of region) for region that is not active for CAS alloc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/65770cfd..6c1a5c40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=44-45 Stats: 49 lines in 3 files changed: 29 ins; 4 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 16 09:22:45 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 09:22:45 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v47] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Load _volatile_top once in free_bytes_for_atomic_alloc - Fix assert in concurrent_set_update_watermark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/6c1a5c40..d8f7e51f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=45-46 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From kdnilsen at openjdk.org Fri Jan 16 15:48:05 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 16 Jan 2026 15:48:05 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v9] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 73 commits:

 - Fix compile errors following merge from master

   But there are still many correctness failures following this merge.
   Still debugging.
 - Merge remote-tracking branch 'jdk/master' into accelerated-triggers
 - Respond to reviewer feedback
 - Remove unneeded functions
 - Changes requested by reviewers

   1. Change coordination between ShenandoahAdaptiveHeuristics and ShenandoahController/ShenandoahRegulator. Before, ShenandoahAdaptiveHeuristics would poll ShenandoahController or ShenandoahRegulator to obtain the most recent wake time and planned sleep time. Now ShenandoahController and ShenandoahRegulator notify ShenandoahAdaptiveHeuristics each time the values of these variables change.
   2. Use available() instead of capacity() - used() when recalculating the trigger threshold from within ShenandoahAdaptiveHeuristics::resume_idle_span().
 - move some post_initialize() work into subclass ShenandoahGenerationalHeuristics
 - Merge remote-tracking branch 'jdk/master' into accelerated-triggers
 - Merge remote-tracking branch 'jdk/master' into accelerated-triggers
 - Fix comment
 - Use PROPERFMT macros
 - ...
and 63 more: https://git.openjdk.org/jdk/compare/34705a77...ce54e38d ------------- Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=08 Stats: 1004 lines in 24 files changed: 893 ins; 41 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Fri Jan 16 17:30:30 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 16 Jan 2026 17:30:30 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v10] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
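
Items 3 and 4 of the quoted description (track the trend in allocation rate, then predict memory consumption from the accelerated rate) can be illustrated with a small sketch. It assumes, purely for illustration, that acceleration is estimated as the least-squares slope of recent rate samples and that it stays constant over the expected GC time; the heuristic in the PR may compute this differently. All names (`RateSample`, `estimate_acceleration`, `predict_consumption`, `should_trigger`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sample: allocation rate (bytes/s) observed at a point in time (s).
struct RateSample {
  double time_s;
  double bytes_per_s;
};

// Least-squares slope of rate over time, i.e. acceleration in bytes/s^2.
// Returns 0 when there are too few samples or no spread in time.
double estimate_acceleration(const std::vector<RateSample>& samples) {
  if (samples.size() < 2) return 0.0;
  double mean_t = 0.0, mean_r = 0.0;
  for (const RateSample& s : samples) {
    mean_t += s.time_s;
    mean_r += s.bytes_per_s;
  }
  mean_t /= samples.size();
  mean_r /= samples.size();
  double num = 0.0, den = 0.0;
  for (const RateSample& s : samples) {
    double dt = s.time_s - mean_t;
    num += dt * (s.bytes_per_s - mean_r);
    den += dt * dt;
  }
  return den > 0.0 ? num / den : 0.0;
}

// Bytes expected to be allocated over the next `horizon_s` seconds:
// rate * t + 0.5 * accel * t^2. Negative acceleration is clamped to zero,
// so the prediction is never below the constant-rate estimate.
double predict_consumption(double rate, double accel, double horizon_s) {
  double a = accel > 0.0 ? accel : 0.0;
  return rate * horizon_s + 0.5 * a * horizon_s * horizon_s;
}

// Trigger GC if the predicted consumption during one GC cycle
// would use up the available memory.
bool should_trigger(double available_bytes, double rate, double accel, double gc_time_s) {
  return predict_consumption(rate, accel, gc_time_s) >= available_bytes;
}
```

With samples of 100, 200 and 300 bytes/s taken one second apart, the estimated acceleration is 100 bytes/s^2. Over a 2 s GC the accelerated prediction is 800 bytes versus 600 bytes at constant rate, so with 700 bytes available only the accelerated model triggers.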
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix full gc bug introduced by merge from master ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/ce54e38d..e2c0ab1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=08-09 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From xpeng at openjdk.org Fri Jan 16 19:37:36 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 19:37:36 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v48] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix new race condition in unset_active_alloc_region which sync _atomic_top back to _top, other threads may see stale _top if not holding heap lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/d8f7e51f..83fe3bc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=46-47 Stats: 36 lines in 5 files changed: 3 ins; 0 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Jan 16 19:40:25 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 16 Jan 2026 19:40:25 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v49] In-Reply-To: References: Message-ID: <9zKqYUxiHA4avkGfUq2TeSMbc43_qYcn9IZo8jQlUlI=.1cccb57b-0d28-444f-a87f-c3870816a748@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. 
> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only a few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause performance issues, but in some cases it may improve latency/performance: > > 1. DaCapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > OpenJDK TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2...
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add more code comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/83fe3bc2..e6192824 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=47-48 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From kdnilsen at openjdk.org Fri Jan 16 19:43:16 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 16 Jan 2026 19:43:16 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data (Original revision) Message-ID: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> This is the originally proposed PR to address https://bugs.openjdk.org/browse/JDK-8353115 At the time this proposed PR was reviewed, a suggestion for refactoring was proposed and explored, resulting in https://github.com/openjdk/jdk/pull/24319 Though the refactored implementation may have some desirable characteristics, this original implementation appears to have better performance. The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated.
In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. ------------- Commit messages: - reset _mixed_candidate_garbage_words in recycle_internal() - Merge remote-tracking branch 'jdk/master' into revA-fix-live-data-for-mixed-evac-candidates - touch file to force retests - Merge remote-tracking branch 'jdk/master' into revA-fix-live-data-for-mixed-evac-candidates - fix asserts - Use shenandoah_assert_safepoint() instead of is_at_safepoint() - Track live and garbage for mixed-evac regions Changes: https://git.openjdk.org/jdk/pull/29127/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29127&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353115 Stats: 52 lines in 5 files changed: 52 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29127.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29127/head:pull/29127 PR: https://git.openjdk.org/jdk/pull/29127 From wkemper at openjdk.org Fri Jan 16 19:43:17 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Jan 2026 19:43:17 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data (Original revision) In-Reply-To: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> References: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> Message-ID: On Thu, 8 Jan 2026 21:05:36 GMT, Kelvin Nilsen wrote: > This is the originally proposed PR to address https://bugs.openjdk.org/browse/JDK-8353115 > At the time this proposed PR was reviewed, a suggestion for refactoring was proposed and explored, resulting in https://github.com/openjdk/jdk/pull/24319 > > Though the refactored 
implementation may have some desirable characteristics, this original implementation appears to have better performance. > > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > 76: _live_data(0), > 77: _critical_pins(0), > 78: _mixed_candidate_garbage_words(0), Can we zero this out in `ShHeapRegion::recycle_internal`? Perhaps not strictly necessary, but seems like good hygiene.
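To make the accounting problem and the reset being requested concrete, here is a simplified sketch. It is an assumption about the general idea, not the code in PR 29127: the class `SketchOldRegion` and its methods are hypothetical, and only the field name `_mixed_candidate_garbage_words` is taken from the diff hunk quoted above. The sketch records garbage at the end of old marking and treats everything allocated afterwards as live.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified old-gen region. The logic is an illustrative
// assumption, not the implementation under review.
class SketchOldRegion {
public:
  // Account for `words` newly allocated in this region.
  void allocate(size_t words) { _used_words += words; }

  // At the end of old marking, remember how much of the region was garbage.
  void capture_garbage_at_end_of_mark(size_t marked_live_words) {
    assert(marked_live_words <= _used_words);
    _mixed_candidate_garbage_words = _used_words - marked_live_words;
  }

  // Estimate of live data that also counts allocations made after marking
  // ended, since those objects were never examined by the marker.
  size_t live_words_estimate() const {
    return _used_words - _mixed_candidate_garbage_words;
  }

  // Recycling empties the region, so stale garbage accounting must not
  // survive into its next use.
  void recycle() {
    _used_words = 0;
    _mixed_candidate_garbage_words = 0;
  }

private:
  size_t _used_words = 0;
  size_t _mixed_candidate_garbage_words = 0;
};
```

If `recycle()` left the garbage counter untouched in this sketch, a freshly reused region would report a live estimate smaller than its actual usage (or underflow), which is one way to see why resetting the field is good hygiene.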
------------- PR Review: https://git.openjdk.org/jdk/pull/29127#pullrequestreview-3658125456 PR Review Comment: https://git.openjdk.org/jdk/pull/29127#discussion_r2688278457 From kdnilsen at openjdk.org Fri Jan 16 19:43:19 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 16 Jan 2026 19:43:19 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data (Original revision) In-Reply-To: References: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> Message-ID: <2cXg1cIZMsioz60GjE16T4sf4mTf8OL66JdVOBFeObw=.12835667-63ba-438a-a231-6411cb41fe4f@github.com> On Tue, 13 Jan 2026 22:06:48 GMT, William Kemper wrote: >> This is the originally proposed PR to address https://bugs.openjdk.org/browse/JDK-8353115 >> At the time this proposed PR was reviewed, a suggestion for refactoring was proposed and explored, resulting in https://github.com/openjdk/jdk/pull/24319 >> >> Though the refactored implementation may have some desirable characteristics, this original implementation appears to have better performance. >> >> The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended.
>> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > >> 76: _live_data(0), >> 77: _critical_pins(0), >> 78: _mixed_candidate_garbage_words(0), > > Can we zero this out in `ShHeapRegion::recycle_internal`? Perhaps not strictly necessary, but seems like good hygiene. Done. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29127#discussion_r2699733050 From kdnilsen at openjdk.org Fri Jan 16 20:04:18 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 16 Jan 2026 20:04:18 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v11] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
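Points 3 and 4 of the list above can be illustrated with a small sketch. This is not the heuristic implemented in PR 29039: the names `RateSample`, `estimate_acceleration`, `predict_consumption`, and `should_trigger` are hypothetical, the two-sample slope is a deliberately crude stand-in for whatever trend estimator the PR uses, and clamping negative acceleration to zero is an assumption made here to keep the prediction conservative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical allocation-rate sample; names are illustrative only.
struct RateSample {
  double time_s;       // timestamp in seconds
  double bytes_per_s;  // allocation rate observed at that time
};

// Estimate acceleration (bytes/s^2) as the slope between two samples.
inline double estimate_acceleration(const RateSample& older,
                                    const RateSample& newer) {
  double dt = newer.time_s - older.time_s;
  assert(dt > 0.0);
  return (newer.bytes_per_s - older.bytes_per_s) / dt;
}

// Bytes expected to be allocated over the next `horizon_s` seconds,
// using rate * t + accel * t^2 / 2 instead of assuming a constant rate.
inline double predict_consumption(double rate, double accel,
                                  double horizon_s) {
  if (accel < 0.0) {
    accel = 0.0;  // assumption: ignore deceleration to stay conservative
  }
  return rate * horizon_s + 0.5 * accel * horizon_s * horizon_s;
}

// Trigger when the memory predicted to be consumed during the expected GC
// time would exhaust what is currently available.
inline bool should_trigger(double available_bytes, double rate, double accel,
                           double predicted_gc_time_s) {
  return predict_consumption(rate, accel, predicted_gc_time_s) >=
         available_bytes;
}
```

With a rate of 200 bytes/s and a 2 s GC, a constant-rate model predicts 400 bytes consumed, while an acceleration of 200 bytes/s^2 raises the prediction to 800 bytes. A heap with 700 bytes available would therefore trigger only under the accelerated model, which is the "late trigger" situation the PR description targets.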
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: reorder trigger evaulation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/e2c0ab1c..19cd3dd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=09-10 Stats: 102 lines in 1 file changed: 52 ins; 50 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From wkemper at openjdk.org Fri Jan 16 21:00:35 2026 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Jan 2026 21:00:35 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data (Original revision) In-Reply-To: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> References: <7JK4muPbya1kL85_zTdJly1jGLlkv39bISYDeKhWtOo=.a16ac8ec-f411-4a2e-b5a4-62f999fe78e1@github.com> Message-ID: On Thu, 8 Jan 2026 21:05:36 GMT, Kelvin Nilsen wrote: > This is the originally proposed PR to address https://bugs.openjdk.org/browse/JDK-8353115 > At the time this proposed PR was reviewed, a suggestion for refactoring was proposed and explored, resulting in https://github.com/openjdk/jdk/pull/24319 > > Though the refactored implementation may have some desirable characteristics, this original implementation appears to have better performance. > > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region.
This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. LGTM, sorry for the churn. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29127#pullrequestreview-3672702229