From aph at openjdk.org Thu Jan 1 13:15:59 2026
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 1 Jan 2026 13:15:59 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>
Message-ID:

On Wed, 31 Dec 2025 16:06:53 GMT, Evgeny Astigeevich wrote:

> > Is there any reason not to do this by default on all AArch64?
>
> It will be turned on if AArch64 has `ctr_el0.IDC` and `ctr_el0.DIC` set. See https://github.com/openjdk/jdk/pull/28328/changes#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R663

Sure, I can see that, but is there any reason not to do this by default on all AArch64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3703679652

From eastigeevich at openjdk.org Thu Jan 1 20:38:08 2026
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Thu, 1 Jan 2026 20:38:08 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>
Message-ID:

On Thu, 1 Jan 2026 13:13:07 GMT, Andrew Haley wrote:

> Sure, I can see that, but is there any reason not to do this by default on all AArch64?

Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3704085326

From aph at openjdk.org Fri Jan 2 12:11:55 2026
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 2 Jan 2026 12:11:55 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>
Message-ID: <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>

On Thu, 1 Jan 2026 20:35:25 GMT, Evgeny Astigeevich wrote:

> > Sure, I can see that, but is there any reason not to do this by default on all AArch64?
>
> Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?

In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705195536

From kbarrett at openjdk.org Fri Jan 2 13:54:02 2026
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 2 Jan 2026 13:54:02 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References:

Message-ID: <1HoFAoxwMTDW1GJteYe1Bl3X9erLJtcdjdY7kEqOMgE=.f34e8855-352e-4654-9297-6af29b5f17de@github.com>

On Mon, 29 Dec 2025 21:51:20 GMT, Evgeny Astigeevich wrote:

>> Arm Neoverse N1 erratum 1542419: "The core might fetch a stale instruction from memory which violates the ordering of instruction fetches". It is fixed in Neoverse N1 r4p1.
>>
>> Neoverse-N1 implementations mitigate erratum 1542419 with a workaround:
>> - Disable coherent icache.
>> - Trap IC IVAU instructions.
>> - Execute:
>>   - `tlbi vae3is, xzr`
>>   - `dsb sy`
>>
>> `tlbi vae3is, xzr` invalidates translations for all address spaces (global for address). It waits for all memory accesses using in-scope old translation information to complete before it is considered complete.
>>
>> As this workaround has significant overhead, Arm Neoverse N1 (MP050) Software Developer Errata Notice version 29.0 suggests:
>>
>> "Since one TLB inner-shareable invalidation is enough to avoid this erratum, the number of injected TLB invalidations should be minimized in the trap handler to mitigate the performance impact due to this workaround."
>>
>> This PR introduces a mechanism to defer instruction cache (ICache) invalidation for AArch64 to address the Arm Neoverse N1 erratum 1542419, which causes significant performance overhead if ICache invalidation is performed too frequently. The implementation includes detection of affected Neoverse N1 CPUs and automatic enabling of the workaround for relevant Neoverse N1 revisions.
>>
>> Changes include:
>>
>> * Added a new diagnostic JVM flag `NeoverseN1Errata1542419` to enable or disable the workaround for the erratum. The flag is automatically enabled for Neoverse N1 CPUs prior to r4p1, as detected during VM initialization.
>> * Introduced the `ICacheInvalidationContext` class to manage deferred ICache invalidation, with platform-specific logic for AArch64. This context is used to batch ICache invalidations, reducing performance impact. As the address for icache invalidation is not relevant, we use the nmethod's code start address.
>> * Provided a default (no-op) implementation for `ICacheInvalidationContext` on platforms where the workaround is not needed, ensuring portability and minimal impact on other architectures.
>> * Modified barrier patching and relocation logic (`ZBarrierSetAssembler`, `ZNMethod`, `RelocIterator`, and related code) to accept a `defer_icache_invalidation` parameter, allowing ICache invalidation to be deferred and later performed in bulk.
>>
>> Testing results: linux fastdebug build
>> - Neoverse-N1 (Graviton 2)
>>   - [x] tier1: passed
>>   - [x] tier2: passed
>>   - [x] tier3: passed
>>   - [x] tier4: 3 failu...
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix linux-cross-compile riscv64 build

src/hotspot/share/runtime/icache.hpp line 139:

> 137: class DefaultICacheInvalidationContext : StackObj {
> 138:  private:
> 139:   NONCOPYABLE(DefaultICacheInvalidationContext);

Not a review, just a drive-by comment.

@xmas92 suggested moving the `NONCOPYABLE` to the private part of the class, as a style issue. It used to be that `NONCOPYABLE` was best used in the private part of a class, because of how it was implemented. But with the change to using deleted definitions, it's actually better to have it in the public part. That way you get an "attempt to use a deleted function" error rather than possibly getting an "attempt to use an inaccessible function" error.
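
To illustrate both points above (a stack-allocated context that batches invalidation requests, and deleted copy operations declared in the public section), here is a minimal sketch. It is not the code from the PR: `NONCOPYABLE_SKETCH`, `DeferredInvalidationScope`, `g_invalidations` and `patch_batch` are hypothetical names, and the "invalidation" is only a counter increment so the example runs on any platform.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a NONCOPYABLE-style macro: it declares the copy
// constructor and copy assignment operator as deleted.
#define NONCOPYABLE_SKETCH(C) C(const C&) = delete; C& operator=(const C&) = delete

// Counts how many (simulated) ICache invalidations were actually performed.
static int g_invalidations = 0;

// Stack-allocated scope that records invalidation requests and performs a
// single (simulated) invalidation when it goes out of scope.
class DeferredInvalidationScope {
 public:
  // The deleted members sit in the public section, so an accidental copy is
  // diagnosed as a use of a deleted function rather than as an access error.
  NONCOPYABLE_SKETCH(DeferredInvalidationScope);

  DeferredInvalidationScope() = default;

  ~DeferredInvalidationScope() {
    if (_pending) {
      ++g_invalidations;  // one invalidation for the whole batch
    }
  }

  // Called once per patched instruction; only remembers that work is pending.
  void request() { _pending = true; }

 private:
  bool _pending = false;
};

// Copying is rejected at compile time.
static_assert(!std::is_copy_constructible<DeferredInvalidationScope>::value,
              "copy construction must be rejected");
static_assert(!std::is_copy_assignable<DeferredInvalidationScope>::value,
              "copy assignment must be rejected");

// Patches `count` simulated instructions under one scope and returns how many
// invalidations were performed for the batch.
int patch_batch(int count) {
  assert(count >= 0);
  const int before = g_invalidations;
  {
    DeferredInvalidationScope scope;
    for (int i = 0; i < count; i++) {
      scope.request();
    }
  }  // scope ends here: at most one invalidation happens
  return g_invalidations - before;
}
```

With this shape, `patch_batch(5)` performs one invalidation instead of five, and a batch with no requests performs none.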
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28328#discussion_r2657751265

From eastigeevich at openjdk.org Fri Jan 2 15:43:15 2026
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Fri, 2 Jan 2026 15:43:15 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To: <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>
References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>
 <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>
Message-ID:

On Fri, 2 Jan 2026 12:07:57 GMT, Andrew Haley wrote:

> > > Sure, I can see that, but is there any reason not to do this by default on all AArch64?
> >
> > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?
>
> In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems?

IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions.

IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare.

Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code:
- [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache)
- [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads)

In this PR we optimize two parts invalidating caches:
1. GCs patching code. This is invalidation of modified instructions.
2. Generation and installation of code. This is invalidation of the whole code.

The second case can be optimized for all AArch64.

Is there anything I am missing? And we can do optimized cache invalidation on all AArch64?
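
The decision described above can be sketched as follows. This is an illustration only, not the PR's code: `SyncPlan`, `plan_for`, `icache_line_bytes` and `lines_to_invalidate` are hypothetical names, and the functions take a raw `CTR_EL0` value as an argument instead of reading the register, so the sketch runs on any host. The bit positions used (IDC at bit 28, DIC at bit 29, IminLine in bits 3:0) follow the Arm architecture definition of `CTR_EL0`.

```cpp
#include <cassert>
#include <cstdint>

// CTR_EL0 fields as defined by the Arm architecture.
constexpr uint64_t CTR_IDC = uint64_t{1} << 28;  // no D-cache clean to PoU needed
constexpr uint64_t CTR_DIC = uint64_t{1} << 29;  // no I-cache invalidate to PoU needed

// Hypothetical classification of what code patching has to do.
enum class SyncPlan {
  BarriersOnly,      // IDC and DIC both set: no DC/IC by address required
  MaintainByAddress  // otherwise: DC/IC operations on real addresses
};

constexpr SyncPlan plan_for(uint64_t ctr_el0) {
  return ((ctr_el0 & CTR_IDC) != 0 && (ctr_el0 & CTR_DIC) != 0)
             ? SyncPlan::BarriersOnly
             : SyncPlan::MaintainByAddress;
}

// IminLine is log2 of the number of 4-byte words in the smallest I-cache line.
constexpr uint64_t icache_line_bytes(uint64_t ctr_el0) {
  return uint64_t{4} << (ctr_el0 & 0xF);
}

// Number of cache lines a by-address invalidation of [start, start + size)
// would have to touch. Shows why invalidating a whole nmethod costs more
// than invalidating a few patched instructions.
inline uint64_t lines_to_invalidate(uint64_t start, uint64_t size, uint64_t line) {
  assert(line != 0 && (line & (line - 1)) == 0);  // line size is a power of two
  if (size == 0) {
    return 0;
  }
  const uint64_t first = start & ~(line - 1);
  const uint64_t last = (start + size - 1) & ~(line - 1);
  return (last - first) / line + 1;
}
```

For 64-byte lines, one patched 4-byte instruction touches a single line, while a 4 KiB code blob touches 64 lines, which is the cost trade-off mentioned above.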
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705607781

From aph at openjdk.org Fri Jan 2 18:06:58 2026
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 2 Jan 2026 18:06:58 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References: <15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>
 <2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>
Message-ID:

On Fri, 2 Jan 2026 15:39:50 GMT, Evgeny Astigeevich wrote:

> > > > Sure, I can see that, but is there any reason not to do this by default on all AArch64?
> > >
> > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?
> >
> > In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems?
>
> IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions.

Ah, I see. So it looks like we'll have to maintain two entirely different bodies of code to do the cache management. That will be a recurring pain, and is disappointing.

> IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare.
>
> Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code:
>
> * [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache)

That's useful. It's worth taking advantage of cache-coherent implementations (when they're not broken!) by not emitting unnecessary instructions.

> * [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads)

Thanks. That's a good read, but no surprises. I'm fairly sure we've been doing most of that for as long as the port has existed.

> In this PR we optimize two parts invalidating caches:
>
> 1. GCs patching code. This is invalidation of modified instructions.
>
> 2. Generation and installation of code. This is invalidation of the whole code.
>
> The second case can be optimized for all AArch64.
>
> Is there anything I am missing? And we can do optimized cache invalidation on all AArch64?

Probably not, but I've been working on a patch to minimize the invalidation we do today.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3705936075

From eastigeevich at openjdk.org Fri Jan 2 22:07:00 2026
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Fri, 2 Jan 2026 22:07:00 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References:

<15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>

<2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>

Message-ID:

On Fri, 2 Jan 2026 18:02:49 GMT, Andrew Haley wrote:

>>> > > Sure, I can see that, but is there any reason not to do this by default on all AArch64?
>>> >
>>> > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?
>>>
>>> In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems?
>>
>> IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions.
>>
>> IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare.
>>
>> Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code:
>> - [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache)
>> - [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads)
>>
>> In this PR we optimize two parts invalidating caches:
>> 1. GCs patching code. This is invalidation of modified instructions.
>> 2. Generation and installation of code. This is invalidation of the whole code.
>>
>> The second case can be optimized for all AArch64.
>>
>> Is there anything I am missing? And we can do optimized cache invalidation on all AArch64?
>
>> > > > Sure, I can see that, but is there any reason not to do this by default on all AArch64?
>> > >
>> > > Do you mean to do this for all AArch64 OSes, not only for Linux AArch64?
>> >
>> > In a perfect world we'd do this for all AArch64. But Linux-only would be good too. But is there any reason not to do this on all Linux systems?
>>
>> IMO, a reason is that not all AArch64 might have both `ctr_el0.IDC` and `ctr_el0.DIC` set. If either of `ctr_el0.IDC`/`ctr_el0.DIC` or both are not set, we will need to use `DC`/`IC` with real addresses to clean and to invalidate caches. We will need to choose between invalidating modified instructions or the whole nmethod's code. Invalidating modified instructions will need tracking of modified instructions. Invalidating the whole nmethod does not need tracking but it can be expensive vs invalidating particular instructions.
>
> Ah, I see. So it looks like we'll have to maintain two entirely different bodies of code to do the cache management. That will be a recurring pain, and is disappointing.
>
>> IMO cases of Java running on AArch64 CPU where both `ctr_el0.IDC` and `ctr_el0.DIC` are not set, might be rare.
>>
>> Jacob Bramley from Arm has nice blog posts about Caches and Self-Modifying Code:
>>
>> * [Caches and Self-Modifying Code: Implementing `__clear_cache`](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-implementing-clear-cache)
>
> That's useful. It's worth taking advantage of cache-coherent implementations (when they're not broken!) by not emitting unnecessary instructions.
>
>> * [Caches and Self-Modifying Code: Working with Threads](https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-self-modifying-code-working-with-threads)
>
> Thanks. That's a good read, but no surprises. I'm fairly sure we've been doing most of that for as long as the port has existed.
>
>> In this PR we optimize two parts invalidating caches:
>>
>> 1. GCs patching code. This is invalidation of modified instructions.
>>
>> 2. Generation and installation of code. This is invalidation of the whole code.
>>
>> The second case can be optimized for all AArch64.
>>
>> Is there anything I am missing? And we can do optimized cache invalidation on all AArch64?
>
> Probably not, but I've been working on a patch to minimize the invalidation we do today.

@theRealAph

> ...
> Probably not, but I've been working on a patch to minimize the invalidation we do today.

Does this mean we don't need this PR or need to rework it? Could you please provide more details?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3706299227

From aph at openjdk.org Sat Jan 3 09:59:05 2026
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 3 Jan 2026 09:59:05 GMT
Subject: RFR: 8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance [v21]
In-Reply-To:
References:

<15wAKoik-66gfbUzGQEROBTv_cTV_I_6jq96S2ErOyA=.22e19b52-220c-4c35-9aaa-9f08719e16fa@github.com>

<2b8w_8NwTITCCqqyirLgPcfxyQd35yS-MmPlO8rEwS0=.74b73b12-1b0a-4d73-b18f-1f6ac0e5f18d@github.com>

Message-ID: <8hQOV9QhPL_j5g7WpcwzJc_8QPeAzLSIFk3ASRlCXa8=.3e1ac93f-2f6a-4dce-bead-31241240cb6e@github.com>

On Fri, 2 Jan 2026 22:04:00 GMT, Evgeny Astigeevich wrote:

> > Probably not, but I've been working on a patch to minimize the invalidation we do today.
>
> Does this mean we don't need this PR or need to rework it? Could you please provide more details?

It makes no difference to this patch. I'm still experimenting.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28328#issuecomment-3706937169

From roland at openjdk.org Mon Jan 5 09:34:24 2026
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 5 Jan 2026 09:34:24 GMT
Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter.
>
> Thanks!

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26
 - Backport 00068a80304a809297d0df8698850861e9a1c5e9

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28892/files
  - new: https://git.openjdk.org/jdk/pull/28892/files/ceb2ac15..4121d277

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28892&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28892&range=00-01

  Stats: 1087 lines in 32 files changed: 746 ins; 238 del; 103 mod
  Patch: https://git.openjdk.org/jdk/pull/28892.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28892/head:pull/28892

PR: https://git.openjdk.org/jdk/pull/28892

From chagedorn at openjdk.org Mon Jan 5 14:50:30 2026
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 5 Jan 2026 14:50:30 GMT
Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2]
In-Reply-To:
References:

Message-ID:

On Mon, 5 Jan 2026 09:34:24 GMT, Roland Westrelin wrote:

>> Hi all,
>>
>> This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>>
>> The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter.
>>
>> Thanks!
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
>  - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26
>  - Backport 00068a80304a809297d0df8698850861e9a1c5e9

Looks good! I submitted some testing which passed.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28892#pullrequestreview-3627129917

From roland at openjdk.org Mon Jan 5 14:50:31 2026
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 5 Jan 2026 14:50:31 GMT
Subject: [jdk26] RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v2]
In-Reply-To:
References:

Message-ID:

On Mon, 5 Jan 2026 14:42:05 GMT, Christian Hagedorn wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>>
>>  - Merge branch 'jdk26' into backport-rwestrel-00068a80-jdk26
>>  - Backport 00068a80304a809297d0df8698850861e9a1c5e9
>
> Looks good! I submitted some testing which passed.

@chhagedorn thanks for review and testing

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28892#issuecomment-3710731279

From roland at openjdk.org Mon Jan 5 14:50:32 2026
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 5 Jan 2026 14:50:32 GMT
Subject: [jdk26] Integrated: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs
In-Reply-To:
References:
Message-ID:

On Thu, 18 Dec 2025 08:30:52 GMT, Roland Westrelin wrote:

> Hi all,
>
> This pull request contains a backport of commit [00068a80](https://github.com/openjdk/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
>
> The commit being backported was authored by Roland Westrelin on 10 Dec 2025 and was reviewed by Christian Hagedorn, Quan Anh Mai, Galder Zamarreño and Emanuel Peter.
>
> Thanks!

This pull request has now been integrated.

Changeset: d8a1c1d0
Author: Roland Westrelin
URL: https://git.openjdk.org/jdk/commit/d8a1c1d04cab940b4a6cbe82fa2e445102aa9896
Stats: 367 lines in 13 files changed: 266 ins; 27 del; 74 mod

8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs

Reviewed-by: chagedorn
Backport-of: 00068a80304a809297d0df8698850861e9a1c5e9

-------------

PR: https://git.openjdk.org/jdk/pull/28892

From wkemper at openjdk.org Mon Jan 5 17:03:08 2026
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 5 Jan 2026 17:03:08 GMT
Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v6]
In-Reply-To:
References:
Message-ID:

> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery.
>
> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking).
>
> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee.
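
The healing step described in the paragraph above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Shenandoah's implementation: `Obj`, `resolve` and `heal_list` are hypothetical names, a forwarded object is modeled as a plain pointer to its copy, and the discovered list is a singly linked list through a `discovered` field.

```cpp
#include <cassert>

// Toy object: `forwardee` is non-null once the object has been evacuated,
// `discovered` links the object into a discovered list.
struct Obj {
  Obj* forwardee = nullptr;
  Obj* discovered = nullptr;
};

// Returns the current location of an object: its copy if it was evacuated,
// otherwise the object itself.
inline Obj* resolve(Obj* o) {
  return (o != nullptr && o->forwardee != nullptr) ? o->forwardee : o;
}

// Walks a discovered list and replaces every forwarded entry (including the
// head) with its forwardee. Returns the number of links that were healed.
int heal_list(Obj*& head) {
  int healed = 0;
  Obj** slot = &head;
  while (*slot != nullptr) {
    Obj* current = resolve(*slot);
    if (current != *slot) {
      *slot = current;  // point the list at the evacuated copy
      ++healed;
    }
    assert(current->forwardee == nullptr);  // a copy is never itself forwarded
    slot = &current->discovered;            // continue through the live copy
  }
  return healed;
}
```

After healing, no link in the list refers to the old location of an evacuated object, so later processing of the list never touches a collected region.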
>
> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions.

William Kemper has updated the pull request incrementally with two additional commits since the last revision:

 - Heal discovered lists for any young collection coincides with old marking
 - Configure thread local mark closure on delegated old reference processor

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28810/files
  - new: https://git.openjdk.org/jdk/pull/28810/files/f621b70c..d5b17d79

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=04-05

  Stats: 8 lines in 2 files changed: 4 ins; 2 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28810.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810

PR: https://git.openjdk.org/jdk/pull/28810

From wkemper at openjdk.org Mon Jan 5 17:03:12 2026
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 5 Jan 2026 17:03:12 GMT
Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v5]
In-Reply-To:
References:

Message-ID: On Fri, 19 Dec 2025 19:02:13 GMT, William Kemper wrote: >> The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. >> >> When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). >> >> To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. >> >> This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Fix idiosyncratic white space in whitebox > > Co-authored-by: Stefan Karlsson > - Sort includes > - Heal old discovered lists in parallel > - Fix comment > - Factor duplicate code into shared method > - Heal discovered oops in common place for degen and concurrent update refs > - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing > - Clear bootstrap mode for full GC that might have bypassed degenerated cycle > - Do not bypass card barrier when healing discovered list > - ... and 9 more: https://git.openjdk.org/jdk/compare/400d8cfb...f621b70c This change has now passed internal testing pipelines several times. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28810#issuecomment-3711294483 From wkemper at openjdk.org Mon Jan 5 17:11:15 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:11:15 GMT Subject: RFR: 8373203: Genshen: Non-strong reference leak in old gen [v7] In-Reply-To: References: Message-ID: > The generational mode for Shenandoah will collect _referents_ for the generation being collected. For example, if we have a young reference pointing to an old referent, that young reference will be processed after we finish marking the old generation. This presents a problem for discovery. > > When the young mark _encounters_ a young reference with an old referent, it cannot _discover_ it because old marking hasn't finished. However, if it does not discover it, the old referent will be strongly marked. This, in turn, will prevent the old generation from clearing the referent (if it even reaches it again during old marking). > > To solve this, we let young reference processing discover the old reference by having it use the old generation reference processor to do so. This means the old reference processor can have a discovered list that contains young weak references. 
If any of these young references reside in a region that is collected, old reference processing will crash when it processes such a reference. Therefore, we add a method `heal_discovered_lists` to traverse the discovered lists after young evacuation is complete. The method will replace any forwarded entries in the discovered list with the forwardee. > > This PR also extends whitebox testing support for Shenandoah, giving us the ability to trigger young/old collections and interrogate some properties of heaps and regions. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Heal discovered lists for any young collection coincides with old marking - Configure thread local mark closure on delegated old reference processor - Merge remote-tracking branch 'jdk/master' into fix-old-reference-processing - Fix idiosyncratic white space in whitebox Co-authored-by: Stefan Karlsson - Sort includes - Heal old discovered lists in parallel - Fix comment - Factor duplicate code into shared method - Heal discovered oops in common place for degen and concurrent update refs - ... 
and 12 more: https://git.openjdk.org/jdk/compare/4458cab4...ed0d0272 ------------- Changes: https://git.openjdk.org/jdk/pull/28810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28810&range=06 Stats: 669 lines in 20 files changed: 537 ins; 84 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/28810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28810/head:pull/28810 PR: https://git.openjdk.org/jdk/pull/28810 From wkemper at openjdk.org Mon Jan 5 17:13:08 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 17:13:08 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v3] In-Reply-To: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: > This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Fix typo in assertion message - Take regulator thread out of STS before requesting GC The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. 
- Add comments - Revert back to what should be on this branch - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash - Don't know how this file got deleted - Carry over gc cancellation to gc request - Do not let allocation failure requests be overwritten by other requests - Fix degen point handling - ... and 3 more: https://git.openjdk.org/jdk/compare/4458cab4...8f4f55db ------------- Changes: https://git.openjdk.org/jdk/pull/28932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28932&range=02 Stats: 95 lines in 4 files changed: 45 ins; 17 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/28932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28932/head:pull/28932 PR: https://git.openjdk.org/jdk/pull/28932 From kdnilsen at openjdk.org Mon Jan 5 19:31:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:31:14 GMT Subject: RFR: 8312116: JDK GenShen: make instantaneous allocation rate triggers more timely Message-ID: After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. 2. Sample allocation rates more frequently than once every 100 ms. 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. 4. When we detect acceleration of workload, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
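Items 3 and 4 above can be illustrated with a minimal sketch. It assumes the allocation rate is sampled periodically and that acceleration is estimated as the slope between the oldest and newest sample; the PR may use a different estimator. Under constant acceleration, the memory consumed over a horizon t is rate*t + 0.5*accel*t^2. All names here (`RateSample`, `estimate_acceleration`, `predict_consumption`, `should_trigger`) are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample: allocation rate (bytes/s) observed at a timestamp (s).
struct RateSample {
  double time_s;
  double bytes_per_s;
};

// Estimate acceleration (bytes/s^2) as the slope between the oldest and
// newest sample. Returns 0 with fewer than two samples or no elapsed time.
double estimate_acceleration(const std::vector<RateSample>& samples) {
  if (samples.size() < 2) return 0.0;
  const RateSample& first = samples.front();
  const RateSample& last = samples.back();
  const double dt = last.time_s - first.time_s;
  if (dt <= 0.0) return 0.0;
  return (last.bytes_per_s - first.bytes_per_s) / dt;
}

// Predict bytes allocated over the next horizon_s seconds, assuming the
// current rate keeps accelerating at a constant pace. Negative acceleration
// is ignored here, so the prediction never drops below the constant-rate case.
double predict_consumption(double current_rate, double acceleration, double horizon_s) {
  assert(horizon_s >= 0.0);
  const double accel = acceleration > 0.0 ? acceleration : 0.0;
  return current_rate * horizon_s + 0.5 * accel * horizon_s * horizon_s;
}

// Trigger when the predicted allocation during the expected GC time would
// consume all memory currently available to mutators.
bool should_trigger(double available_bytes, double current_rate,
                    double acceleration, double expected_gc_time_s) {
  return predict_consumption(current_rate, acceleration, expected_gc_time_s) >= available_bytes;
}
```

With a rate of 300 bytes/s, an acceleration of 100 bytes/s^2 and a 2 s GC, the accelerated prediction is 800 bytes while the constant-rate prediction is 600 bytes. A trigger that assumes a constant rate would therefore fire later than one that accounts for acceleration.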
------------- Commit messages: - Change type of command-line args - fix white space - Add override to virtual methods - Fix race between allocation reporting and querying - add debug instrumentation - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - add instrumentation and fix bugs - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - some debug instrumentation - Merge remote-tracking branch 'origin/accelerated-triggers' into accelerated-triggers-gh - ... and 49 more: https://git.openjdk.org/jdk/compare/400d8cfb...c7046b5c Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312116 Stats: 1529 lines in 26 files changed: 1423 ins; 34 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Mon Jan 5 19:31:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:31:14 GMT Subject: RFR: 8312116: JDK GenShen: make instantaneous allocation rate triggers more timely In-Reply-To: References: Message-ID: On Mon, 5 Jan 2026 15:10:52 GMT, Kelvin Nilsen wrote: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. 
When we detect acceleration of workload, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate.

This PR shows very slight improvements on specjbb tests:

    ~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/jdk/bin/java \
      -XX:+UnlockExperimentalVMOptions \
      -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms10g -Xmx10g -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
      -javaagent:/home/kdnilsen/lib/jHiccup-2.0.10/jHiccup.jar=-l,results/specjbb2015-master-jhiccup.log,-i,1000,-a \
      -Xlog:async \
      -Xlog:gc*=info \
      -Xlog:safepoint*=info \
      -Xlog:handshake*=info \
      -jar /home/kdnilsen/lib/specjbb2015/specjbb2015.jar \
      -m composite -ikv \
      -p /home/kdnilsen/lib/specjbb2015/config/specjbb2015.props \
      -raw /home/kdnilsen/lib/specjbb2015/config/template-C.raw >$t.accelerated-trigger.specjbb2015.out 2>$t.accelerated-triggers.specjbb2015.err

We have tested this PR with several different heap sizes on a particular Extremem workload and provide the results here. With a 16GB heap, both master and accelerated-triggers perform poorly. We consider the JVM to be underprovisioned for this workload, and we consider the behavior of accelerated-triggers acceptable compared to master in this configuration. Accelerated-triggers has 0.24% to 30.5% worse latency across reported response-time percentiles. On average, it performs 57% more GC cycles, resulting in 50% fewer degenerated cycles (due to earlier triggers). CPU utilization is 0.60% higher.

With 20GB heap size, the benefits of accelerated-triggers are demonstrated in improved p50, p95, and p99 latencies. Note that accelerated-triggers is able to complete an average of 120% more old GCs than master. In this configuration, master is more vulnerable to starvation of old generation processing. Accelerated-triggers performed 30% fewer degenerated cycles and 30% fewer full GC cycles than master.

With 24GB heap size, both master and accelerated-triggers experienced degraded performance on one of five trials. This appears to have resulted from starvation of old-gen processing in both cases. Even so, the accelerated-triggers run was able to complete 5 old collections vs. only 4 completed old collections with master. For this configuration, we report both average results and trimmed average results. Average results favor accelerated-triggers at most percentiles. Trimmed average results favor master at most percentiles.
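The difference between average and trimmed average results matters when one of five trials is an outlier. The message does not say which trimming rule was used, so this is only a sketch that assumes the lowest and highest `trim` values are discarded before averaging. The function names and the sample numbers in the usage note are illustrative and are not measurements from these runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Plain arithmetic mean of all trials.
double mean(const std::vector<double>& v) {
  assert(!v.empty());
  return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Mean after discarding the `trim` smallest and `trim` largest values.
// Takes the vector by value so the caller's data is not reordered.
double trimmed_mean(std::vector<double> v, std::size_t trim) {
  assert(v.size() > 2 * trim);
  std::sort(v.begin(), v.end());
  const auto first = v.begin() + static_cast<std::ptrdiff_t>(trim);
  const auto last = v.end() - static_cast<std::ptrdiff_t>(trim);
  return std::accumulate(first, last, 0.0) / static_cast<double>(v.size() - 2 * trim);
}
```

For five made-up latencies {10, 12, 11, 13, 100}, the plain mean is dominated by the single degraded trial, while `trimmed_mean(v, 1)` averages only 11, 12 and 13. This is why the two summaries can favor different builds when each build has one degraded trial.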

At 28GB heap size, accelerated-triggers shows significant strength compared to master. Three of five trials with master experienced degenerated cycles, and two of five trials with master experienced full GC. None of the five trials with accelerated-triggers experienced degenerated or full GC cycles. This manifests in generally better latency across all percentiles.

With the 31GB heap size, latencies are very similar between master and accelerated-triggers. Accelerated-triggers consumes 15% more CPU as it is performing 103% more GCs. Note that accelerated-triggers completes one more old GC than master, demonstrating that it is less vulnerable than master to starvation of old-gen processing.

Note that typical service deployments tend to be provisioned with excess resources. This allows the services to operate more reliably under transient spikes in client workload, and avoids "rare" triggering missteps that cause unwanted degenerated and full GC cycles. This particular workload would most typically be deployed today with a 31GB heap if it were a production service. A goal of the GenShen engineering team is to enable more frugal use of CPU and memory resources. In the longer term, we would hope to enable reliable production deployment of this workload in 28GB or 24GB of memory. We have observed for some workloads that accelerated-triggers increases contention between young-generation and old-generation GC activities, because it often forces more frequent young-generation activities. In practice, this is often balanced by more timely collection of young, which reduces "urgent" young collection efforts that occur when the JVM is under duress. Other development efforts are under way to allow more graceful cooperation between young-generation and old-generation concurrent activities when both feel the need to contend for CPU time.
The workload used in the above tests is represented by this script:

    ~/github/jdk.accelerated-triggers/build/linux-x86_64-server-release/images/jdk/bin/java \
      -XX:ActiveProcessorCount=16 \
      -XX:+UnlockExperimentalVMOptions \
      -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$m -Xmx$m \
      -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
      -XX:ShenandoahFullGCThreshold=1024 \
      -XX:ShenandoahGuaranteedOldGCInterval=0 \
      -XX:ShenandoahGuaranteedYoungGCInterval=0 \
      -Xlog:"gc*=info,ergo" \
      -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
      -XX:+UnlockDiagnosticVMOptions \
      -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
      -dDictionarySize=3000000 \
      -dNumCustomers=9000000 \
      -dNumProducts=240000 \
      -dCustomerThreads=800 \
      -dAllowAnyMatch=false \
      -dCustomerPeriod=2s \
      -dCustomerThinkTime=300ms \
      -dKeywordSearchCount=4 \
      -dSelectionCriteriaCount=2 \
      -dProductReviewLength=12 \
      -dServerThreads=5 \
      -dServerPeriod=10s \
      -dProductNameLength=10 \
      -dBrowsingHistoryQueueCount=5 \
      -dSalesTransactionQueueCount=5 \
      -dProductDescriptionLength=320 \
      -dProductReplacementPeriod=60s \
      -dProductReplacementCount=25 \
      -dCustomerReplacementPeriod=60s \
      -dCustomerReplacementCount=1500 \
      -dBrowsingExpiration=1m \
      -dPhasedUpdates=true \
      -dPhasedUpdateInterval=60s \
      -dSimulationDuration=25m \
      -dResponseTimeMeasurements=100000 \
      >$t.$m.genshen.medium.accelerated.out \
      2>$t.$m.genshen.medium.accelerated.err &
    job_pid=$!
    sleep 1500
    cpu_percent=$(ps -o cputime -o etime -p $job_pid)
    rss_kb=$(ps -o rss= -p $job_pid)
    rss_mb=$((rss_kb / 1024))
    wait $job_pid
    echo "RSS: $rss_mb MB" >>$t.$m.genshen.medium.accelerated.out
    echo "$cpu_percent" >>$t.$m.genshen.medium.accelerated.out
    gzip $t.$m.genshen.medium.accelerated.out $t.$m.genshen.medium.accelerated.err

------------- PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3710878539 PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3711727740 From wkemper at openjdk.org Mon Jan 5 19:57:27 2026 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Jan 2026 19:57:27 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2] In-Reply-To: References:

Message-ID: <2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com> On Mon, 5 Jan 2026 19:54:04 GMT, Kelvin Nilsen wrote: >> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: >> >> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. >> 2. Sample allocation rates more frequently than once every 100 ms. >> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. >> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > add another override Can we remove the `KELVIN_*` macros? Perhaps fine tune some of the logging to `log_trace(gc, ergo)` or `log_debug(gc, ergo)` where appropriate? ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3628222792 From kdnilsen at openjdk.org Mon Jan 5 19:57:25 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 19:57:25 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. 
In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: add another override ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/c7046b5c..43664d66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Mon Jan 5 20:25:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 20:25:21 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v2] In-Reply-To: <2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com> References:

<2eKXF6_uIhKFz0g0791S9sJfRhorjlm2ssVUvCpClMc=.4ad1e73b-b0e4-416b-9c33-2cb9d4bceda2@github.com> Message-ID: On Mon, 5 Jan 2026 19:54:04 GMT, William Kemper wrote: > Can we remove the `KELVIN_*` macros? Perhaps fine tune some of the logging to `log_trace(gc, ergo)` or `log_debug(gc, ergo)` where appropriate? So sorry. Forgot I still had all of that in there. Coming out now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29039#issuecomment-3711960984 From kdnilsen at openjdk.org Mon Jan 5 20:39:03 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 20:39:03 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove develop/debug instrumentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29039/files - new: https://git.openjdk.org/jdk/pull/29039/files/43664d66..959b274c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=01-02 Stats: 498 lines in 10 files changed: 0 ins; 497 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From xpeng at openjdk.org Mon Jan 5 21:04:45 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 5 Jan 2026 21:04:45 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: Message-ID: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases, since in most cases the contention on the heap lock is not high enough to cause performance issues, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Fix build error after merging from tip - Merge branch 'master' into cas-alloc-1 - Merge branch 'master' into cas-alloc-1 - Some comments updates as suggested in PR review - Fix build failure after merge - Expend promoted from ShenandoahOldCollectorAllocator - Merge branch 'master' into cas-alloc-1 - Address PR comments - Merge branch 'openjdk:master' into cas-alloc-1 - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=19 Stats: 1644 lines in 25 files changed: 1296 ins; 235 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From kdnilsen at openjdk.org Mon Jan 5 21:36:11 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 21:36:11 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v5] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: touch file to force tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/87b41568..7b0efb3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From kdnilsen at openjdk.org Mon Jan 5 21:49:35 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 5 Jan 2026 21:49:35 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v15] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
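The gap described above can be sketched as follows. This assumes that everything allocated into a region after the most recent mark began is treated as live, which is an assumption of this sketch and not something the message states. `RegionSnapshot` and both functions are hypothetical and do not reflect the actual `ShenandoahHeapRegion` interface or the functions added by the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified view of a heap region (offsets in bytes).
struct RegionSnapshot {
  std::size_t marked_live_bytes;  // live bytes found by the most recent mark
  std::size_t top_at_mark_start;  // allocation offset when that mark began
  std::size_t top;                // current allocation offset
};

// What a mark-only query reports: allocations made after the mark are ignored.
std::size_t live_bytes_at_last_mark(const RegionSnapshot& r) {
  return r.marked_live_bytes;
}

// Conservative estimate for a mixed-evacuation candidate: marked live data
// plus everything allocated into the region since the mark started.
std::size_t live_bytes_including_new_allocations(const RegionSnapshot& r) {
  assert(r.top >= r.top_at_mark_start);
  return r.marked_live_bytes + (r.top - r.top_at_mark_start);
}
```

For a region with 1000 marked live bytes whose allocation offset advanced from 4096 to 6144 after marking, the mark-only query reports 1000 bytes while the conservative estimate is 3048 bytes. An evacuation budget sized from the first number would be too small for an old region that kept receiving allocations.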
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: touch file to force retest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/7b9c4d64..6480fef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From dholmes at openjdk.org Tue Jan 6 02:02:16 2026 From: dholmes at openjdk.org (David Holmes) Date: Tue, 6 Jan 2026 02:02:16 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4] In-Reply-To: References:

Message-ID: On Sun, 28 Dec 2025 03:56:39 GMT, Sergey Bylokhov wrote: >> The copyright year in hotspot files updated in 2025 has been bumped to 2025. (to minimize... the patch...for now, all files modified by the commits in src/hotspot have been updated only.) >> >> The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: >> >> ~~`git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done `~~ >> >> `git diff origin/master --name-only | while read f; do git log origin/master --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` > > Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into copy_hotspot > - 8374316: Update copyright year to 2025 for hotspot in files where it was missed Just be aware that if a file was created as part of a refactoring and the code was taken as-is from an existing file, then the copyright year range should have remained the same as the original file. I don't know if any of the files you modified fall into that category but just wanted to point out that looking at the commit date is not always correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3712798915 From wkemper at openjdk.org Tue Jan 6 20:49:25 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 20:49:25 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: <_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> References:

<_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> Message-ID: On Tue, 16 Dec 2025 23:30:58 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 321: >> >>> 319: op_degenerated_futile(); >>> 320: } else { >>> 321: _generation->heuristics()->record_unsuccessful_degenerated(); >> >> Suggestion: >> >> _generation->heuristics()->record_successful_degenerated(); >> >> I think the confusion here is that we are conflating `progress` and `success`. The "progress" notion here is about triggering a full GC or giving up entirely. The degenerated cycle is "successful" because it did not run a full GC. Maybe we should rename `record_successful_degenerated` to `record_degenerated` (or, perhaps even `apply_degenerated_penalty`). I was about to suggest we pull `record_success_degenerated` out of the logic entirely, but that would mean upgraded degen cycles would be penalized again when the full GC completes. > > May be let the heuristics (or the policy) track progress as well, and inform the actuator (i.e. op degenerated) whether it should upgrade to a full gc. It almost feels like heuristics and policy and actuator are leaking abstractions. It feels like heuristics keep track of the model parameters and learn from sensors, and the policy consults a specific heuristic to inform actuator (i.e. actions). > > By that model, you'd have the actuator sending the sensor information to the heuristics and asking the policy (or the heuristics, if you conflate heuristics and policy) to decide which step to take next. It would seem that evaluation of the notion of progress then moves to the policy too. @kdnilsen , what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. 
I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2666242735 From wkemper at openjdk.org Tue Jan 6 22:31:38 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 22:31:38 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References:

Message-ID: On Mon, 5 Jan 2026 20:39:03 GMT, Kelvin Nilsen wrote: >> After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: >> >> 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. >> 2. Sample allocation rates more frequently than once every 100 ms. >> 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. >> 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove develop/debug instrumentation Took another look over this. There is a lot to get through. I'll have more later. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 179: > 177: " after adjusting for spike_headroom: %zu%s" > 178: " and penalties: %zu%s", _is_generational? _space_info->name(): "Global", > 179: byte_size_in_proper_unit(mutator_available), proper_unit_for_byte_size(mutator_available), Can we use the `PROPERFMT/PROPERFMTARGS` macros for these? I find they really improve readability. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 199: > 197: > 198: // There is no headroom during evacuation and update refs. This information is not used to trigger the next GC. > 199: // Rather, it is made available to support throttling of allocations during GC. Is that true? or is allocation throttling part of another change? 
src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 275: > 273: } > 274: > 275: void ShenandoahAdaptiveHeuristics::add_gc_time(double timestamp, double gc_time) { Could we use `TruncatedSeq::predict_next` here? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1218: > 1216: } else { > 1217: heap->heuristics()->start_idle_span(); > 1218: } Suggestion: _generation->heuristics()->start_idle_span(); ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29039#pullrequestreview-3632535527 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666483213 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666485239 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666489076 PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2666310682 From wkemper at openjdk.org Tue Jan 6 23:18:02 2026 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Jan 2026 23:18:02 GMT Subject: RFR: 8314599: [GenShen] Couple adaptive tenuring and generation size budgeting [v14] In-Reply-To: References: Message-ID: > Notable changes: > * Improvements to logging > * More accurate tracking of promotion failures > * Use shared allocation for promotions only when the size is above the maximum plab size (not the minimum size) > * Use census information gathered during mark to size promotion reserves and old generation > > With these changes, GenShen is expected to have fewer promotion failures and this is indeed the case. As a result of this, we expect less time to be spent in concurrent marking and update refs for young collections. We may also expect shorter concurrent evacuation phases because GenShen will have fewer densely packed regions stuck in the young generation. With more objects being promoted, we also expect to see longer remembered set scan times. 
This is generally the case across all benchmarks, but we do also see some counter-intuitive results. > > Here we are comparing 20 executions (10 on x86, 10 on aarch64) of the changes in the PR (experiment) against 20 executions of the same benchmarks results from tip. This is a summary of statistically significant changes of more than 5% across all benchmarks: > > > Concurrent Evacuation: 7 improvements, 3 regressions > ? Best improvements: extremem-large-45g (-29.6%), neo4j-analytics (-26.9%) > ? Worst regression: xalan (+53.7%) > > Concurrent Marking: 15 improvements, 1 regression > ? Best improvements: hyperalloc_a2048_o4096 (-30.1%), crypto.rsa (-27.3%) > ? Only regression: serial (+8.9%) > > Concurrent Scan Remembered Set: 7 improvements, 2 regressions > ? Best improvements: xalan (-49.4%), pmd (-49.0%), crypto.rsa (-41.8%) > ? Worst regression: extremem-phased (+52.4%) > > Concurrent Update Refs: 5 improvements, 4 regressions > ? Best improvements: crypto.rsa (-36.4%), mnemonics (-28.4%) > ? Worst regression: xalan (+89.4%) William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 81 commits: - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Fix comments, add back an assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Accommodate behavior of global heuristic - Restore missing update for inplace promotion padding - Remove reference to adaptive tuning flag - Remove commented out assertion - Merge remote-tracking branch 'jdk/master' into promotion-budget-improvements - Adaptive tenuring is no longer optional We are using age census data to compute promotion reserves. The tenuring threshold may still be fixed by setting the min/max threshold to the same value. - ... 
and 71 more: https://git.openjdk.org/jdk/compare/7c979c14...f460f115 ------------- Changes: https://git.openjdk.org/jdk/pull/27632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27632&range=13 Stats: 398 lines in 11 files changed: 158 ins; 173 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/27632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27632/head:pull/27632 PR: https://git.openjdk.org/jdk/pull/27632 From kdnilsen at openjdk.org Wed Jan 7 00:36:14 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:14 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Mon, 5 Jan 2026 21:04:45 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: >> >> * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. >> * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. >> * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
>> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` >> >> I'm not expecting a significant performance impact in most cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue, but in some cases it may improve latency/performance: >> >> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. >> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== >> ===== DaCapo tail ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 265 commits: > > - Merge branch 'openjdk:master' into cas-alloc-1 > - Fix build error after merging from tip > - Merge branch 'master' into cas-alloc-1 > - Merge branch 'master' into cas-alloc-1 > - Some comments updates as suggested in PR review > - Fix build failure after merge > - Expend promoted from ShenandoahOldCollectorAllocator > - Merge branch 'master' into cas-alloc-1 > - Address PR comments > - Merge branch 'openjdk:master' into cas-alloc-1 > - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 This is a huge PR. Thanks for working through all the details to get this working. I've identified several issues that I believe require some further attention. We can discuss in a meeting if that would be helpful. src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 80: > 78: for (size_t i = 0; i < num_regions; i++) { > 79: ShenandoahHeapRegion* region = heap->get_region(i); > 80: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); Might change comment to: "Should be no active alloc regions when choosing collection set" src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 102: > 100: for (size_t i = 0; i < num_regions; i++) { > 101: ShenandoahHeapRegion* region = heap->get_region(i); > 102: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); Same suggestion here as with shenandoahGenerationalHeuristics.cpp. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 110: > 108: } > 109: > 110: uint dummy = 0; Don't call this "dummy". Call it regions_ready_for_refresh. Remember the value and pass it in as a new argument to attempt_allocation_slow() so that we don't have to recompute it later. 
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 114: > 112: HeapWord* obj = attempt_allocation_in_alloc_regions(req, in_new_region, alloc_start_index(), dummy); > 113: if (obj != nullptr) { > 114: return obj; Even in the case that we successfully fill our allocation request, if regions_ready_for_refresh is greater than some fraction of _alloc_region_count (e.g. > _alloc_region_count / 4), then we should grab the heap lock and refresh_alloc_regions() here. Otherwise, we will gradually degrade the number of directly_allocatable_regions until we are down to one before we refresh any of them. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 133: > 131: ShenandoahHeapAccountingUpdater accounting_updater(_free_set, ALLOC_PARTITION); > 132: > 133: if (regions_ready_for_refresh > 0u) { Since we've already taken the heap lock because we failed to allocate "fast", I'm ok to go ahead and refresh any regions that are ready right now, even if it's only 1 region. I'm wondering if we can avoid thrashing in the case that there are no more regions available. We might want to keep a state variable that represents whether there exist free-set regions with which to refresh our cache. This could be updated whenever we "add to" or "rebuild" the free set, and whenever refresh_alloc_regions() finds there is insufficient supply to meet demand. We would want to avoid repeated calls to refresh_alloc_regions() if there are no "refresh_regions_available". src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 192: > 190: uint i = alloc_start_index; > 191: do { > 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) { Note that there is a race (and performance overhead) with checking r->is_active_alloc_region(). Though a region might be active when we check it here, it may be inactive by the time we attempt to atomic_allocate_in(). 
This is one reason I prefer to use "volatile_top == end" to denote !is_active_alloc_region. This way, you only have to check once (rather than checking is_active() and then checking has_available()). And there is no race between when you check and when you attempt to allocate. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 194: > 192: if (ShenandoahHeapRegion* r = nullptr; (r = _alloc_regions[i].address) != nullptr && r->is_active_alloc_region()) { > 193: bool ready_for_retire = false; > 194: HeapWord* obj = atomic_allocate_in(r, true, req, in_new_region, ready_for_retire); Insert before atomic_allocate_in: int contended; Pass this as the 6th arg to atomic_allocate_in(). Add this code after atomic_allocate_in(): if ((i == alloc_start_index) && (contended > 1)) { randomize_start_index(); // I think this is realized by setting _alloc_start_index to UINT_MAX } src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 203: > 201: } > 202: } else if (r == nullptr || !r->is_active_alloc_region()) { > 203: regions_ready_for_refresh++; Add this code: if (i == alloc_start_index) { randomize_start_index(); } src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 214: > 212: > 213: template > 214: HeapWord* ShenandoahAllocator::atomic_allocate_in(ShenandoahHeapRegion* region, bool const is_alloc_region, ShenandoahAllocRequest &req, bool &in_new_region, bool &ready_for_retire) { Add argument: int &contended src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 219: > 217: size_t actual_size = req.size(); > 218: if (req.is_lab_alloc()) { > 219: obj = region->allocate_lab_atomic(req, actual_size, ready_for_retire); Pass contended arg to allocate_lab_atomic() src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 221: > 219: obj = region->allocate_lab_atomic(req, actual_size, ready_for_retire); > 220: } else { > 221: obj = region->allocate_atomic(actual_size, req, ready_for_retire); Pass contended arg to allocate_atomic()
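For illustration only, the contended-counter idea in the comments above could be sketched roughly as follows. This is standalone C++ using std::atomic rather than HotSpot's AtomicAccess, and `Region`, `allocate_atomic`, `kFailed` and `should_pick_new_start` are hypothetical stand-ins, not the names or signatures used in the PR. The sketch counts failed CAS attempts and lets the caller decide whether its preferred start index is too contended to keep.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap region (not ShenandoahHeapRegion).
struct Region {
  std::atomic<std::size_t> top{0};  // offset of the next free word
  std::size_t end = 0;              // one past the last usable word
};

// Sentinel returned when the request does not fit (assumption for this sketch).
constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

// Bump-allocate `words` from `r` with a CAS loop.
// `contended` is initialized to zero and incremented once per lost CAS race.
inline std::size_t allocate_atomic(Region& r, std::size_t words, int& contended) {
  contended = 0;
  std::size_t old_top = r.top.load();
  for (;;) {
    if (words > r.end - old_top) {
      return kFailed;  // region cannot satisfy the request
    }
    // On failure, compare_exchange_strong reloads old_top with the current value.
    if (r.top.compare_exchange_strong(old_top, old_top + words)) {
      return old_top;
    }
    contended++;  // another thread moved top first; retry
  }
}

// Caller-side policy from the review: pick a new start index only when the
// contention was observed on the thread's preferred start region.
inline bool should_pick_new_start(unsigned i, unsigned alloc_start_index, int contended) {
  return i == alloc_start_index && contended > 1;
}
```

The threshold `contended > 1` is taken from the suggestion above; whether it is the right cutoff would need measurement.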
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 233: > 231: // evacuation are not updated during evacuation. For both young and old regions r, it is essential that all > 232: // PLABs be made parsable at the end of evacuation. This is enabled by retiring all plabs at end of evacuation. > 233: region->concurrent_set_update_watermark(region->top()); There's a race here. Multiple mutators may be updating watermark in parallel. It may be that the mutator who most recently allocated is not the mutator who makes the "most recent" overwrite of set_update_watermark(). I think the better fix is to remove this code. Update refs should just assume that update watermark equals top for any region in the Old gen, and for any region that was in the Collector partition. It may not be easy to know which regions were "in the Collector partition". Maybe we use a Sentinel value for update_watermark on all such regions. Just overwrite update_watermark(nullptr)? And check for this in update-refs? Needs a solution, and solution needs to be documented in code comments. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 254: > 252: // Step 1: find out the alloc regions which are ready to refresh. > 253: for (uint i = 0; i < _alloc_region_count; i++) { > 254: ShenandoahAllocRegion* alloc_region = &_alloc_regions[i]; We've got the heap lock here. why does this need to be atomic? Comments in the code should make this clear. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 263: > 261: } > 262: if (ALLOC_PARTITION == ShenandoahFreeSetPartitionId::Mutator) { > 263: if (free_bytes > 0) { We should have counted the entire region's available bytes as allocated when we made this a directly allocatable region. We should not need to further increase bytes allocated here. I would like to see an assert(free_bytes < PLAB::min_size() * HeapWordSize) here. Eventually, I'd want to generalize this code so that we could refresh regions that are not yet ready to be retired. 
In this case, we would want to unretire the region here. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 280: > 278: // Step 2: allocate region from FreeSets to fill the alloc regions or satisfy the alloc request. > 279: ShenandoahHeapRegion* reserved[MAX_ALLOC_REGION_COUNT]; > 280: int reserved_regions = _free_set->reserve_alloc_regions(ALLOC_PARTITION, refreshable_alloc_regions, I request we get rid of the min_free_words argument to free_set->reserve_alloc_regions(). See comments in the called function. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 304: > 302: log_debug(gc, alloc)("%sAllocator: Storing heap region %li to alloc region %i", > 303: _alloc_partition_name, reserved[i]->index(), refreshable[i]->alloc_region_index); > 304: AtomicAccess::store(&refreshable[i]->address, reserved[i]); Should not need to perform AtomicAccess because we hold the heap lock here. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 316: > 314: HeapWord* ShenandoahAllocator::allocate(ShenandoahAllocRequest &req, bool &in_new_region) { > 315: #ifdef ASSERT > 316: verify(req); Insert a comment above verify(): "Confirm that req corresponds to ALLOC_PARTITION" src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 338: > 336: for (uint i = 0; i < _alloc_region_count; i++) { > 337: ShenandoahAllocRegion& alloc_region = _alloc_regions[i]; > 338: ShenandoahHeapRegion* r = AtomicAccess::load(&alloc_region.address); We hold the heap lock and are at a safepoint. Do not need AtomicAccess here. That is more costly than necessary. I prefer to use regular fetch. If you prefer to keep AtomicAccess, please provide a comment in the code explaining why and we will revisit. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 345: > 343: r->unset_active_alloc_region(); > 344: } > 345: AtomicAccess::store(&alloc_region.address, static_cast(nullptr)); Same here. We do not need AtomicAccess. 
src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 350: > 348: total_free_bytes += free_bytes; > 349: total_regions_to_unretire++; > 350: _free_set->partitions()->unretire_to_partition(r, ALLOC_PARTITION); When we reserved this directly allocatable region, we increased bytes allocated() if the ALLOC_PARTITION was mutator. Here, we need to undo that: if (ALLOC_PARTITION == ShenandoahFreeSetPartitionId::Mutator) { decrease_bytes_allocated(free_bytes); } src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 353: > 351: if (!r->has_allocs()) { > 352: log_debug(gc, alloc)("%sAllocator: Reverting heap region %li to FREE due to no alloc in the region", > 353: _alloc_partition_name, r->index()); This code looks suspect to me. Maybe it works as is only because we are currently doing this only immediately before rebuilding free set. If that's the case, there should be some documentation and maybe even some asserts that confirm it is true. When we release_alloc_regions(), we should be adjusting the range for the associated partitions. The code that most closely resembles this functionality is in ShenandoahFreeSet::move_regions_from_collector_to_mutator(). This is the code that moves collector and old-collector partitions to the mutator partition after evacuation is done. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 360: > 358: } > 359: } > 360: assert(AtomicAccess::load(&alloc_region.address) == nullptr, "Alloc region is set to nullptr after release"); Do not need AtomicAccess here src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 364: > 362: _free_set->partitions()->decrease_used(ALLOC_PARTITION, total_free_bytes); > 363: _free_set->partitions()->increase_region_counts(ALLOC_PARTITION, total_regions_to_unretire); > 364: accounting_updater._need_update = true; Here is where you know which tallies have been affected by this operation. 
This is where you should specialize the calls to freeset recompute_total_used() and recompute_total_affiliated(). Either call those from here, or add parameters to your accounting_updater object so that you do not have to overcompute each operation. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 376: > 374: } > 375: > 376: THREAD_LOCAL uint ShenandoahMutatorAllocator::_alloc_start_index = UINT_MAX; I raised questions about this in a previous review. Have I overlooked your response? What is the tradeoff between declaring this THREAD_LOCAL vs. creating a new field in ShenandoahThreadLocal? I believe we need to use fields of ShenandoahThreadLocal so that we do not incur an overhead on all threads when JVM is not configured for Shenandoah GC. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 423: > 421: _yield_to_safepoint = false; > 422: } > 423: I suppose ShenandoahCollectorAllocator::randomize_start_index() might be a no-op. On the other hand, it would probably be better to use a random index for ShenandoahCollectorAllocator as well. We don't want to hobble one GC worker more than the others just because its preferred start index happens to hold a retire-ready region. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 428: > 426: } > 427: > 428: HeapWord* ShenandoahOldCollectorAllocator::allocate(ShenandoahAllocRequest& req, bool& in_new_region) { Confer with William Kemper about this. He is working on a change that may simplify the handling of PLABs, in which case ShenandoahOldCollectorAllocator can behave the same as ShenandoahCollector. src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 436: > 434: // Make sure the old generation has room for either evacuations or promotions before trying to allocate. > 435: auto old_gen = ShenandoahHeap::heap()->old_generation(); > 436: if (req.is_old() && !old_gen->can_allocate(req)) { This test for req.is_old() appears to be unnecessary. 
The verify(req) assert above requires that req.is_old(). Perhaps the verify() method is too abstract. Add a comment there that says: "Confirm that req.is_old()" src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 56: > 54: virtual uint alloc_start_index() { return 0u; } > 55: > 56: // Attempt to allocate Comment needs to make clear that this is the main entry point for fast-path allocation from a directly allocatable region. This function delegates to slow-path allocation if it is unable to allocate from the directly allocatable regions. Not sure I like the name "attempt_allocation()". All of our allocation routines attempt to allocate and return a sentinel value (nullptr) if the allocation fails. This is no different. Just call it allocate_work(), and clarify that this is the helper routine of allocate() which does the work of allocating from a directly allocatable region without acquiring the heap lock if that is possible, and otherwise does a slow-path allocation which requires acquisition of the heap lock. I see that your comments are trying to say this. But the comments as written are not easy to understand. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 69: > 67: > 68: // Attempt to allocate in a shared alloc region using atomic operation without holding the heap lock. > 69: // Returns nullptr and overwrites regions_ready_for_refresh with the number of shared alloc regions that are ready Suggest this edit: // Overwrites regions_ready_for_refresh with a lower bound on the number of shared alloc regions that are ready // to be retired during execution of this "do_fast_allocation" function. Returns nullptr if the allocation request could // not be fulfilled after a single traversal of directly allocatable regions. 
src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 79: > 77: int refresh_alloc_regions(ShenandoahAllocRequest* req = nullptr, bool* in_new_region = nullptr, HeapWord** obj = nullptr); > 78: #ifdef ASSERT > 79: virtual void verify(ShenandoahAllocRequest& req) { } Need a comment to explain what verify does. Is this simply checking to make sure the req is "properly formatted"? I think the intention is to enforce that req affiliation corresponds to ALLOC_PARTITION. Would be good to clarify this in the comment. Do we need this to be virtual? It seems like a single templated implementation would suffice. src/hotspot/share/gc/shenandoah/shenandoahAllocator.hpp line 91: > 89: virtual HeapWord* allocate(ShenandoahAllocRequest& req, bool& in_new_region); > 90: virtual void release_alloc_regions(); > 91: virtual void reserve_alloc_regions(); Need comments on these functions. Clarify pre-conditions and post-conditions. I think the intention is: 1. allocate(): Caller does not hold the heap lock. All allocations by mutator or GC are fulfilled by this function. This function tries to perform a CAS allocation without obtaining the global heap lock. If that fails, it will obtain the global heap lock and do a free-set allocation. As a side effect of doing a free-set allocation, some number of directly allocatable regions may be retired and replaced with new directly allocatable regions. 2. release_alloc_regions(): Caller must hold the heap lock. This causes all directly allocatable regions to be placed into the appropriate ShenandoahFreeSet partition. We do this in preparation for choosing a collection set and/or rebuilding the freeset. 3. reserve_alloc_regions(): Caller must hold the heap lock. This causes us to set aside N regions as directly allocatable by removing these regions from the relevant ShenandoahFreeSet partitions. Explain what happens if there are not N regions available. 
Clarify: do these three functions represent the entirety of the "public mutation API" that is exercised by mutators and GC workers as they interact with the free set? (There is another set of functions that could be characterized as the read-only API for obtaining state information about the free set. This provides information such as available memory, allocated bytes since GC start, etc.) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3045: > 3043: } > 3044: > 3045: int ShenandoahFreeSet::reserve_alloc_regions(ShenandoahFreeSetPartitionId partition, int regions_to_reserve, size_t min_free_words, ShenandoahHeapRegion** reserved_regions) { I request that we not enforce min_free_words when reserving allocation regions. This defeats the purpose of allocation bias. The objective is to consume fragmented memory early in the GC cycle (when we have more mitigation options if an allocation request ever fails). Note that every region that is in any partition has at least PLAB::min_size() available memory. By requiring that MUTATOR regions have PLAB::max_size() words, we are forcing ourselves to never consume the fragmented memory regions. (Towards the end of GC, when memory is in short supply, we will be unable to find directly allocatable MUTATOR regions. This will force us to obtain the heap lock for every allocation. And these allocations will be inefficient because the remaining memory is highly fragmented.) 
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 167: > 165: }; > 166: > 167: HeapWord* ShenandoahHeapRegion::allocate_atomic(size_t size, const ShenandoahAllocRequest& req, bool &ready_for_retire) { Suggest we add a fourth arg: int &contended. We initialize contended to zero. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 187: > 185: return nullptr; > 186: } > 187: } Before iterating, increment contended by 1. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 190: > 188: } > 189: > 190: HeapWord* ShenandoahHeapRegion::allocate_lab_atomic(const ShenandoahAllocRequest& req, size_t &actual_size, bool &ready_for_retire) { Suggest we add a fourth arg: int &contended. We initialize contended to zero. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 218: > 216: return nullptr; > 217: } > 218: } Before we iterate, we increment contended by 1. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 304: > 302: } > 303: > 304: inline void ShenandoahHeapRegion::concurrent_set_update_watermark(HeapWord* w) { See comment elsewhere in my feedback. I think we may want to use a special sentinel value to denote the watermark for Collector and OldCollector regions. For both of these, there is essentially no watermark value. If we try to set the value to top() from within a CAS-allocating mutator thread, we can end up setting watermark to the not-most-recent value of top(), which would result in misbehavior during update refs. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 567: > 565: "0 will allow back to back young collections to run during old " \ > 566: "collections.") \ > 567: \ Once we resolve the various issues identified in feedback comments, I would be interested in results of experimenting with different values of these two parameters... ------------- Changes requested by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/26171#pullrequestreview-3628853514 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663181721 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663183357 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665709301 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665818148 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665800328 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666506691 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666328083 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666332994 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666334248 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666334844 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666335529 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666360404 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666366038 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666526965 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666564758 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666566961 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666567671 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663324871 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663327493 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666583027 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666637281 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666628228 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663337002 PR Review Comment: 
https://git.openjdk.org/jdk/pull/26171#discussion_r2663279917 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666642051 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666643974 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665511567 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665632758 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666273440 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663265276 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663261232 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666553882 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666309782 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666309888 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666310835 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666311617 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666683738 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666691665 From kdnilsen at openjdk.org Wed Jan 7 00:36:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:17 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> References:

<2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> Message-ID: <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> On Tue, 9 Dec 2025 21:03:21 GMT, Xiaolong Peng wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Some comments updates as suggested in PR review > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 41: > >> 39: _alloc_region_count(alloc_region_count), _free_set(free_set), _alloc_partition_name(ShenandoahRegionPartitions::partition_name(ALLOC_PARTITION)) { >> 40: if (alloc_region_count > 0) { >> 41: _alloc_regions = PaddedArray::create_unfreeable(alloc_region_count); > > Rethinking about the the PaddedArray used here, we may not really need it. > Allocator has multiple shared alloc regions for CAS, and only refreshes them when all of them run out of usable memory, so _alloc_regions won't be frequently updated, the PaddedArray here should have a negative performance impact. Are you running any experiments (on different hardware configurations) to test your assumptions about this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2663185400 From kdnilsen at openjdk.org Wed Jan 7 00:36:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:17 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com> Message-ID: On Wed, 3 Dec 2025 01:09:34 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 100: >> >>> 98: HeapWord* ShenandoahAllocator::attempt_allocation(ShenandoahAllocRequest& req, bool& in_new_region) { >>> 99: if (_alloc_region_count == 0u) { >>> 100: ShenandoahHeapLocker locker(ShenandoahHeap::heap()->lock(), _yield_to_safepoint); >> >> Looking for more comments here as well. What does it mean that _alloc_region_count == 0? Does this mean we have not yet initialized the directly allocatable regions (following a particular GC event)? Or does it mean that we have depleted all of the available regions and we are out of memory? In the first case, it seems we would want to replenish our supply of directly allocatable regions while we hold the GC lock. In the second case, it seems there's really no value in even attempting a slow allocation. (If we were unable to refresh our directly allocatable regions, then it will not find allocatable memory even on the other side of the heap lock...) > > I'll add comments on this, _alloc_region_count == 0 means we don't want to use any shared alloc region, it will always allocate with a heap lock, ideally the performance should be same as before, so it always simply find a region with enough space and allocate in the region. Put the comments describing functions in the .hpp file, where they are currently. But we need to enhance those comments. >> src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 158: >> >>> 156: if (r != nullptr) { >>> 157: bool ready_for_retire = false; >>> 158: obj = atomic_allocate_in(r, false, req, in_new_region, ready_for_retire); >> >> Not sure why we use atomic_allocate_in() here. We hold the heap lock so we don't need to use atomic operations. >> We should clarify with comments. 
> > It is not really necessary to `atomic_allocate_in` here, but I wanted to reuse some of the code in atomic_allocate_in, we can discuss this later, I can change it back to non-atomic version.

Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with a template argument.

I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess.

I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition. Here's the scenario that I'm concerned about:

1. A mutator obtains pointer to directly allocatable region R
2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason)
3. Region R is now eligible to satisfy allocations from behind the global heap lock
4. Some third mutator thread acquires the heap lock and fetches top for region R
5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict
6. This third mutator allocates from region R at top, without using CAS. So both mutators think they own the same object

I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in region R. However, I anticipate the day in the not-too-distant future when we will want to refresh regions even when they are not ready to be retired. 
Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation. So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way. (I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate an object that is 1/2 the heap region size, and none of the directly allocatable regions have that much available memory.) My original proposal was to have a volatile_top which is used by CAS allocation and a nonvolatile_top that is used by heap-lock allocation. When we make a region directly allocatable, we copy its nonvolatile_top to the volatile_top. When we take a directly allocatable region and move it into the heap-locked freeset, we use CAS to set its volatile_top to end before we place the region into the freeset partition, assigning to nonvolatile_top the value held in volatile_top before the CAS operation. Whatever solution is used for this needs to be documented in the code. Feel free to copy and paste from this github comment. 
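The volatile_top / nonvolatile_top hand-off proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RegionSketch`, `kAllocFailed`, and all of the member function names are hypothetical and do not come from the PR or from ShenandoahHeapRegion, offsets stand in for real HeapWord pointers, and `std::atomic` stands in for HotSpot's own atomic primitives. The swap to `end` is done with an atomic exchange here rather than an explicit CAS loop; both return the previous top.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sentinel for "allocation did not fit".
constexpr std::size_t kAllocFailed = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a heap region; NOT the real ShenandoahHeapRegion.
struct RegionSketch {
  const std::size_t end;                  // one past the last usable offset
  std::atomic<std::size_t> volatile_top;  // bumped by lock-free (CAS) allocation
  std::size_t nonvolatile_top;            // bumped only while holding the heap lock

  // A region starts out owned by the heap-locked free set, so volatile_top
  // is parked at end and every CAS allocation attempt fails.
  explicit RegionSketch(std::size_t size)
      : end(size), volatile_top(size), nonvolatile_top(0) {}

  // Caller holds the heap lock: publish the region for CAS allocation.
  void make_directly_allocatable() {
    volatile_top.store(nonvolatile_top, std::memory_order_release);
  }

  // Lock-free allocation; returns the start offset or kAllocFailed.
  std::size_t cas_allocate(std::size_t size) {
    std::size_t old_top = volatile_top.load(std::memory_order_acquire);
    while (end - old_top >= size) {
      if (volatile_top.compare_exchange_weak(old_top, old_top + size,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
        return old_top;
      }
      // On failure old_top has been reloaded; re-check the remaining space.
    }
    return kAllocFailed;
  }

  // Caller holds the heap lock: take the region back from CAS allocators.
  // After the swap, in-flight CAS attempts see top == end and fail, so a
  // stale region pointer can no longer race with heap-locked allocation.
  void move_to_freeset() {
    nonvolatile_top = volatile_top.exchange(end, std::memory_order_acq_rel);
  }

  // Caller holds the heap lock: plain bump allocation, no CAS needed.
  std::size_t locked_allocate(std::size_t size) {
    if (end - nonvolatile_top < size) return kAllocFailed;
    std::size_t old_top = nonvolatile_top;
    nonvolatile_top += size;
    return old_top;
  }
};
```

In this sketch the mutator from step 5 of the scenario would fail its CAS after `move_to_freeset()` and fall back to another region or the slow path, instead of handing out memory that a heap-locked allocation also claims.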
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665714073 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2666236427 From kdnilsen at openjdk.org Wed Jan 7 00:36:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

Message-ID: <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> On Wed, 3 Dec 2025 01:15:03 GMT, Xiaolong Peng wrote: >> It is not an error, before calling into attempt_allocation_slow, it already called attempt_allocation_in_alloc_regions once and failed to allocate, slow path is always with heap lock. >> >> After taking the lock, we should try the attempt_allocation_in_alloc_regions right away, because other mutator thread may have refreshed the alloc regions while holding the lock. > > accounting_update is required for slow path, but you are right, it can be moved to somewhere later, e.g. line 128. My mistake on first read here. I see now that we only come into this function if fast-allocation failed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665689002 From kdnilsen at openjdk.org Wed Jan 7 00:36:21 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:21 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

<4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> Message-ID: On Tue, 6 Jan 2026 17:28:27 GMT, Kelvin Nilsen wrote: >> accounting_update is required for slow path, but you are right, it can be moved to somewhere later, e.g. line 128. > > My mistake on first read here. I see now that we only come into this function if fast-allocation failed. But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665698471 From kdnilsen at openjdk.org Wed Jan 7 00:36:22 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 00:36:22 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

<4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> Message-ID: <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> On Tue, 6 Jan 2026 17:31:57 GMT, Kelvin Nilsen wrote: >> My mistake on first read here. I see now that we only come into this function if fast-allocation failed. > > But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! > > The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). I'm not concerned that the count of regions_ready_for_refresh might be stale. If this count is getting incremented "during" our allocation, we will see this result soon enough. If multiple mutators fail fast-path allocation simultaneously, they will each acquire heap lock either way (existing implementation vs. new implementation that does not retry the allocation). Acquiring the heap lock is the "expensive" operation. If the first one refreshes allocation regions, then subsequent invocations will not find any regions to be refreshed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2665704698 From serb at openjdk.org Wed Jan 7 02:24:25 2026 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 7 Jan 2026 02:24:25 GMT Subject: RFR: 8374316: Update copyright year to 2025 for hotspot in files where it was missed [v4] In-Reply-To: References:

Message-ID: On Tue, 6 Jan 2026 01:58:22 GMT, David Holmes wrote: >Just be aware that if a file was created as part of a refactoring and the code was taken as-is from an existing file, then the copyright year range should have remained the same as the original file. I don't know if any of the files you modified fall into that category but just wanted to point out that looking at the commit date is not always correct. I tried to catch rename/move-only or copyright-only changes, but I'm not 100% sure I filtered all of them out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28970#issuecomment-3717068454 From lkorinth at openjdk.org Wed Jan 7 12:35:42 2026 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 7 Jan 2026 12:35:42 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v2] In-Reply-To: References:

Message-ID: On Wed, 7 Jan 2026 10:02:41 GMT, Leo Korinth wrote: >> This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. >> >> It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. >> >> This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. >> >> I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. 
> > Leo Korinth has updated the pull request incrementally with 561 additional commits since the last revision: > > - Merge branch 'master' into _8367993 > - 8366058: Outdated comment in WinCAPISeedGenerator > > Reviewed-by: mullan > - 8357258: x86: Improve receiver type profiling reliability > > Reviewed-by: kvn, vlivanov > - 8373704: Improve "SocketException: Protocol family unavailable" message > > Reviewed-by: lucy, jpai > - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently > > Reviewed-by: jiefu, jbhateja, erfang, qamai > - 8343809: Add requires tag to mark tests that are incompatible with exploded image > > Reviewed-by: alanb, dholmes > - 8374465: Spurious dot in documentation for JVMTI ClassLoad > > Reviewed-by: kbarrett > - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket > > Reviewed-by: djelinski, mpowers, ascarpino > - 8374444: Fix simple -Wzero-as-null-pointer-constant warnings > > Reviewed-by: aboldtch > - 8373847: Test javax/swing/JMenuItem/MenuItemTest/bug6197830.java failed because The test case automatically fails when clicking any items in the "Nothing" menu in all four windows (Left-to-right)-Menu Item Test and (Right-to-left)-Menu Item Test > > Reviewed-by: serb, aivanov, dnguyen > - ... and 551 more: https://git.openjdk.org/jdk/compare/b907b295...0ece3767 I will redo the merge, I have done something strange. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28723#issuecomment-3718660595 From lkorinth at openjdk.org Wed Jan 7 12:58:43 2026 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 7 Jan 2026 12:58:43 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v3] In-Reply-To: References: Message-ID: > This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. 
Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. > > It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. > > This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. > > I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 564 commits: - Merge branch '8373253' into 8367993 - Merge branch 'master' into _8373253 - Merge branch 'master' into _8367993 - 8366058: Outdated comment in WinCAPISeedGenerator Reviewed-by: mullan - 8357258: x86: Improve receiver type profiling reliability Reviewed-by: kvn, vlivanov - 8373704: Improve "SocketException: Protocol family unavailable" message Reviewed-by: lucy, jpai - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: jiefu, jbhateja, erfang, qamai - 8343809: Add requires tag to mark tests that are incompatible with exploded image Reviewed-by: alanb, dholmes - 8374465: Spurious dot in documentation for JVMTI ClassLoad Reviewed-by: kbarrett - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket Reviewed-by: djelinski, mpowers, ascarpino - ... 
and 554 more: https://git.openjdk.org/jdk/compare/2aa8aa4b...28ccbb68 ------------- Changes: https://git.openjdk.org/jdk/pull/28723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28723&range=02 Stats: 130308 lines in 3967 files changed: 83803 ins; 29735 del; 16770 mod Patch: https://git.openjdk.org/jdk/pull/28723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28723/head:pull/28723 PR: https://git.openjdk.org/jdk/pull/28723 From kdnilsen at openjdk.org Wed Jan 7 14:54:07 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 14:54:07 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Wed, 7 Jan 2026 00:04:41 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 428: > >> 426: } >> 427: >> 428: HeapWord* ShenandoahOldCollectorAllocator::allocate(ShenandoahAllocRequest& req, bool& in_new_region) { > > Confer with William Kemper about this. He is working on a change that may simplify the handling of PLABs, in which case ShenandoahOldCollectorAllocator can behave the same as ShenandoahCollector. 
Alternatively, I don't think it would be too terribly difficult to implement try_allocate_aligned() function to support fast (CAS) allocation of aligned PLABs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2668774560 From kdnilsen at openjdk.org Wed Jan 7 14:58:37 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 14:58:37 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 18:13:09 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 114: > >> 112: HeapWord* obj = attempt_allocation_in_alloc_regions(req, in_new_region, alloc_start_index(), dummy); >> 113: if (obj != nullptr) { >> 114: return obj; > > Even in the case that we successfully fill our allocation request, if regions_ready_for_refresh is greater than some percentage of _alloc_region_count (e.g. > _alloc_region_count / 4), then we should grab the heap lock and refresh_alloc_regions() here. Otherwise, we will gradually degrade the number of directly_allocatable_regions until we are down to one before we refresh any of them. 
After further thought, am thinking the threshold for refresh_alloc_regions() might be if (regions_ready_for_refresh >= _alloc_region_count / 2). That would reduce the number of slow paths through the allocator. If we can re-randomize the thread-local start indexes when their original start index hits a retire-able region, this might work ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2668794471 From kdnilsen at openjdk.org Wed Jan 7 16:37:22 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 16:37:22 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v22] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. 
In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: - Remove note to self - Slight expansion of promo reserve - Remove bad assertion - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - fix unsigned arithmetic underflow - Attempt fix for assertion failures - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Remove debug instrumentation - ... 
and 66 more: https://git.openjdk.org/jdk/compare/2d092840...d0a692ff ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=21 Stats: 1457 lines in 29 files changed: 777 ins; 286 del; 394 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 17:38:09 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 17:38:09 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References:

<_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com> Message-ID: On Tue, 6 Jan 2026 20:46:23 GMT, William Kemper wrote: >> May be let the heuristics (or the policy) track progress as well, and inform the actuator (i.e. op degenerated) whether it should upgrade to a full gc. It almost feels like heuristics and policy and actuator are leaking abstractions. It feels like heuristics keep track of the model parameters and learn from sensors, and the policy consults a specific heuristic to inform actuator (i.e. actions). >> >> By that model, you'd have the actuator sending the sensor information to the heuristics and asking the policy (or the heuristics, if you conflate heuristics and policy) to decide which step to take next. It would seem that evaluation of the notion of progress then moves to the policy too. > > @kdnilsen , what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR. I like this idea. I'll try to make that work without breaking anything... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2669431913 From kdnilsen at openjdk.org Wed Jan 7 18:19:35 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 18:19:35 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v6] In-Reply-To: References: Message-ID: > Add a triggering penalty when we execute degenerated GC cycle. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - Combine successful and unsuccessful into single method: report_degenerated() - remove gratuitous blank line - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - touch file to force tests - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - Merge remote-tracking branch 'jdk/master' into add-degen-penalty - refactor for reviewer requests - remove redundant code - Increase heuristic penalties following degenerated GC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28834/files - new: https://git.openjdk.org/jdk/pull/28834/files/7b0efb3e..888f92a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28834&range=04-05 Stats: 20369 lines in 1812 files changed: 4196 ins; 2531 del; 13642 mod Patch: https://git.openjdk.org/jdk/pull/28834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28834/head:pull/28834 PR: https://git.openjdk.org/jdk/pull/28834 From shade at openjdk.org Wed Jan 7 19:06:03 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Jan 2026 19:06:03 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 18:45:04 GMT, Aleksey Shipilev wrote: > Shenandoah and G1 are using CardTable for most of its infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing card table base from assembler and compilers runs into risk of accidentally encoding the wrong card table base in generated code. > > Most of the current code avoids this trouble by carefully implementing their GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, that reads the card table base for G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. 
> > Shenandoah added a few asserts to catch these errors: > SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) > > ...but G1 would also benefit from the similar safety mechanism. > > This PR strengthens the code to prevent future accidents. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc` > - [x] Linux x86_64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] Linux AArch64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] GHA, cross-compilation only Still waiting for reviews. @tschatzl, you might be interested in this from G1 side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28703#issuecomment-3720310765 From shade at openjdk.org Wed Jan 7 19:06:02 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Jan 2026 19:06:02 GMT Subject: RFR: 8373266: Strengthen constant CardTable base accesses [v2] In-Reply-To: References: Message-ID: > Shenandoah and G1 are using CardTable for most of its infrastructure, but flip the card tables as they go, and maintain the actual card table reference in TLS. As such, accessing card table base from assembler and compilers runs into risk of accidentally encoding the wrong card table base in generated code. > > Most of the current code avoids this trouble by carefully implementing their GC barriers to avoid touching shared parts where card table base constness is assumed. _Except_ for JVMCI, that reads the card table base for G1 barrier set, and that is wrong. The JVMCI users would need to rectify this downstream. > > Shenandoah added a few asserts to catch these errors: > SHENANDOAHGC_ONLY(assert(!UseShenandoahGC, "Shenandoah byte_map_base is not constant.");) > > ...but G1 would also benefit from the similar safety mechanism. > > This PR strengthens the code to prevent future accidents. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc` > - [x] Linux x86_64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] Linux AArch64 server fastdebug, `all` with Serial, Parallel, G1, Shenandoah, Z > - [x] GHA, cross-compilation only Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge branch 'master' into JDK-8373266-cardtable-asserts - Another build fix - Fix Minimal builds - Shenandoah non-generational can have nullptr card table - Also simplify CTBS builder - CI should also mention "const" - Fix JVMCI by answering proper things - Merge branch 'master' into JDK-8373266-cardtable-asserts - More fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28703/files - new: https://git.openjdk.org/jdk/pull/28703/files/26b6b071..040a84d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28703&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28703&range=00-01 Stats: 25810 lines in 2653 files changed: 14810 ins; 3456 del; 7544 mod Patch: https://git.openjdk.org/jdk/pull/28703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28703/head:pull/28703 PR: https://git.openjdk.org/jdk/pull/28703 From kdnilsen at openjdk.org Wed Jan 7 19:08:27 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 19:08:27 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v2] In-Reply-To: References:

<_QnV10ychv2AQj3TN6gch1p8B-OGMTsN6FTcbBJSn9U=.cba48ddd-cb45-4e25-9633-5ef7e9cfa4ea@github.com>

Message-ID: On Wed, 7 Jan 2026 17:35:49 GMT, Kelvin Nilsen wrote: >> @kdnilsen , what do you think about having a single method called `record_degenerated`. It's a matter of fact without conflating progress and success. I don't like having duplicated code between `record_success_degenerated` and `record_unsuccessful_degenerated`. I understand what @ysramakrishna is saying, and I agree, but I think a change like that is beyond the scope of this PR. > > I like this idea. I'll try to make that work without breaking anything... I've committed this change and it is running through the CI pipeline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28834#discussion_r2669739585 From wkemper at openjdk.org Wed Jan 7 19:20:29 2026 From: wkemper at openjdk.org (William Kemper) Date: Wed, 7 Jan 2026 19:20:29 GMT Subject: RFR: 8373714: Shenandoah: Register heuristic penalties following a degenerated GC [v6] In-Reply-To: References:

Message-ID: On Wed, 7 Jan 2026 18:19:35 GMT, Kelvin Nilsen wrote: >> Add a triggering penalty when we execute degenerated GC cycle. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Combine successful and unsuccessful into single method: report_degenerated() > - remove gratuitous blank line > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - touch file to force tests > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - Merge remote-tracking branch 'jdk/master' into add-degen-penalty > - refactor for reviewer requests > - remove redundant code > - Increase heuristic penalties following degenerated GC Looks good to integrate, assuming testing pipelines pass. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28834#pullrequestreview-3636442278 From xpeng at openjdk.org Wed Jan 7 19:41:07 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 19:41:07 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

Message-ID: On Tue, 6 Jan 2026 20:43:39 GMT, Kelvin Nilsen wrote: >> It is not really necessary to `atomic_allocate_in` here, but I wanted reuse some of the codes in atomic_allocate_in, we can discuss this later, I can change it back to non-atomic version. > > Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with a template argument. > > I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess. > > I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition. > > Here's the scenario that I'm concerned about: > > 1. A mutator obtains pointer to directly allocatable region R > 2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason) > 3. Region R is now eligible to satisfy allocations from behind the global heap lock > 4. Some third mutator thread acquires the heap lock and fetches top for region R > 5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict > 6. This third mutator allocates from region R at top, without using CAS. So both mutators think they own the same object > > I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in region R. However, I anticipate the day in the not-too-distant future when we will want to refresh regions even when they are not ready to be retired. 
Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation. > > So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way. (I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate an object that is 1/2 the heap regio... I will update the PR and not use atomic version here, and also another place in refresh_alloc_regions. Having volatile_top and nonvolatile_top seems necessary, it will make the code more complicated w/o much performance benefits, with CAS allocator, most of alloc request will be handled by the atomic code path, in only few cases we need non-atomic allocation: * After reserving alloc regions from free set before storing to alloc region, it performs obj allocation if the alloc request has not been satisfied yet. * After trying atomic allocation, refresh alloc regions fails, it will try to find a region in free set with enough space for the allocation request. Yes, all the _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs are volatile now, out of these fields, I believe I can maybe remove volatile for _age and _youth(?), but the update of the rest must be atomic because mutators will increase the values in the CAS allocation code path w/o heap lock. 
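For readers following the CAS discussion, here is a minimal sketch of a lock-free bump allocation on a region's top pointer. It is not the code in this PR: `Region`, `cas_allocate` and `kAllocFailed` are hypothetical names, and the sketch uses `std::atomic` from the C++ standard library rather than HotSpot's own atomic primitives. It shows why a thread that races on the same top value is forced to retry, which is the property the scenario above depends on: a plain, non-CAS store to top by a lock holder would not be detected in this way.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Sentinel returned when a request does not fit (hypothetical name).
constexpr std::size_t kAllocFailed = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a heap region; not the ShenandoahHeapRegion API.
struct Region {
  std::atomic<std::size_t> top;  // offset of the next free byte
  std::size_t end;               // one past the last usable byte

  Region(std::size_t start, std::size_t limit) : top(start), end(limit) {
    assert(start <= limit);
  }
};

// Lock-free bump allocation. Returns the offset of the new object,
// or kAllocFailed if the region cannot satisfy the request.
inline std::size_t cas_allocate(Region& r, std::size_t size) {
  std::size_t old_top = r.top.load();
  for (;;) {
    // top never exceeds end, so this subtraction cannot underflow.
    if (size > r.end - old_top) {
      return kAllocFailed;
    }
    // If another thread moved top first, the exchange fails, old_top is
    // reloaded with the current value, and the loop retries.
    if (r.top.compare_exchange_weak(old_top, old_top + size)) {
      return old_top;
    }
  }
}
```

A caller would treat `kAllocFailed` as the signal to fall back to a slower path, for example one that refreshes the allocation regions under the heap lock.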
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2669831418 From kdnilsen at openjdk.org Wed Jan 7 19:58:58 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 19:58:58 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v23] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. 
By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits: - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix whitespace and comment - Remove note to self - Slight expansion of promo reserve - Remove bad assertion - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - fix unsigned arithmetic underflow - Attempt fix for assertion failures - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - ... 
and 68 more: https://git.openjdk.org/jdk/compare/dd20e915...9aa4a3e2 ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=22 Stats: 1456 lines in 29 files changed: 776 ins; 286 del; 394 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 20:32:30 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:32:30 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References:

Message-ID: On Tue, 6 Jan 2026 21:11:42 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1218: > >> 1216: } else { >> 1217: heap->heuristics()->start_idle_span(); >> 1218: } > > Suggestion: > > _generation->heuristics()->start_idle_span(); Very nice. Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2669982434 From kdnilsen at openjdk.org Wed Jan 7 20:38:39 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:38:39 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References:

Message-ID: On Tue, 6 Jan 2026 22:24:44 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 179: > >> 177: " after adjusting for spike_headroom: %zu%s" >> 178: " and penalties: %zu%s", _is_generational? _space_info->name(): "Global", >> 179: byte_size_in_proper_unit(mutator_available), proper_unit_for_byte_size(mutator_available), > > Can we use the `PROPERFMT/PROPERFMTARGS` macros for these? I find they really improve readability. Agreed. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2669994974 From kdnilsen at openjdk.org Wed Jan 7 20:49:04 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:49:04 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References:

Message-ID: On Tue, 6 Jan 2026 22:25:46 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 199: > >> 197: >> 198: // There is no headroom during evacuation and update refs. This information is not used to trigger the next GC. >> 199: // Rather, it is made available to support throttling of allocations during GC. > > Is that true? or is allocation throttling part of another change? Sorry. Not true. Fixing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2670019293 From kdnilsen at openjdk.org Wed Jan 7 20:56:51 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:56:51 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v4] In-Reply-To: References: Message-ID: > After studying large numbers of GC logs with degenerated cycles that have resulted from "late" triggers, we propose the following general improvements: > > 1. Track trends in GC times rather than always using the average GC time plus standard deviation. In many situations, GC times trend upward due to, for example, increasing amounts of live data that must be marked as a workload builds up its working set of memory. > 2. Sample allocation rates more frequently than once every 100 ms. > 3. Track trends in allocation rates. In some situations, the allocation rate trends upwards due to, for example, the start of a new phase of execution or a spike in client workload. > 4. When we detect acceleration of allocation rate, predict consumption of memory based on accelerated allocation rates rather than assuming constant allocation rate. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 65 commits: - Fix comment - Use PROPERFMT macros - Simplify code flow: reviewer suggestion - Merge remote-tracking branch 'jdk/master' into accelerated-triggers - Remove develop/debug instrumentation - add another override - Change type of command-line args - fix white space - Add override to virtual methods - Fix race between allocation reporting and querying - ... and 55 more: https://git.openjdk.org/jdk/compare/dd20e915...7f3a6d1e ------------- Changes: https://git.openjdk.org/jdk/pull/29039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29039&range=03 Stats: 1028 lines in 25 files changed: 921 ins; 35 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/29039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29039/head:pull/29039 PR: https://git.openjdk.org/jdk/pull/29039 From kdnilsen at openjdk.org Wed Jan 7 20:56:54 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 20:56:54 GMT Subject: RFR: 8312116: GenShen: make instantaneous allocation rate triggers more timely [v3] In-Reply-To: References:

Message-ID: On Tue, 6 Jan 2026 22:27:41 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove develop/debug instrumentation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 275: > >> 273: } >> 274: >> 275: void ShenandoahAdaptiveHeuristics::add_gc_time(double timestamp, double gc_time) { > > Could we use `TruncatedSeq::predict_next` here? In this PR, we keep TruncatedSeq::predict_next() functionality as that has proven to be "right" most of the time. TruncatedSeq::predict_next() assumes the next GC time is most effectively predicted as an average over a noisy history of previously measured GC times. This new function adds a new prediction mechanism which kicks in when we observe a "linearly increasing trend in GC times". This has been observed to occur during initialization and startup of new phases of a service workload, where GC(N) takes 400 ms, GC(N+1) takes 425 ms, GC(N+2) takes 465 ms, etc. The typical reason is because the workload is building up data structures and thus requires increasing amounts of time to mark and evacuate and update the increasing amounts of live data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29039#discussion_r2670036503 From xpeng at openjdk.org Wed Jan 7 21:13:08 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 21:13:08 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v17] In-Reply-To: <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com> References:

<2YVE3uR8bRJ_8qAtXN4WoRmeY0Y9xhzsmKbxqv5oL2M=.2811f02e-35b8-46cf-863a-db4006ca1a78@github.com> <4Qs9QlvWQ7gX6RW-rwPVJa-Ndhtx883aImTtiXyCYGk=.1c03a208-d6e5-42bc-bfc3-d1f95e968929@github.com> <6J2xtZ1DLytiwflQ0wbQCtg8tsRAHadSdkZGZllLAxY=.af212eeb-44f7-4955-96ab-d069febe4e0e@github.com>
Message-ID: <1MZQLDhJsqK5ZoPIVDYYRyVg0po67A6wVfIpsAl7Qa0=.d0bfa7e4-f448-4bb1-a386-b8226133e6a7@github.com>

On Tue, 6 Jan 2026 21:00:03 GMT, Kelvin Nilsen wrote:

>> Are you running any experiments (on different hardware configurations) to test your assumptions about this?
>
> Please document the results of any experiments as rationale for the final design.

I did run some experiments and didn't see a significant difference. I will keep the current code using PaddedArray, keep this conversation open, and make a decision based on metrics later, after I address the other comments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670084604

From kdnilsen at openjdk.org Wed Jan 7 21:26:52 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 7 Jan 2026 21:26:52 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v24]
In-Reply-To:
References:
Message-ID:

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/9aa4a3e2..026e34df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=22-23 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From xpeng at openjdk.org Wed Jan 7 21:47:46 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 21:47:46 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <3CDzFBUPf-x7Cp4HiQdzwytCJk9kdpDHGB0SjEtD5Kg=.d9833d73-f579-4ddb-bb3e-e7a9ce0743d0@github.com> On Tue, 6 Jan 2026 17:35:56 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 110: > >> 108: } >> 109: >> 110: uint dummy = 0; > > Don't call this "dummy". Call it regions_ready_for_refresh. 
Remember the value and pass it in as a new argument to attempt_allocation_slow() so that we don't have to recompute it later.

The value from the fast path won't be used anyway; that's why I called it dummy. attempt_allocation_slow has to recompute it after acquiring the heap lock. Imagine that we have two mutators and 8 shared alloc regions, and both mutators try to allocate at the same time:

1. Both threads try the fast path (attempt_allocation_in_alloc_regions) and fail; both see that 8 alloc regions are ready to retire.
2. Both threads call into attempt_allocation_slow.
3. The first thread acquires the heap lock, refreshes all 8 alloc regions, allocates in one of them, and releases the heap lock.
4. The second thread acquires the heap lock after the first thread releases it. The regions_ready_for_refresh value it saw in the fast path is now stale and has to be recomputed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670164993

From kdnilsen at openjdk.org Wed Jan 7 21:54:47 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 7 Jan 2026 21:54:47 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25]
In-Reply-To:
References:
Message-ID:

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix confusing comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/026e34df..b064ecc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=23-24 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 22:16:55 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:16:55 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References: Message-ID:
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - fix another typo - Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/b064ecc5..a8520190 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=24-25 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Wed Jan 7 22:16:59 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:16:59 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v23] In-Reply-To: References:

Message-ID:

On Wed, 7 Jan 2026 19:58:58 GMT, Kelvin Nilsen wrote:

> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits:
>
>  - Merge remote-tracking branch 'jdk/master' into share-collector-reserves
>  - Fix whitespace and comment
>  - Remove note to self
>  - Slight expansion of promo reserve
>  - Remove bad assertion
>  - Merge remote-tracking branch 'jdk/master' into share-collector-reserves-restart-gh
>  - Merge remote-tracking branch 'jdk/master' into share-collector-reserves
>  - fix unsigned arithmetic underflow
>  - Attempt fix for assertion failures
>  - Merge remote-tracking branch 'jdk/master' into share-collector-reserves
>  - ... and 68 more: https://git.openjdk.org/jdk/compare/dd20e915...9aa4a3e2

src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 183:

> 181:   // Call the subclasses to add young-gen regions into the collection set.
> 182:   choose_collection_set_from_regiondata(collection_set, candidates, cand_idx, immediate_garbage + free);
> 183:

The general idea here is to give young-gen first dibs on its reserves. But if young does not consume its reserves, we'll see if we can repurpose some of those reserves to expand our old-gen evacuation efforts.
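The "first dibs" idea in the comment above can be sketched in a few lines. This is only an illustration under simplifying assumptions: `EvacBudgets` and `share_reserves` are hypothetical names, all quantities are plain byte counts, and the actual heuristics in the PR presumably apply further constraints that are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical result record; field names are illustrative only.
struct EvacBudgets {
  std::size_t young_used;  // young reserve consumed by the young collection set
  std::size_t old_budget;  // budget available for old-gen evacuation
};

inline EvacBudgets share_reserves(std::size_t young_reserve,
                                  std::size_t old_reserve,
                                  std::size_t young_demand) {
  // Young-gen gets first dibs on its own reserve.
  const std::size_t young_used = std::min(young_demand, young_reserve);
  assert(young_used <= young_reserve);
  // Whatever young did not consume can be repurposed to top off old.
  const std::size_t unused_young = young_reserve - young_used;
  return EvacBudgets{young_used, old_reserve + unused_young};
}
```

With a young reserve of 100 units, an old reserve of 20 and a young demand of 30, the old budget in this sketch grows to 90; if young demands more than its reserve, old keeps only its original 20.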
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670067635 From kdnilsen at openjdk.org Wed Jan 7 22:17:01 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:17:01 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26] In-Reply-To: References:

Message-ID: On Wed, 7 Jan 2026 22:14:21 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. 
We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - fix another typo > - Fix typo src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 213: > 211: // Entire region will be promoted, This region does not impact young-gen or old-gen evacuation reserve. > 212: // This region has been pre-selected and its impact on promotion reserve is already accounted for. > 213: I think this comment is obsolete. The line of code that it describes was removed in a previous PR. IIRC, we used to increment cur_young_garbage by r->garbage() plus r->get_live_data_bytes(). src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 127: > 125: byte_size_in_proper_unit(old_evacuation_budget), proper_unit_for_byte_size(old_evacuation_budget), > 126: unprocessed_old_collection_candidates()); > 127: This code is now used twice in each mixed evacuation cycle, so I bundled the code into add_old_regions_to_cset(). The first time is when we prime the collection set. This is called to place certain old-gen regions into the cset before we choose the young-gen regions that are going to be collected. The second time is when we top off the old collection set. This happens after young-gen regions have been placed into the cset. 
If there is unused reserve from young generation, we consider repurposing those reserves for old and try to expand the old collection set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670076558 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670138961 From kdnilsen at openjdk.org Wed Jan 7 22:17:05 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Jan 2026 22:17:05 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v24] In-Reply-To: References:

Message-ID: On Wed, 7 Jan 2026 21:26:52 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. 
We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 50: > 48: > 49: void ShenandoahGlobalHeuristics::choose_global_collection_set(ShenandoahCollectionSet* cset, > 50: const ShenandoahHeuristics::RegionData* data, The general idea here: For a global GC, our collection set is based on garbage-first heuristic across all of young and all of old. We combine our old and young reserves into a shared pool of reserves. We choose cset regions in garbage-first order. Our choices of which regions to evacuate cause us to dedicate reserves to either old or young. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 279: > 277: } > 278: > 279: // After an abbreviated cycle, we reclaim immediate garbage. Rebuild the freeset in order to establish With this PR, some apportionment of reserves is done before the idle span. And each idle span is preceded by a freeset rebuild. At the time of rebuild, we make use of information gleaned from recent GC activities to decide how to balance the old and young reserves, such as: 1. Are there candidates for mixed evacuation? 2. What is the potential for promotion? 
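As an illustration of the global collection-set idea described above for choose_global_collection_set (garbage-first order, with old and young reserves combined into one shared pool), here is a minimal C++ sketch. It is not the HotSpot code: `Region`, `Selection`, and `choose_garbage_first` are hypothetical names, and the model assumes that each chosen region's live bytes are charged to its own generation, ignoring promotion and any per-generation limits that the real heuristic may apply.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a heap region (not the HotSpot type).
struct Region {
  bool        is_old;   // generation the region belongs to
  std::size_t garbage;  // bytes that would be reclaimed
  std::size_t live;     // bytes that must be evacuated, consuming reserve
};

struct Selection {
  std::vector<std::size_t> chosen;        // indices into the input, in selection order
  std::size_t young_reserve_used = 0;     // reserve dedicated to young evacuation
  std::size_t old_reserve_used = 0;       // reserve dedicated to old evacuation
};

// Visit regions in descending-garbage order and charge each chosen region's
// live bytes against a single shared pool. The choice of regions decides how
// the pool ends up split between old and young.
Selection choose_garbage_first(const std::vector<Region>& regions,
                               std::size_t shared_reserve) {
  std::vector<std::size_t> order(regions.size());
  for (std::size_t i = 0; i < order.size(); i++) {
    order[i] = i;
  }
  std::stable_sort(order.begin(), order.end(),
                   [&regions](std::size_t a, std::size_t b) {
                     return regions[a].garbage > regions[b].garbage;
                   });

  Selection result;
  std::size_t remaining = shared_reserve;
  for (std::size_t idx : order) {
    const Region& r = regions[idx];
    if (r.live > remaining) {
      continue;  // not enough reserve left to evacuate this region
    }
    remaining -= r.live;
    if (r.is_old) {
      result.old_reserve_used += r.live;
    } else {
      result.young_reserve_used += r.live;
    }
    result.chosen.push_back(idx);
  }
  return result;
}
```

In this toy model the split between `young_reserve_used` and `old_reserve_used` is an output of the selection rather than an input, which is the point the comment makes about choices dedicating reserves to either old or young.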
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1219: > 1217: } > 1218: > 1219: void ShenandoahFreeSet::move_unaffiliated_regions_from_collector_to_old_collector(ssize_t count) { This allows us to "share" from young reserve to old reserve. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2571: > 2569: } > 2570: > 2571: Before this PR, we only "have_evacuation_reserves" when we rebuild at start of evacuation. With this PR, we always have_evacuation_reserves. That's because at the start of idle span, we are already anticipating what sort of evacuation will take place during the next GC cycle ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670125957 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670155851 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670166464 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670192097 From xpeng at openjdk.org Wed Jan 7 22:17:34 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:17:34 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

<4BVpGlI0mK12KTI6b6l1UoH26vEwytUP1vlYX1Z-UJQ=.374ea1dd-38d7-46ea-b6a5-a99453171059@github.com> <0IfaybNazrHlrdcGhk-0080a-KqpdUwN_WXpYVe_fjc=.d68c5098-02b7-4a91-8ad3-4e8b310cba0b@github.com> Message-ID: On Tue, 6 Jan 2026 17:34:06 GMT, Kelvin Nilsen wrote: >> But part of the reason for my confusion is that you are trying to do fast allocations while holding the heap lock! >> >> The reason we came into attempt_allocation_slow() is because we already failed to attempt_allocation_in_alloc_regions(). There's no need to call this a second time. You should have remembered regions_ready_for_refresh and passed this in as an argument to attempt_allocation_slow(). > > I'm not concerned that the count of regions_ready_for_refresh might be stale. If this count is getting incremented "during" our allocation, we will see this result soon enough. If multiple mutators fail fast-path allocation simultaneously, they will each acquire heap lock either way (existing implementation vs. new implementation that does not retry the allocation). Acquiring the heap lock is the "expensive" operation. If the first one refreshes allocation regions, then subsequent invocations will not find any regions to be refreshed. The concern is not that "this count is getting incremented 'during' our allocation"; it is the case where the count gets decremented because other mutators may have already refreshed all alloc regions before the current mutator acquires the heap lock. Because of this, we have to call attempt_allocation_in_alloc_regions again after successfully acquiring the heap lock. The same design can also be found in the CAS allocators of G1, Parallel, and Serial GC. 
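The retry-after-lock pattern discussed above can be sketched in a few lines of C++. This is a toy model under stated assumptions, not the Shenandoah implementation: `ToyRegion`, `ToyAllocator`, `Allocation`, and `try_fast` are hypothetical names, there is a single shared allocation region, and "refresh" simply installs a fresh region. The step the sketch is meant to show is the second `try_fast` call made while holding the lock, which avoids a redundant refresh when another thread has already installed a new region.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical region with a lock-free bump pointer (not the HotSpot type).
struct ToyRegion {
  const int id;
  const std::size_t capacity;
  std::atomic<std::size_t> top{0};

  ToyRegion(int i, std::size_t c) : id(i), capacity(c) {}

  // CAS bump allocation; returns false when the request does not fit.
  bool try_allocate(std::size_t size, std::size_t& offset) {
    std::size_t old_top = top.load();
    while (size <= capacity - old_top) {
      if (top.compare_exchange_weak(old_top, old_top + size)) {
        offset = old_top;
        return true;
      }
      // On failure old_top holds the current value; the loop re-checks the fit.
    }
    return false;
  }
};

struct Allocation {
  int region_id;
  std::size_t offset;
};

class ToyAllocator {
  std::mutex _heap_lock;
  std::vector<std::unique_ptr<ToyRegion>> _regions;  // guarded by _heap_lock
  std::atomic<ToyRegion*> _current;                  // shared alloc region
  const std::size_t _region_size;
  int _refreshes = 0;                                // guarded by _heap_lock

  bool try_fast(std::size_t size, Allocation& out) {
    ToyRegion* r = _current.load();
    std::size_t offset = 0;
    if (r->try_allocate(size, offset)) {
      out = Allocation{r->id, offset};
      return true;
    }
    return false;
  }

public:
  explicit ToyAllocator(std::size_t region_size)
      : _current(nullptr), _region_size(region_size) {
    _regions.push_back(std::make_unique<ToyRegion>(0, region_size));
    _current.store(_regions.back().get());
  }

  bool allocate(std::size_t size, Allocation& out) {
    if (size > _region_size) {
      return false;  // the toy model has no path for oversized requests
    }
    if (try_fast(size, out)) {
      return true;   // lock-free fast path
    }
    std::lock_guard<std::mutex> guard(_heap_lock);
    // Another thread may have refreshed the region while we waited for the
    // lock, so retry the fast path before refreshing again.
    if (try_fast(size, out)) {
      return true;
    }
    int next_id = static_cast<int>(_regions.size());
    _regions.push_back(std::make_unique<ToyRegion>(next_id, _region_size));
    _current.store(_regions.back().get());
    _refreshes++;
    return try_fast(size, out);
  }

  int refreshes() {
    std::lock_guard<std::mutex> guard(_heap_lock);
    return _refreshes;
  }
};
```

Without the retry under the lock, every thread that lost the fast-path race would install its own new region, leaving the previous one mostly empty.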
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670238256 From xpeng at openjdk.org Wed Jan 7 22:27:56 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:27:56 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: On Tue, 6 Jan 2026 21:32:30 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 254: > >> 252: // Step 1: find out the alloc regions which are ready to refresh. >> 253: for (uint i = 0; i < _alloc_region_count; i++) { >> 254: ShenandoahAllocRegion* alloc_region = &_alloc_regions[i]; > > We've got the heap lock here. why does this need to be atomic? Comments in the code should make this clear. I believe AtomicAccess::load here is not needed, I'll remove it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670262454 From xpeng at openjdk.org Wed Jan 7 22:53:26 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:53:26 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v21] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. 
Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - Update code comments - Update assert message - Only use atomic allocation when allocate from shared alloc regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/cf13b7b5..61d86546 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=19-20 Stats: 33 lines in 5 files changed: 13 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 22:53:29 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:53:29 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> 
Message-ID: On Tue, 6 Jan 2026 00:32:24 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 102: > >> 100: for (size_t i = 0; i < num_regions; i++) { >> 101: ShenandoahHeapRegion* region = heap->get_region(i); >> 102: assert(!region->is_active_alloc_region(), "Not expecting any active alloc region at the time"); > > Same suggestion here as with shenandoahGenerationalHeuristics.cpp. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670313055 From xpeng at openjdk.org Wed Jan 7 22:56:19 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:56:19 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

Message-ID: On Wed, 7 Jan 2026 19:37:13 GMT, Xiaolong Peng wrote: >> Would prefer not to use the atomic_allocate code here. If you want to reuse code, maybe you can refactor allocate_in with a template argument. >> >> I notice that this PR makes lots of ShenandoahHeapRegion variables volatile: _age, _youth, _top, _tlab_allocs, _gclab_allocs, _plab_allocs. That will cause less efficient code to be generated whenever we are accessing this data from "behind" the GC heap lock, which should be rare, I guess. >> >> I raised some concerns/issues about the race that happens when we move a region from the directly-allocatable set into the global-heap-lock-protected ShenandoahFreeSet partition. >> >> Here's the scenario that I'm concerned about: >> >> 1. A mutator obtains pointer to directly allocatable region R >> 2. A second mutator performs a refresh, moving region R out of directly allocatable set (for whatever reason) >> 3. Region R is now eligible to satisfy allocations from behind the global heap lock >> 4. Some third mutator thread acquires the heap lock and fetches top for region R >> 5. The first mutator performs its allocation within the same region R, not recognizing a CAS conflict >> 6. This third mutator allocates from region R at top, without using CAS. So both mutators think they own the same object >> >> I think this is not a problem if the "only" reason at step 2 above that we move region R out of directly allocatable set is because R is ready to be retired. In that case, there will be no subsequent heap-locked allocations in region R. However, I anticipate the day in the not-too-distant future when we will want to refresh regions even when they are not ready to be retired. 
Specifically, as we move rebuild-freeset out of safepoints, we will want to refresh regions before we acquire heap-lock to do rebuild, with the goal of making sure there is sufficient directly allocatable memory available that no mutator will be stalled because it needs to allocate during the time that the heap remains locked for the rebuild operation. >> >> So I suppose that if we always use atomic_allocate() even for allocations that happen while holding the heap lock, we won't have this problem. If we decide to keep this architecture, there should be comments explaining why we are doing it this way. (I am not real happy that we have to "pay the cost" of CAS in addition to paying the cost of global heap lock, but I think these allocations should be very rare. It seems this would only come up if, for example, a mutator wanted to allocate ... > > I will update the PR and not use the atomic version here, nor in the other place in refresh_alloc_regions. > > Having volatile_top and nonvolatile_top seems unnecessary: it would make the code more complicated without much performance benefit. With the CAS allocator, most alloc requests will be handled by the atomic code path; in only a few > cases do we need non-atomic allocation: > * After reserving alloc regions from the free set and before storing them as alloc regions, it performs the object allocation if the alloc request has not been satisfied yet. > * After trying atomic allocation, if refreshing alloc regions fails, it will try to find a region in the free set with enough space for the allocation request. > > Yes, _age, _youth, _top, _tlab_allocs, _gclab_allocs, and _plab_allocs are all volatile now. Of these fields, I believe I can maybe remove volatile for _age and _youth(?), but updates of the rest must be atomic because mutators will increase the values in the CAS allocation code path w/o heap lock. 
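The template-argument refactoring suggested in the quoted review can be illustrated with a small C++ sketch. It is only a sketch under assumptions: `SketchRegion` and its `allocate_in` are hypothetical, simplified stand-ins, and they do not reproduce the signatures used in the PR. The `ATOMIC = true` instantiation uses a CAS loop on the bump pointer, while `ATOMIC = false` uses a plain load and store, which is only safe if no other thread can allocate in the region at the same time (for example, the region is not published as a shared alloc region and the caller holds the heap lock).

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical region with a bump pointer (not the HotSpot type).
struct SketchRegion {
  std::atomic<std::size_t> top{0};
  const std::size_t capacity;

  explicit SketchRegion(std::size_t c) : capacity(c) {}

  // One allocation routine, two instantiations:
  //   allocate_in<true>  : lock-free CAS, for regions shared between threads
  //   allocate_in<false> : plain update, caller guarantees exclusive access
  template <bool ATOMIC>
  bool allocate_in(std::size_t size, std::size_t& offset) {
    if (ATOMIC) {
      std::size_t old_top = top.load(std::memory_order_relaxed);
      while (size <= capacity - old_top) {
        if (top.compare_exchange_weak(old_top, old_top + size)) {
          offset = old_top;
          return true;
        }
        // old_top was refreshed by the failed CAS; re-check the fit.
      }
      return false;
    } else {
      std::size_t old_top = top.load(std::memory_order_relaxed);
      if (size > capacity - old_top) {
        return false;
      }
      top.store(old_top + size, std::memory_order_relaxed);
      offset = old_top;
      return true;
    }
  }
};
```

The race in steps 1 through 6 of the quoted scenario corresponds to mixing the two instantiations on the same region at the same time: the non-atomic store can overwrite a concurrent CAS update, so both callers receive the same offset.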
I have updated the method `atomic_allocate_in` with a template parameter ATOMIC, now only when allocating from shared alloc regions the ATOMIC parameter is true to use atomic operations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670323493 From xpeng at openjdk.org Wed Jan 7 22:59:04 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 22:59:04 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v22] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. > * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. 
We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: No need to use Atomic::load to read shared alloc region in refresh_alloc_regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/61d86546..f5038a3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 23:12:41 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:12:41 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v14] In-Reply-To: References:

<_mkL6HrCqlS1_qgOmXvEj3vsYgo2idQnbaDdMVFEGUk=.9c71f705-1f96-441e-96d0-c99d3053cffd@github.com>

Message-ID: On Tue, 6 Jan 2026 17:37:46 GMT, Kelvin Nilsen wrote: >> I'll add comments on this, _alloc_region_count == 0 means we don't want to use any shared alloc region, it will always allocate with a heap lock, ideally the performance should be same as before, so it always simply find a region with enough space and allocate in the region. > > Put the comments describing functions in the .hpp file, where they are currently. But we need to enhance those comments. I have added comments on those functions and will keep adding the ones that are still missing; meanwhile I am trying to avoid excessive comments, so please point out any comment that is not clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670355339 From xpeng at openjdk.org Wed Jan 7 23:16:00 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:16:00 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v23] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in OldCollector partition, it doesn't inherit the logic from ShenandoahAllocator for now, the `allocate` method has been overridden to delegate to `FreeSet::allocate_for_collector` due to the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector` > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, but in some cases it may improve the latency/performance: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 500+us to less than 150us, p99 from 1000+us to ~200us. > > java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > > > Openjdk TIP: > > ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 2... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: No need to use Atomic::load to read shared alloc region in release_alloc_regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/f5038a3a..917dd8a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=21-22 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Wed Jan 7 23:19:28 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 7 Jan 2026 23:19:28 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v20] In-Reply-To: References: <797j9fNWUlV491wWKbFMfthH99o3A5493T0Pcdn_zdc=.417c13b0-8618-4402-aafa-3d86c45c8ff6@github.com> Message-ID: <_vhMVxLnvCdIHO_CJ8kaI3cLNKJSSYvqK7n_wriVhDk=.2d1649e4-512d-4682-842b-29541423b458@github.com> On Tue, 6 Jan 2026 01:55:58 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 265 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix build error after merging from tip >> - Merge branch 'master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Some comments updates as suggested in PR review >> - Fix build failure after merge >> - Expend promoted from ShenandoahOldCollectorAllocator >> - Merge branch 'master' into cas-alloc-1 >> - Address PR comments >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... 
and 255 more: https://git.openjdk.org/jdk/compare/de81d389...cf13b7b5 > > src/hotspot/share/gc/shenandoah/shenandoahAllocator.cpp line 338: > >> 336: for (uint i = 0; i < _alloc_region_count; i++) { >> 337: ShenandoahAllocRegion& alloc_region = _alloc_regions[i]; >> 338: ShenandoahHeapRegion* r = AtomicAccess::load(&alloc_region.address); > > We've got heap lock and at safepoint. Do not need AtomicAccess here. That is more costly than necessary. I prefer to use regular fetch. If you prefer to keep AtomicAccess, please provide a comment in the code explaining why and we will revisit. The atomic load is not needed, I'll remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2670368014 From kdnilsen at openjdk.org Thu Jan 8 00:14:17 2026 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Jan 2026 00:14:17 GMT Subject: RFR: 8373819: GenShen: Requested generation may be null [v3] In-Reply-To: References: <8SWpQdaleulzSXfgF4fJ_zgekaijLs53t8Wer6IvKwo=.785abf41-65b7-44f6-90d0-2c63d5bf5981@github.com> Message-ID: On Mon, 5 Jan 2026 17:13:08 GMT, William Kemper wrote: >> This PR attempts to simplify the generational control thread by decoupling it somewhat from the heap/gc cancellation mechanism. This is meant to prevent the control thread from seeing inconsistencies between `shHeap::_cancelled_gc` and `shGenControlThread::_requested_gc_cause`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Fix typo in assertion message > - Take regulator thread out of STS before requesting GC > > The request may block while it waits for control thread to stop old marking. If workers are already in the STS, and the regulator thread is still in the STS, but cannot yield, the safepoint will not run. Control, worker and regulator threads deadlock each other. 
> - Add comments > - Revert back to what should be on this branch > - Merge remote-tracking branch 'jdk/master' into fix-null-generation-crash > - Don't know how this file got deleted > - Carry over gc cancellation to gc request > - Do not let allocation failure requests be overwritten by other requests > - Fix degen point handling > - ... and 3 more: https://git.openjdk.org/jdk/compare/4458cab4...8f4f55db Thanks for talking us through this PR. Lots of subtle issues here. Looks good to me. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/28932#pullrequestreview-3637243572 From xpeng at openjdk.org Thu Jan 8 00:26:03 2026 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 8 Jan 2026 00:26:03 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v24] In-Reply-To: References: Message-ID: <1qqqdCXoW9PWw_ERccC7zh6kMPBJyKHp9wprAEqbMgM=.24431e29-e46f-4a8b-ade6-d27506432169@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of memory allocation to improve heap lock contention, along with the optimization, a better OOD is also done to Shenandoah memory allocation to reuse the majority of the code: > > * ShenandoahAllocator: base class of the allocators, most of the allocation code is in this class. > * ShenandoahMutatorAllocator: allocator for mutator, inherit from ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator for mutator. > * ShenandoahCollectorAllocator: allocator for collector allocation in Collector partition, similar to ShenandoahMutatorAllocator, only few lines of code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator: allocator for mutator collector allocation in the OldCollector partition. It doesn't inherit the logic from ShenandoahAllocator for now; the `allocate` method is overridden to delegate to `FreeSet::allocate_for_collector` because of the special allocation considerations for `plab` in old gen. We will rewrite this part later and move the code out of `FreeSet::allocate_for_collector`.
>
> I'm not expecting a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause a performance issue, but in some cases it may improve latency/performance:
>
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improves from 500+us to less than 150us, p99 from 1000+us to ~200us.
>
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing"
>
>
> Openjdk TIP:
>
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 usec, 99.99% 7081 usec, max 8048 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 2...

Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
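
For readers unfamiliar with the technique named in the subject line, here is a minimal sketch of lock-free bump-pointer allocation with CAS from a shared region. It is an illustration only, not code from the PR: `SharedRegion` and `par_allocate` are hypothetical names, and it uses `std::atomic` from the C++ standard library rather than HotSpot's own atomic wrappers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a shared allocation region (not a HotSpot type).
struct SharedRegion {
  std::atomic<uintptr_t> top;  // next free address, advanced only by CAS
  uintptr_t end;               // one past the last usable byte

  SharedRegion(uintptr_t bottom, size_t capacity_bytes)
      : top(bottom), end(bottom + capacity_bytes) {
    assert(bottom != 0 && "0 is reserved as the failure value");
  }

  // Try to carve `size` bytes out of the region without taking a lock.
  // Returns the start address of the block, or 0 if it does not fit.
  uintptr_t par_allocate(size_t size) {
    uintptr_t old_top = top.load(std::memory_order_relaxed);
    for (;;) {
      if (size > end - old_top) {
        return 0;  // caller would fall back to a locked slow path
      }
      // On failure, compare_exchange_weak stores the current value of
      // `top` into old_top, so the loop retries with fresh state.
      if (top.compare_exchange_weak(old_top, old_top + size,
                                    std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }
};
```

Threads that lose the race simply retry against the new `top`; only when the region is exhausted does a caller need a slower path, which in this sketch is left to the caller.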
The pull request now contains 271 commits:

 - Merge branch 'openjdk:master' into cas-alloc-1
 - No need to use Atomic::load to read shared alloc region in release_alloc_regions
 - No need to use Atomic::load to read shared alloc region in refresh_alloc_regions
 - Update code comments
 - Update assert message
 - Only use atomic allocation when allocate from shared alloc regions
 - Merge branch 'openjdk:master' into cas-alloc-1
 - Fix build error after merging from tip
 - Merge branch 'master' into cas-alloc-1
 - Merge branch 'master' into cas-alloc-1
 - ... and 261 more: https://git.openjdk.org/jdk/compare/9a944e55...ef10341f

-------------

Changes: https://git.openjdk.org/jdk/pull/26171/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=23
Stats: 1656 lines in 25 files changed: 1308 ins; 235 del; 113 mod
Patch: https://git.openjdk.org/jdk/pull/26171.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171

From wkemper at openjdk.org Thu Jan 8 00:47:42 2026
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 8 Jan 2026 00:47:42 GMT
Subject: RFR: Merge openjdk/jdk21u:master [v2]
In-Reply-To:
References:
Message-ID:

> Merges tag jdk-21.0.10+6

William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
 - all: https://git.openjdk.org/shenandoah-jdk21u/pull/231/files
 - new: https://git.openjdk.org/shenandoah-jdk21u/pull/231/files/2e594b6c..2e594b6c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=231&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=231&range=00-01

Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/231.diff
Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/231/head:pull/231

PR: https://git.openjdk.org/shenandoah-jdk21u/pull/231

From wkemper at openjdk.org Thu Jan 8 00:48:58 2026
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 8 Jan 2026 00:48:58 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25]
In-Reply-To:
References:

Message-ID: On Wed, 7 Jan 2026 21:54:47 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. 
We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100.
>>
>> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f)
>>
>> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100.
>>
>> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c)
>>
>> The command line for these comparisons follows:
>>
>>
>> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd...
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix confusing comment

Just posting my comments for today, more to follow. Also, this will conflict mightily with https://github.com/openjdk/jdk/pull/27632. Though I think using the age census to estimate promotion reserves is conceptually compatible with this PR.

src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 178:

> 176: bool need_to_finalize_mixed = false;
> 177: if (_generation->is_young()) {
> 178:   need_to_finalize_mixed = heap->old_generation()->heuristics()->prime_collection_set(collection_set);

We could push this logic for young collections down into `ShenandoahYoungHeuristics::choose_collection_set_from_regiondata` where `_generation` is always `ShenandoahYoungGeneration`.

src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 342:

> 340: }
> 341:
> 342: bool ShenandoahOldHeuristics::top_off_collection_set(ssize_t &add_regions_to_old) {

Does `add_regions_to_old` really need to be signed? Seems like it will always be non-negative here.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 282:

> 280: // reserves for the next GC cycle.
> 281: assert(_abbreviated, "Only rebuild free set for abbreviated and old-marking cycles");
> 282: heap->rebuild_free_set(true /*concurrent*/);

Should we move this up in the sequence? If promote in place fails we'd go to a degenerated cycle. After a cursory review of the degenerated cycle, it looks like it only rebuilds the freeset when evacuations are performed. Seems like rebuilding the freeset earlier, before checking for cancellation, might reduce the chance of a degenerated cycle and also guarantee the freeset is rebuilt. Would it make more sense to do this in `early_cleanup`?

-------------

Changes requested by wkemper (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25357#pullrequestreview-3637002268
PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670238461
PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670502843
PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670279391

From wkemper at openjdk.org Thu Jan 8 00:49:00 2026
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 8 Jan 2026 00:49:00 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26]
In-Reply-To:
References:

Message-ID: On Wed, 7 Jan 2026 22:16:55 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. 
We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - fix another typo > - Fix typo src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 165: > 163: } > 164: > 165: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); Was there a reason to remove this `assert`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2670259539 From wkemper at openjdk.org Thu Jan 8 00:50:23 2026 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Jan 2026 00:50:23 GMT Subject: Integrated: Merge openjdk/jdk21u:master In-Reply-To: References: Message-ID: On Thu, 25 Dec 2025 14:24:27 GMT, William Kemper wrote: > Merges tag jdk-21.0.10+6 This pull request has now been integrated. 

Changeset: 02bb7604
Author: William Kemper
URL: https://git.openjdk.org/shenandoah-jdk21u/commit/02bb7604bb84b2aec47069148f0d64931b3f9743
Stats: 660 lines in 23 files changed: 297 ins; 224 del; 139 mod

Merge

-------------

PR: https://git.openjdk.org/shenandoah-jdk21u/pull/231

From duke at openjdk.org Thu Jan 8 04:46:41 2026
From: duke at openjdk.org (Harshit470250)
Date: Thu, 8 Jan 2026 04:46:41 GMT
Subject: RFR: 8347396: Efficient TypeFunc creations [v4]
In-Reply-To:
References:
Message-ID:

> This PR makes changes to the GC TypeFunc creation similar to those made by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851), as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686), I have put a guard on the Shenandoah-GC-specific part of the code.

Harshit470250 has updated the pull request incrementally with three additional commits since the last revision:

 - move make_clone to barrierSetC2
 - move make_clone to barrier_stubc2.hpp
 - move clone_type

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/27279/files
 - new: https://git.openjdk.org/jdk/pull/27279/files/4dfa36ca..630e4be0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27279&range=02-03

Stats: 52 lines in 4 files changed: 24 ins; 25 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/27279.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27279/head:pull/27279

PR: https://git.openjdk.org/jdk/pull/27279

From kdnilsen at openjdk.org Thu Jan 8 15:54:04 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 8 Jan 2026 15:54:04 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v26]
In-Reply-To:
References:

Message-ID:

On Wed, 7 Jan 2026 22:14:15 GMT, William Kemper wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix confusing comment
>
> src/hotspot/share/gc/shenandoah/heuristics/shenandoahGenerationalHeuristics.cpp line 178:
>
>> 176: bool need_to_finalize_mixed = false;
>> 177: if (_generation->is_young()) {
>> 178:   need_to_finalize_mixed = heap->old_generation()->heuristics()->prime_collection_set(collection_set);
>
> We could push this logic for young collections down into `ShenandoahYoungHeuristics::choose_collection_set_from_regiondata` where `_generation` is always `ShenandoahYoungGeneration`.

Good catch. Thanks for this suggestion. Much cleaner.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2672932077

From kdnilsen at openjdk.org Thu Jan 8 17:41:21 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 8 Jan 2026 17:41:21 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v27]
In-Reply-To:
References:
Message-ID: <3iF-Ny42_W-rxUDNFL7LVK4HtcRS8Hf3TiGbYWoWwOo=.55a4093f-c022-410c-8c9f-c0b270bdd194@github.com>

> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young).
>
> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity.
>
> The following spreadsheet snapshots highlight the benefits of this change.
In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
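
The sharing idea in the description above can be illustrated with a small sketch. This is an assumption-laden illustration, not the PR's heuristic: the function name `young_reserve_to_share` and its four byte-count parameters are hypothetical, and the real change works with regions, partitions and promotion estimates that are not modelled here. The sketch only shows lending old the part of the young reserve that young does not need for its next cycle, capped by what old is short.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: how many bytes of the young collector reserve could be
// lent to old for mixed evacuations. All parameters are illustrative inputs.
size_t young_reserve_to_share(size_t young_reserve,  // currently reserved in young
                              size_t young_needed,   // anticipated need of next young GC
                              size_t old_reserve,    // currently reserved in old
                              size_t old_needed) {   // what mixed evacuation would like
  // Excess that young holds beyond its own anticipated need (never negative).
  size_t young_excess =
      young_reserve > young_needed ? young_reserve - young_needed : 0;
  // Amount by which the old reserve falls short (never negative).
  size_t old_shortfall =
      old_needed > old_reserve ? old_needed - old_reserve : 0;
  // Lend no more than young can spare and no more than old is missing.
  return std::min(young_excess, old_shortfall);
}
```

When young has no excess the function returns zero, which corresponds to the pre-existing behaviour of fully independent reserves.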

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata()

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/25357/files
 - new: https://git.openjdk.org/jdk/pull/25357/files/a8520190..1002fb56

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=26
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=25-26

Stats: 100 lines in 22 files changed: 24 ins; 17 del; 59 mod
Patch: https://git.openjdk.org/jdk/pull/25357.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357

PR: https://git.openjdk.org/jdk/pull/25357

From kdnilsen at openjdk.org Thu Jan 8 17:52:41 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 8 Jan 2026 17:52:41 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v27]
In-Reply-To:
References:

Message-ID:

On Wed, 7 Jan 2026 22:23:05 GMT, William Kemper wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Move special handling into ShenandoahYoungHeuristics::choose_collection_set_from_regiondata()
>
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 165:
>
>> 163: }
>> 164:
>> 165: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now");
>
> Was there a reason to remove this `assert`?

May have been an accident. I'll put it back in and see what happens.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2673304717

From kdnilsen at openjdk.org Thu Jan 8 17:55:26 2026
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 8 Jan 2026 17:55:26 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v25]
In-Reply-To:
References: