From kdnilsen at openjdk.org Sat Nov 1 05:40:56 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Sat, 1 Nov 2025 05:40:56 GMT
Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v4]
In-Reply-To:
References:
Message-ID:

> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading.
>
> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable.

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  fix bugs in implementation of weakly referenced object handling

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27353/files
  - new: https://git.openjdk.org/jdk/pull/27353/files/80198abe..e16ea231

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=02-03

  Stats: 9 lines in 2 files changed: 1 ins; 2 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/27353.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353

PR: https://git.openjdk.org/jdk/pull/27353

From kbarrett at openjdk.org Sat Nov 1 07:34:03 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 1 Nov 2025 07:34:03 GMT
Subject: RFR: 8369186: HotSpot Style Guide should permit some uses of the C++ Standard Library [v2]
In-Reply-To: <4Yh4KeUItjdDYihe9d1u66tBROofb0pRLDKIdhw-XZo=.cf383363-c963-4871-941d-5e358f74c048@github.com>
References: <6u0Ia6A2xQnv51Ti0Jchg6rnncfRRvtK5oGS963CeIA=.bc9c1481-60ad-443d-923b-dc86277dbb14@github.com> <4Yh4KeUItjdDYihe9d1u66tBROofb0pRLDKIdhw-XZo=.cf383363-c963-4871-941d-5e358f74c048@github.com>
Message-ID:

On Mon, 27 Oct 2025 08:24:41 GMT, Florian Weimer wrote:

>> We (you and me, @fweimer-rh)
>> discussed this a couple of years ago:
>> https://mail.openjdk.org/pipermail/hotspot-dev/2023-December/082324.html
>>
>> Quoting from here:
>> https://mail.openjdk.org/pipermail/hotspot-dev/2023-December/083142.html
>>
>> "
>> Empirically, a recursive initialization attempt doesn't make any attempt to
>> throw. Rather, it blocks forever waiting for a futex signal from a thread that
>> succeeds in the initialization. Which of course will never come.
>>
>> And that makes sense, now that I've looked at the code.
>>
>> In __cxa_guard_acquire, with _GLIBCXX_USE_FUTEX, if the guard indicates
>> initialization hasn't yet been completed, then it goes into a while loop.
>> This while loop tries to claim initialization. Failing that, it checks
>> whether initialization is complete. Failing that, it does a SYS_futex
>> syscall, waiting for some other thread to perform the initialization. There's
>> nothing there to check for recursion.
>>
>> throw_recursive_init_exception is only called if single-threaded (either by
>> configuration or at runtime).
>> "
>>
>> It doesn't look like there have been any relevant changes in that area since
>> then. So I think there is still not a problem here.
>
> @kimbarrett Sorry, I forgot about the old thread. You can get the exception in a single-threaded scenario, something like this:
>
>
> struct S {
>   S() {
>     static S s;
>     *this = s;
>   }
> } global;
>
>
> Maybe the actual rule is more like this?
>
>> Functions that may throw exceptions must not be used, unless individual calls ensure that these particular invocations cannot throw exceptions.

Recursively entering a block-scoped static is undefined behavior. That some configurations of glibc might throw an exception in that situation (even despite the caller being compiled with exceptions disabled) seems like a mistake in glibc, and not really our concern. Our code should avoid such a situation because it's UB, regardless of whether the actual behavior involves exceptions or nasal demons.
The exception only gets thrown when the application is single-threaded. But at least the common way to start java (via the launcher) is already multi-threaded on entry to Threads::create_vm(). So that case doesn't normally apply to us anyway.

Also, I really don't think we want people trying to figure out whether a particular call might or might not throw (neither when writing nor when reading code). So no, I don't think the proposed rule should be changed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27601#discussion_r2483178071

From kvn at openjdk.org Sat Nov 1 15:06:06 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sat, 1 Nov 2025 15:06:06 GMT
Subject: RFR: 8369186: HotSpot Style Guide should permit some uses of the C++ Standard Library [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 18 Oct 2025 17:11:48 GMT, Kim Barrett wrote:

>> Please review this change to the HotSpot Style Guide to suggest that C++
>> Standard Library components may be used, after appropriate vetting and
>> discussion, rather than just a blanket "no, don't use it" with a few very
>> narrow exceptions. It provides some guidance on that vetting process and
>> the criteria to use, along with usage patterns.
>>
>> In particular, it proposes that Standard Library headers should not be
>> included directly, but instead through HotSpot-provided wrapper headers. This
>> gives us a place to document usage, provide workarounds for platform issues in
>> a single place, and so on.
>>
>> Such wrapper headers are provided by this PR for ``, ``, and
>> ``, along with updates to use them. I have a separate change for
>> `` that I plan to propose later, under JDK-8369187. There will be
>> additional followups for other C compatibility headers besides ``.
>>
>> This PR also cleans up some nomenclature issues around forbid vs exclude and
>> the like.
>>
>> Testing: mach5 tier1-5, GHA sanity tests
>
> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
>
>  - Merge branch 'master' into stdlib-header-wrappers
>  - Merge branch 'master' into stdlib-header-wrappers
>  - Merge branch 'master' into stdlib-header-wrappers
>  - jrose comments
>  - move tuple to undecided category
>  - add wrapper for
>  - add wrapper for
>  - add wrapper for
>  - style guide permits some standard library facilities

Approved

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27601#pullrequestreview-3407565447

From duke at openjdk.org Sat Nov 1 16:00:22 2025
From: duke at openjdk.org (duke)
Date: Sat, 1 Nov 2025 16:00:22 GMT
Subject: Withdrawn: 8366122: Shenandoah: Implement efficient support for object count after gc events
In-Reply-To: <7KUQJooZsasGtVU-HaCj7h8_rMFBX13d4yW3T4PfpBw=.07a12734-af51-45cc-9bbb-d6573806478a@github.com>
References: <7KUQJooZsasGtVU-HaCj7h8_rMFBX13d4yW3T4PfpBw=.07a12734-af51-45cc-9bbb-d6573806478a@github.com>
Message-ID:

On Thu, 28 Aug 2025 01:30:39 GMT, pf0n wrote:

> ### Summary
>
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of the existing marking phases and records strongly marked objects in a histogram. If the event is disabled, the original marking closures are used. When it is enabled, new mark-and-count closures are used by the worker threads. Each worker thread updates its local histogram as it marks an object. These local histograms are merged at the conclusion of the marking phase under a mutex. The event is emitted outside a safepoint. Because most of Shenandoah's marking is done concurrently, so is the object counting work.
>
> ### Performance
> The performance tests were run using the Extremem benchmark on a default and stress workload.
> (will edit this section to include data after average time and test for GenShen)
>
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats ] Pause Init Mark (G) = 0.003 s (a = 264 us)`
> `[807.216s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 91 us)`
> `[807.216s][info][gc,stats ] Concurrent Mark Roots = 0.041 s (a = 4099 us)`
> `[807.216s][info][gc,stats ] Concurrent Marking = 1.660 s (a = 166035 us)`
> `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s (a = 446 us)`
> `[807.216s][info][gc,stats ] Pause Final Mark (N) = 0.004 s (a = 357 us)`
>
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats ] Pause Init Mark (G) = 0.003 s (a = 302 us)`
> `[807.104s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 92 us)`
> `[807.104s][info][gc,stats ] Concurrent Mark Roots = 0.048 s (a = 4827 us)`
> `[807.104s][info][gc,stats ] Concurrent Marking = 1.666 s (a = 166638 us)`
> `[807.104s][info][gc,stats ] Pause Final Mark (G) = 0.006 s (a = 603 us)`
> `[807.104s][info][gc,stats ] Pause Final Mark (N) = 0.005 s (a = 516 us)`
>
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats ] Pause Init Mark (G) = 0.002 s (a = 227 us)`
> `[807.299s][info][gc,stats ] Pause Init Mark (N) = 0.001 s (a = 89 us)`
> `[807.299s][info][gc,stats ] Concurrent Mark Roots = 0.053 s (a = 5279 us)`
> `[807.299s][info][gc,st...

This pull request has been closed without being integrated.
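
The per-worker histogram scheme described in the quoted summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `Histogram`, `MergedHistogram`, `mark_and_count` and `count_objects` are hypothetical names, class names are modelled as plain strings, and each worker merges when it finishes its own slice, which only approximates "at the conclusion of the marking phase".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical histogram type: class name -> number of marked instances.
using Histogram = std::map<std::string, std::size_t>;

// Hypothetical shared result; every access goes through the mutex.
class MergedHistogram {
public:
  // Called once per worker, after that worker has finished marking.
  void merge(const Histogram& local) {
    std::lock_guard<std::mutex> guard(_lock);
    for (const auto& entry : local) {
      _counts[entry.first] += entry.second;
    }
  }

  Histogram snapshot() {
    std::lock_guard<std::mutex> guard(_lock);
    return _counts;
  }

private:
  std::mutex _lock;
  Histogram _counts;
};

// A worker "marks" its slice and counts into a private histogram,
// so the per-object path takes no lock.
inline void mark_and_count(const std::vector<std::string>& slice,
                           MergedHistogram& merged) {
  Histogram local;
  for (const auto& klass : slice) {
    ++local[klass];
  }
  merged.merge(local);
}

// Runs one worker thread per slice and returns the merged histogram.
inline Histogram count_objects(const std::vector<std::vector<std::string>>& slices) {
  MergedHistogram merged;
  std::vector<std::thread> workers;
  std::size_t expected = 0;
  for (const auto& slice : slices) {
    expected += slice.size();
    workers.emplace_back([&slice, &merged] { mark_and_count(slice, merged); });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  Histogram result = merged.snapshot();
  // Sanity check: every object was counted exactly once.
  std::size_t total = 0;
  for (const auto& entry : result) {
    total += entry.second;
  }
  assert(total == expected);
  return result;
}
```

The point of the design is that the mutex is taken once per worker rather than once per object, so the counting cost stays on the marking threads' private data.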

-------------

PR: https://git.openjdk.org/jdk/pull/26977

From kdnilsen at openjdk.org Sat Nov 1 19:12:22 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Sat, 1 Nov 2025 19:12:22 GMT
Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v5]
In-Reply-To:
References:
Message-ID: <276WbU_gLrJZkxJ6SGxIBBrDdg3NprzZSsnwqSGMpPw=.b3a1a4cc-daae-4cdc-a677-4833f5f6169d@github.com>

> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading.
>
> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable.

Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:

 - Merge remote-tracking branch 'jdk/master' into finish-block-start
 - fix bugs in implementation of weakly referenced object handling
 - Fixup handling of weakly marked objects in remembered set
 - fix idiosyncratic formatting
 - cleanup code for review
 - Fix order of include files
 - fix white space
 - disable for debug build, alphabetic order for includes
 - add explicit typecast to avoid compiler warning message
 - Remove troublesome assert that assumes lock is held
 - ... and 19 more: https://git.openjdk.org/jdk/compare/13b3d2fc...d341522e

-------------

Changes: https://git.openjdk.org/jdk/pull/27353/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=04
  Stats: 871 lines in 11 files changed: 831 ins; 6 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/27353.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353

PR: https://git.openjdk.org/jdk/pull/27353

From kbarrett at openjdk.org Sun Nov 2 07:05:24 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sun, 2 Nov 2025 07:05:24 GMT
Subject: RFR: 8369186: HotSpot Style Guide should permit some uses of the C++ Standard Library [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 18 Oct 2025 17:11:48 GMT, Kim Barrett wrote:

>> Please review this change to the HotSpot Style Guide to suggest that C++
>> Standard Library components may be used, after appropriate vetting and
>> discussion, rather than just a blanket "no, don't use it" with a few very
>> narrow exceptions. It provides some guidance on that vetting process and
>> the criteria to use, along with usage patterns.
>>
>> In particular, it proposes that Standard Library headers should not be
>> included directly, but instead through HotSpot-provided wrapper headers. This
>> gives us a place to document usage, provide workarounds for platform issues in
>> a single place, and so on.
>>
>> Such wrapper headers are provided by this PR for ``, ``, and
>> ``, along with updates to use them. I have a separate change for
>> `` that I plan to propose later, under JDK-8369187. There will be
>> additional followups for other C compatibility headers besides ``.
>>
>> This PR also cleans up some nomenclature issues around forbid vs exclude and
>> the like.
>>
>> Testing: mach5 tier1-5, GHA sanity tests
>
> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains nine commits:
>
>  - Merge branch 'master' into stdlib-header-wrappers
>  - Merge branch 'master' into stdlib-header-wrappers
>  - Merge branch 'master' into stdlib-header-wrappers
>  - jrose comments
>  - move tuple to undecided category
>  - add wrapper for
>  - add wrapper for
>  - add wrapper for
>  - style guide permits some standard library facilities

Thanks for reviews and comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27601#issuecomment-3477511265

From kbarrett at openjdk.org Sun Nov 2 07:05:26 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sun, 2 Nov 2025 07:05:26 GMT
Subject: Integrated: 8369186: HotSpot Style Guide should permit some uses of the C++ Standard Library
In-Reply-To:
References:
Message-ID:

On Thu, 2 Oct 2025 07:11:51 GMT, Kim Barrett wrote:

> Please review this change to the HotSpot Style Guide to suggest that C++
> Standard Library components may be used, after appropriate vetting and
> discussion, rather than just a blanket "no, don't use it" with a few very
> narrow exceptions. It provides some guidance on that vetting process and
> the criteria to use, along with usage patterns.
>
> In particular, it proposes that Standard Library headers should not be
> included directly, but instead through HotSpot-provided wrapper headers. This
> gives us a place to document usage, provide workarounds for platform issues in
> a single place, and so on.
>
> Such wrapper headers are provided by this PR for ``, ``, and
> ``, along with updates to use them. I have a separate change for
> `` that I plan to propose later, under JDK-8369187. There will be
> additional followups for other C compatibility headers besides ``.
>
> This PR also cleans up some nomenclature issues around forbid vs exclude and
> the like.
>
> Testing: mach5 tier1-5, GHA sanity tests

This pull request has now been integrated.

Changeset: e8a1a870
Author:    Kim Barrett
URL:       https://git.openjdk.org/jdk/commit/e8a1a8707ee6192c85ac62a2a51c815e07613c38
Stats:     670 lines in 68 files changed: 430 ins; 134 del; 106 mod

8369186: HotSpot Style Guide should permit some uses of the C++ Standard Library

Reviewed-by: jrose, lkorinth, iwalulya, kvn, stefank

-------------

PR: https://git.openjdk.org/jdk/pull/27601

From andrew at openjdk.org Sun Nov 2 22:55:48 2025
From: andrew at openjdk.org (Andrew John Hughes)
Date: Sun, 2 Nov 2025 22:55:48 GMT
Subject: RFR: Merge jdk8u:master
Message-ID: <4Bn0JXdtXoqusP9GZa4nbc8wHQ67W8eGH4KioZ69XK8=.7bfd3448-9996-4fc7-aaac-8285c7e626c3@github.com>

Merge `jdk8u472-b08`

-------------

Commit messages:
 - Merge jdk8u472-b08
 - 8360937: Enhance certificate handling
 - 8356294: Enhance Path Factories
 - 8352637: Enhance bytecode verification

The merge commit only contains trivial merges, so no merge-specific webrevs have been generated.

Changes: https://git.openjdk.org/shenandoah-jdk8u/pull/114/files
  Stats: 196 lines in 11 files changed: 157 ins; 8 del; 31 mod
  Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/114.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u.git pull/114/head:pull/114

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/114

From jsikstro at openjdk.org Mon Nov 3 09:57:38 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Mon, 3 Nov 2025 09:57:38 GMT
Subject: RFR: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
Message-ID:

Hello,

The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.

Testing:
* Running through tier1-2

-------------

Commit messages:
 - 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods

Changes: https://git.openjdk.org/jdk/pull/28107/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28107&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371131
  Stats: 65 lines in 20 files changed: 0 ins; 2 del; 63 mod
  Patch: https://git.openjdk.org/jdk/pull/28107.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28107/head:pull/28107

PR: https://git.openjdk.org/jdk/pull/28107

From jsikstro at openjdk.org Mon Nov 3 09:57:39 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Mon, 3 Nov 2025 09:57:39 GMT
Subject: RFR: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
In-Reply-To:
References:
Message-ID:

On Mon, 3 Nov 2025 09:49:02 GMT, Joel Sikström wrote:

> Hello,
>
> The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.
>
> Testing:
> * Running through tier1-2

src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp line 57:

> 55:   // Compute the size for the new TLAB.
> 56:   // The "last" tlab may be smaller to reduce fragmentation.
> 57:   // unsafe_max_tlab_alloc is just a hint.

I suggest we remove this comment as it is misleading: not all GCs treat the `available_size` calculated here as a hint; some treat it as an unconditional size.
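
For readers unfamiliar with the change, a minimal sketch of what the cleanup amounts to, assuming a heavily simplified stand-in for CollectedHeap. `HeapSketch`, `FixedHeapSketch` and `tlab_free` are hypothetical names for illustration only; just the three method names come from the PR description, and the real HotSpot declarations may differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for CollectedHeap after the cleanup:
// the three TLAB queries no longer take a Thread* argument.
class HeapSketch {
public:
  virtual ~HeapSketch() = default;
  virtual std::size_t tlab_capacity() const = 0;
  virtual std::size_t tlab_used() const = 0;
  virtual std::size_t unsafe_max_tlab_alloc() const = 0;
};

// Hypothetical collector with a fixed TLAB budget, in bytes.
class FixedHeapSketch : public HeapSketch {
public:
  FixedHeapSketch(std::size_t capacity, std::size_t used)
      : _capacity(capacity), _used(used) {
    assert(used <= capacity);
  }

  std::size_t tlab_capacity() const override { return _capacity; }
  std::size_t tlab_used() const override { return _used; }
  // In this sketch: the space still available for a new TLAB.
  std::size_t unsafe_max_tlab_alloc() const override { return _capacity - _used; }

private:
  std::size_t _capacity;
  std::size_t _used;
};

// Callers no longer need a Thread* in hand just to query the heap.
inline std::size_t tlab_free(const HeapSketch& heap) {
  return heap.tlab_capacity() - heap.tlab_used();
}
```

Since no implementation read the argument any more, dropping it changes only signatures and call sites, which matches the "0 ins; 2 del; 63 mod" shape of the stats above.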

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28107#discussion_r2485894298

From ayang at openjdk.org Mon Nov 3 10:02:04 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 3 Nov 2025 10:02:04 GMT
Subject: RFR: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
In-Reply-To:
References:
Message-ID:

On Mon, 3 Nov 2025 09:49:02 GMT, Joel Sikström wrote:

> Hello,
>
> The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.
>
> Testing:
> * Running through tier1-2

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28107#pullrequestreview-3410254336

From tschatzl at openjdk.org Mon Nov 3 12:09:23 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 3 Nov 2025 12:09:23 GMT
Subject: RFR: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
In-Reply-To:
References:
Message-ID: <88fRRh2O2B2STEwKyd27MN-c6akGhu6JIRd5IWXDzXo=.2d643739-44c4-4f3b-83b7-6def71ff28a4@github.com>

On Mon, 3 Nov 2025 09:49:02 GMT, Joel Sikström wrote:

> Hello,
>
> The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.
>
> Testing:
> * Oracle's tier1-2

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28107#pullrequestreview-3410714487

From wkemper at openjdk.org Mon Nov 3 18:01:31 2025
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 3 Nov 2025 18:01:31 GMT
Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3]
In-Reply-To:
References:
Message-ID: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com>

> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'.
>
> # Background
> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more in line with the natural way of things.

William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits:

 - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots
 - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots
 - Flush SATB buffers upon entering degenerated cycle when old marking is in progress

   This has to happen at least once during the degenerated cycle.
   Doing it at the start, rather than the end, simplifies the verifier.
 - Fix typo in comment
 - Remove duplicate satb flush closure
 - Only flush satb once during degenerated cycle
 - Cleanup and comments
 - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots
 - Fix assertion
 - Oops, move inline definition out of ifdef ASSERT
 - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de

-------------

Changes: https://git.openjdk.org/jdk/pull/27983/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27983&range=02
  Stats: 309 lines in 11 files changed: 98 ins; 188 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/27983.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27983/head:pull/27983

PR: https://git.openjdk.org/jdk/pull/27983

From duke at openjdk.org Tue Nov 4 00:11:20 2025
From: duke at openjdk.org (duke)
Date: Tue, 4 Nov 2025 00:11:20 GMT
Subject: Withdrawn: 8354555: Add generic JFR events for TaskTerminator
In-Reply-To: <_7FP2wNe8p3N8SxKdmCN1x4zKO8TT5JWRcWEt51i35c=.4fbac292-3cb7-48b9-922e-1114f74e0549@github.com>
References: <_7FP2wNe8p3N8SxKdmCN1x4zKO8TT5JWRcWEt51i35c=.4fbac292-3cb7-48b9-922e-1114f74e0549@github.com>
Message-ID: <6C63KrwSNS7jTc5SHtpknktCbt6kCAx12FM6NDEDPt8=.0da94a30-3bae-40ac-b4c0-a40b55876123@github.com>

On Wed, 16 Apr 2025 08:24:15 GMT, Xiaolong Peng wrote:

> The purpose of the PR is to add generic JFR events for TaskTerminator, to track the attempts GC threads make to terminate GC tasks and how long those attempts take.
>
> Today only G1 emits a JFR event named `Termination`, from [G1ParEvacuateFollowersClosure](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/g1/g1YoungCollector.cpp#L555-L563); the other garbage collectors emit no JFR event for termination attempts at all.
>
> Adding these events gives performance engineers visibility into the termination attempts and termination time when GC threads are trying to finish GC tasks. We could build a tool that analyzes the JFR events to determine whether there is a potential data structure issue in application code, e.g. a very large LinkedList or LinkedBlockingQueue.
>
> For the test, I have manually tested different GCs with Flight Recording enabled and verified the events:
> G1:
>
> jdk.GCPhaseParallel {
>   startTime = 23:09:34.124 (2025-05-22)
>   duration = 0.0108 ms
>   gcId = 0
>   gcWorkerId = 8
>   name = "Termination"
>   eventThread = "GC Thread#4" (osThreadId = 20483)
> }
>
> jdk.GCPhaseParallel {
>   startTime = 23:09:34.124 (2025-05-22)
>   duration = 0.0467 ms
>   gcId = 0
>   gcWorkerId = 2
>   name = "Termination"
>   eventThread = "GC Thread#2" (osThreadId = 21251)
> }
>
> jdk.GCPhaseParallel {
>   startTime = 23:09:34.124 (2025-05-22)
>   duration = 0.0474 ms
>   gcId = 0
>   gcWorkerId = 1
>   name = "Termination"
>   eventThread = "GC Thread#8" (osThreadId = 36359)
> }
> jdk.GCPhaseParallel {
>   startTime = 23:09:41.925 (2025-05-22)
>   duration = 0.000834 ms
>   gcId = 14
>   gcWorkerId = 7
>   name = "Termination: Parallel Marking"
>   eventThread = "GC Thread#1" (osThreadId = 21507)
> }
>
> jdk.GCPhaseParallel {
>   startTime = 23:09:41.925 (2025-05-22)
>   duration = 0.000166 ms
>   gcId = 14
>   gcWorkerId = 7
>   name = "Termination: Parallel Marking"
>   eventThread = "GC Thread#1" (osThreadId = 21507)
> }
>
>
> Shenandoah:
>
> jdk.GCPhaseParallel {
>   startTime = 23:39:58.890 (2025-05-22)
>   duration = 0.0202 ms
>   gcId = 0
>   gcWorkerId = 0
>   name = "Termination: Concurrent Mark"
>   eventThread = "Shenandoah GC Threads#3" (osThreadId = 13827)
> }
>
> jdk.GCPhaseParallel {
>   startTime = 23:39:58.890 (2025-05-22)
>   duration = 0.0205 ms
>   gcId = 0
>   gcWorkerId = 1
>   name = "Termination: Concurrent Mark"
>   eventThread = "Shenandoah GC Threads#1" (osThreadId = 14339)
> }
>
> jdk.GCPhaseParallel {
>   startTime = 23:39:58.890 (2025-05-22)
>   duration = 0.0127 ms
>   gcId = 0
>   gcWorkerId = 5
>   name = "Termination: Final Mark"
>   eventThread = "Shenandoah G...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/24676

From jsikstro at openjdk.org Tue Nov 4 09:39:52 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Tue, 4 Nov 2025 09:39:52 GMT
Subject: RFR: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
In-Reply-To:
References:
Message-ID:

On Mon, 3 Nov 2025 09:59:39 GMT, Albert Mingkun Yang wrote:

>> Hello,
>>
>> The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.
>>
>> Testing:
>> * Oracle's tier1-2
>
> Marked as reviewed by ayang (Reviewer).

Thank you for the reviews! @albertnetymk @tschatzl
Nice to get this cleanup in.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28107#issuecomment-3484890434

From jsikstro at openjdk.org Tue Nov 4 09:39:54 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Tue, 4 Nov 2025 09:39:54 GMT
Subject: Integrated: 8371131: Cleanup Thread parameter in CollectedHeap TLAB methods
In-Reply-To:
References:
Message-ID: <5BDwGukXMsP_RV8oGiVADupc5T2gZRhcVe_GGXYRhJE=.a3b3714e-f322-4d13-9d22-f583e45d74ea@github.com>

On Mon, 3 Nov 2025 09:49:02 GMT, Joel Sikström wrote:

> Hello,
>
> The last usage of the Thread parameter that is passed to `CollectedHeap::{tlab_capacity, tlab_used, unsafe_max_tlab_alloc}` was removed in [JDK-8370345](https://bugs.openjdk.org/browse/JDK-8370345). Following this we should remove the Thread parameter completely from CollectedHeap and all GCs that derive from it.
>
> Testing:
> * Oracle's tier1-2

This pull request has now been integrated.

Changeset: 19cca0a2
Author:    Joel Sikström
URL:       https://git.openjdk.org/jdk/commit/19cca0a2a829396291fa4140b2082ef518425518
Stats:     65 lines in 20 files changed: 0 ins; 2 del; 63 mod

8371131: Cleanup Thread parameter in CollectedHeap TLAB methods

Reviewed-by: ayang, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/28107

From duke at openjdk.org Tue Nov 4 15:41:20 2025
From: duke at openjdk.org (duke)
Date: Tue, 4 Nov 2025 15:41:20 GMT
Subject: git: openjdk/shenandoah-jdk8u: master: 4 new changesets
Message-ID: <4f1ce1d1-1b68-4616-9434-250ce60267e9@openjdk.org>

Changeset: 32730331
Branch:    master
Author:    Jan Kratochvil
Committer: Andrew John Hughes
Date:      2025-09-10 18:12:18 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/327303315367f15113c17c8525d8eb3bc0f66782

8352637: Enhance bytecode verification

Reviewed-by: fferrari, andrew
Backport-of: 974f4da2e53ebde8b06224f4ba0b80aa74c5f434

! hotspot/src/share/vm/classfile/stackMapTable.cpp
! hotspot/src/share/vm/classfile/stackMapTable.hpp
! hotspot/src/share/vm/classfile/verifier.cpp
! hotspot/src/share/vm/interpreter/bytecodeStream.hpp
! jdk/src/share/native/common/check_code.c

Changeset: dde48aed
Branch:    master
Author:    Aleksei Voitylov
Committer: Andrew John Hughes
Date:      2025-09-03 01:47:08 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/dde48aedce2fe649e1737b3eec829428381bcb8f

8356294: Enhance Path Factories

Reviewed-by: abakhtin, fferrari, andrew
Backport-of: 2c7f45612d11199a9d5eaa6d61a2893ec4afa687

! jaxp/src/com/sun/org/apache/xpath/internal/jaxp/XPathFactoryImpl.java
! jaxp/src/com/sun/org/apache/xpath/internal/jaxp/XPathImpl.java
! jaxp/src/jdk/xml/internal/XMLSecurityManager.java

Changeset: d5ac2ad8
Branch:    master
Author:    Alexey Bakhtin
Committer: Andrew John Hughes
Date:      2025-08-27 13:50:17 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/d5ac2ad89a369697a48e7f3e6b889e22afa50a2f

8360937: Enhance certificate handling

Reviewed-by: fferrari, andrew
Backport-of: d3b1c2be9e87aad07cac29d94679130fe5807c17

! jdk/src/share/classes/sun/security/util/DerValue.java
! jdk/src/share/classes/sun/security/x509/AVA.java
! jdk/test/java/security/testlibrary/CertificateBuilder.java

Changeset: 3a926122
Branch:    master
Author:    Andrew Hughes
Date:      2025-10-17 23:09:02 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/3a926122dc31e0f0a184d7bedba27167079763e8

Merge jdk8u472-b08

Added tag jdk8u472-b08 for changeset d5ac2ad89a3

From andrew at openjdk.org Tue Nov 4 15:41:30 2025
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 4 Nov 2025 15:41:30 GMT
Subject: git: openjdk/shenandoah-jdk8u: Added tag shenandoah8u472-b08 for changeset 3a926122
Message-ID:

Tagged by: Andrew John Hughes
Date:      2025-11-02 22:45:33 +0000

Added tag shenandoah8u472-b08 for changeset 3a926122dc3
-----BEGIN PGP SIGNATURE-----

iHUEABYKAB0WIQRRMled0VQO0j4ExaDP2g+bNZZCIgUCaQffDQAKCRDP2g+bNZZC
IhRjAP0UHp6GoRGLeZut37FB0UrkHMxUacJcclpqvX3dNsYOwwEA0MVnK/+S6xgQ
mYoxhh5AiJEqWEGud5+fON1z1Ld75wU=
=2BcR
-----END PGP SIGNATURE-----

Changeset: 3a926122
Author:    Andrew Hughes
Date:      2025-10-17 23:09:02 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/3a926122dc31e0f0a184d7bedba27167079763e8

From andrew at openjdk.org Tue Nov 4 15:41:34 2025
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 4 Nov 2025 15:41:34 GMT
Subject: git: openjdk/shenandoah-jdk8u: Added tag shenandoah8u472-ga for changeset 3a926122
Message-ID:

Tagged by: Andrew John Hughes
Date:      2025-11-02 22:46:06 +0000

Added tag shenandoah8u472-ga for changeset 3a926122dc3
-----BEGIN PGP SIGNATURE-----

iHUEABYKAB0WIQRRMled0VQO0j4ExaDP2g+bNZZCIgUCaQffLgAKCRDP2g+bNZZC
ItibAQC6StJRRgeno5BVS5/KpH1P69A1bLO6dD/C+YpbArx4GQD/XDKkWTUlQCAS
UiQ5sDj7Gp8h0I0wwkH5/pw0HXGeHgk=
=UHSn
-----END PGP SIGNATURE-----

Changeset: 3a926122
Author:    Andrew Hughes
Date:      2025-10-17 23:09:02 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/3a926122dc31e0f0a184d7bedba27167079763e8

From iris at openjdk.org Tue Nov 4 15:42:01 2025
From: iris at openjdk.org (Iris Clark)
Date: Tue, 4 Nov 2025 15:42:01 GMT
Subject: Withdrawn: Merge jdk8u:master
In-Reply-To: <4Bn0JXdtXoqusP9GZa4nbc8wHQ67W8eGH4KioZ69XK8=.7bfd3448-9996-4fc7-aaac-8285c7e626c3@github.com>
References: <4Bn0JXdtXoqusP9GZa4nbc8wHQ67W8eGH4KioZ69XK8=.7bfd3448-9996-4fc7-aaac-8285c7e626c3@github.com>
Message-ID:

On Sun, 2 Nov 2025 22:50:43 GMT, Andrew John Hughes wrote:

> Merge `jdk8u472-b08`

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/114

From andrew at openjdk.org Tue Nov 4 15:42:00 2025
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 4 Nov 2025 15:42:00 GMT
Subject: RFR: Merge jdk8u:master [v2]
In-Reply-To: <4Bn0JXdtXoqusP9GZa4nbc8wHQ67W8eGH4KioZ69XK8=.7bfd3448-9996-4fc7-aaac-8285c7e626c3@github.com>
References: <4Bn0JXdtXoqusP9GZa4nbc8wHQ67W8eGH4KioZ69XK8=.7bfd3448-9996-4fc7-aaac-8285c7e626c3@github.com>
Message-ID:

> Merge `jdk8u472-b08`

Andrew John Hughes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk8u/pull/114/files - new: https://git.openjdk.org/shenandoah-jdk8u/pull/114/files/3a926122..3a926122 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=114&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=114&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/114.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u.git pull/114/head:pull/114 PR: https://git.openjdk.org/shenandoah-jdk8u/pull/114 From xpeng at openjdk.org Tue Nov 4 16:47:20 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Nov 2025 16:47:20 GMT Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3] In-Reply-To: References: <7KjxoZ7UQ9LugnCVgBT6moBrQEfxmPNBXrvyWXD-MQ8=.37907346-c252-4daf-b7bb-9d0f89291753@github.com> Message-ID: On Fri, 31 Oct 2025 21:00:46 GMT, Xiaolong Peng wrote: >> The young and old collector reserves both have `FREE` regions, but we need to make sure we don't exceed the reserves for one or the other. > > I have added the can_allocate_in_new_region check back, it should have same behavior now. I have removed can_allocate_in_new_region again after merging @kdnilsen's change unifying accountings, since we don't have consistency in accountings anymore. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28036#discussion_r2491266912 From wkemper at openjdk.org Tue Nov 4 22:27:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Nov 2025 22:27:52 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> Message-ID: <7kzHfoGKxHS3U5xhG81AuWhzghnHr-iWKuMyDZe-o6Q=.0e08f7b0-7ced-42df-9e39-d3134a095b73@github.com> On Mon, 3 Nov 2025 18:01:31 GMT, William Kemper wrote: >> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'. >> >> # Background >> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more inline with the natural way of things. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Flush SATB buffers upon entering degenerated cycle when old marking is in progress > > This has to happen at least once during the degenerated cycle. Doing it at the start, rather than the end, simplifies the verifier. > - Fix typo in comment > - Remove duplicate satb flush closure > - Only flush satb once during degenerated cycle > - Cleanup and comments > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Fix assertion > - Oops, move inline definition out of ifdef ASSERT > - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de

I've run more tests and confirmed that critical and max `jops` are 3% improved on a variety of heap sizes and configurations. Additionally, after running more tests with `extremem`, the apparent regression at `p100` has evaporated:

genshen/extremem/control
Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum
sales_transaction_p100 | 23 | 38817.000 | 1685.651 | 1687.696 | 1690.158 | 84.299 | 1539.000 | 1823.000
browsing_history_p100 | 23 | 32145.000 | 1391.269 | 1397.609 | 1382.211 | 139.779 | 1175.000 | 1769.000
customer_replacement_p100 | 23 | 107141.000 | 4652.940 | 4658.304 | 4664.211 | 227.279 | 4093.000 | 5053.000
product_replacement_p100 | 23 | 58315.000 | 2526.181 | 2535.435 | 2523.737 | 223.834 | 2203.000 | 3041.000
customer_preparation_p100 | 23 | 142953.000 | 5442.520 | 6215.348 | 6064.000 | 3132.333 | 2502.000 | 11547.000
customer_purchase_p100 | 23 | 491201.000 | 18622.792 | 21356.565 | 20385.263 | 11424.057 | 6582.000 | 48274.000
customer_save_for_later_p100 | 23 | 880105.000 | 36675.774 | 38265.435 | 37248.895 | 11777.757 | 23023.000 | 65591.000
customer_abandonment_p100 | 23 | 701197.000 | 28578.563 | 30486.826 | 29311.105 | 11521.080 | 16390.000 | 59179.000

genshen/extremem/experiment
Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum
sales_transaction_p100 | 23 | 40007.000 | 1730.529 | 1739.435 | 1711.947 | 193.896 | 1528.000 | 2490.000
browsing_history_p100 | 23 | 33032.000 | 1425.803 | 1436.174 | 1436.316 | 175.753 | 1123.000 | 1729.000
customer_replacement_p100 | 23 | 107516.000 | 4653.546 | 4674.609 | 4599.579 | 488.862 | 4072.000 | 6579.000
product_replacement_p100 | 23 | 56647.000 | 2451.855 | 2462.913 | 2469.789 | 235.896 | 1968.000 | 2903.000
customer_preparation_p100 | 23 | 136482.000 | 5224.974 | 5934.000 | 5766.684 | 3027.151 | 2924.000 | 10652.000
customer_purchase_p100 | 23 | 464888.000 | 17675.074 | 20212.522 | 18641.263 | 11355.233 | 7921.000 | 55420.000
customer_save_for_later_p100 | 23 | 854932.000 | 35744.969 | 37170.957 | 35660.579 | 11323.562 | 24370.000 | 71376.000
customer_abandonment_p100 | 23 | 686923.000 | 28261.963 | 29866.217 | 28589.737 | 10907.943 | 17791.000 | 65329.000

Indeed, the _experiment_ looks slightly better in some cases (slightly worse in others).
The results also show the expected reduction in safepoint times as we are no longer flushing SATB buffers during `final_update_refs` or `init_mark`:

-133.33% extremem/shenandoahfinalupdaterefs_stopped_max p=0.00000 (Welch's T-Test)
  Control: 1.103 (+/- 0.14) 23
  Test: 0.473 (+/- 0.06) 23

The effect is more pronounced on `specjbb2015`:

-216.54% specjbb2015/shenandoahfinalupdaterefs_stopped_max p=0.00000 (Mann-Whitney)
  Control: 2.217 (+/- 0.09) 22
  Test: 0.700 (+/- 0.19) 22

-791.43% specjbb2015/shenandoahinitmark_stopped_max p=0.00173 (Mann-Whitney)
  Control: 3.408 (+/- 3.12) 22
  Test: 0.382 (+/- 0.07) 22

------------- PR Comment: https://git.openjdk.org/jdk/pull/27983#issuecomment-3488220025 From wkemper at openjdk.org Tue Nov 4 23:10:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Nov 2025 23:10:34 GMT Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3] In-Reply-To: References: <7KjxoZ7UQ9LugnCVgBT6moBrQEfxmPNBXrvyWXD-MQ8=.37907346-c252-4daf-b7bb-9d0f89291753@github.com> Message-ID: On Tue, 4 Nov 2025 16:43:58 GMT, Xiaolong Peng wrote: >> I have added the can_allocate_in_new_region check back, it should have same behavior now. > > I have removed can_allocate_in_new_region again after merging @kdnilsen's change unifying accountings, since we don't have consistency in accountings anymore. We still need to enforce that requests for a young evacuation don't take memory that was reserved for old evacuations. `can_allocate_in_new_region` isn't about consistency between freeset and generation accounting, it's used to maintain old/young collector reserves.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28036#discussion_r2492277440 From kdnilsen at openjdk.org Wed Nov 5 02:24:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 5 Nov 2025 02:24:11 GMT Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v6] In-Reply-To: References: Message-ID: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Revert "Fixup handling of weakly marked objects in remembered set" This reverts commit 80198abe5d06c3532d9a43a53691376e990ed45f. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/d341522e..643cdfd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=04-05 Stats: 28 lines in 1 file changed: 0 ins; 16 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From duke at openjdk.org Wed Nov 5 05:08:57 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 5 Nov 2025 05:08:57 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. 
> > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/ea83736e..6d122039 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=06-07 Stats: 526337 lines in 7522 files changed: 349612 ins; 122587 del; 54138 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Wed Nov 5 05:09:05 2025 From: duke at openjdk.org (Zihao Lin) Date: Wed, 5 Nov 2025 05:09:05 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:04:12 GMT, Roland Westrelin wrote: >> Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into 8344116 >> - Fix build >> - Fix test failed >> - 8344116: C2: remove slice parameter from LoadNode::make > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 223: > >> 221: MergeMemNode* mm = opt_access.mem(); >> 222: PhaseGVN& gvn = opt_access.gvn(); >> 223: Node* mem = mm->memory_at(gvn.C->get_alias_index(access.addr().type())); > > Can we get rid of all uses of `access.addr().type()`? Removed all uses of access.addr().type(). > src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp line 105: > >> 103: // stores. In theory we could relax the load from ctrl() to >> 104: // no_ctrl, but that doesn't buy much latitude. >> 105: Node* card_val = __ load( __ ctrl(), card_adr, TypeInt::BYTE, T_BYTE); > > We could assert that `C->get_alias_index(kit->type(card_adr)) == Compile::AliasIdxRaw`, that is, that the computed slice is the same as the hardcoded slice. Similar asserts could be added for every location where a slice/address type is removed in this patch. Sure, I have added more asserts for this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2484816831 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2492987998 From ayang at openjdk.org Wed Nov 5 10:16:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Nov 2025 10:16:39 GMT Subject: RFR: 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue Message-ID: Removing effectively dead code.
Test: tier1, GHA ------------- Commit messages: - remove-barrier-arg Changes: https://git.openjdk.org/jdk/pull/28146/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28146&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371321 Stats: 38 lines in 16 files changed: 0 ins; 6 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/28146.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28146/head:pull/28146 PR: https://git.openjdk.org/jdk/pull/28146 From fandreuzzi at openjdk.org Wed Nov 5 10:33:04 2025 From: fandreuzzi at openjdk.org (Francesco Andreuzzi) Date: Wed, 5 Nov 2025 10:33:04 GMT Subject: RFR: 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue In-Reply-To: References: Message-ID: <_jy1EXbNvAcvuv1R0jwOUXuE7dQX1YSLjsnM5ijyBNM=.20d22624-f6ff-4de7-8701-7ff1536ffa5b@github.com> On Wed, 5 Nov 2025 10:10:02 GMT, Albert Mingkun Yang wrote: > Removing effectively dead code. > > Test: tier1, GHA Marked as reviewed by fandreuzzi (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/28146#pullrequestreview-3421109827 From roland at openjdk.org Wed Nov 5 13:23:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 5 Nov 2025 13:23:18 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v8] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 05:08:57 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> I have done more work which remove slice paramater from StoreNode::make. >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - fix assert > - add more assert > - rid of access.addr().type() > - Merge branch 'openjdk:master' into 8344116 > - Merge branch 'openjdk:master' into 8344116 > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make Can we remove `C2AccessValuePtr` entirely and use: Node* _addr; where, currently, there's: C2AccessValuePtr& _addr; ? src/hotspot/share/opto/callnode.cpp line 1740: > 1738: Node* klass_node = in(AllocateNode::KlassNode); > 1739: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); > 1740: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); We could assert that C->get_alias_index(kit->type(card_adr) == Compile::AliasIdxRaw ------------- PR Review: https://git.openjdk.org/jdk/pull/24258#pullrequestreview-3421940817 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2494424924 From kdnilsen at openjdk.org Wed Nov 5 18:20:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 5 Nov 2025 18:20:45 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> Message-ID: On Mon, 3 Nov 2025 18:01:31 GMT, William Kemper wrote: >> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. 
It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'. >> >> # Background >> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more inline with the natural way of things. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Flush SATB buffers upon entering degenerated cycle when old marking is in progress > > This has to happen at least once during the degenerated cycle. Doing it at the start, rather than the end, simplifies the verifier. > - Fix typo in comment > - Remove duplicate satb flush closure > - Only flush satb once during degenerated cycle > - Cleanup and comments > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Fix assertion > - Oops, move inline definition out of ifdef ASSERT > - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de Thank you for bringing this to closure... ------------- Marked as reviewed by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/27983#pullrequestreview-3423680188 From kdnilsen at openjdk.org Wed Nov 5 22:31:22 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 5 Nov 2025 22:31:22 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 21:59:35 GMT, Xiaolong Peng wrote: >> Shenandoah always allocates memory while holding the heap lock. We have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization for the mutator memory allocation path to reduce heap lock contention. At a very high level, here is how it works: >> * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* which are used by mutator threads for regular object allocations. They are called shared regions/directly allocatable regions, and are stored in a PaddedEnd data structure (padded array). >> * Each mutator thread will be assigned one of the directly allocatable regions. The thread will try to allocate in that region with a CAS atomic operation; if that fails, it will try 2 more consecutive directly allocatable regions in the array. >> * If a mutator thread fails after trying 3 directly allocatable regions, it will: >> * Take the heap lock >> * Try to retire the directly allocatable regions which are ready to retire. >> * Iterate over the mutator partition, allocate directly allocatable regions, and store them to the padded array if any need to be retired. >> * Satisfy the mutator allocation request if possible. >> >> >> I'm not expecting significant performance impact for most cases, since in most cases the contention on the heap lock is not high enough to cause a performance issue. I have done many tests, here are some of them: >> >> 1.
Dacapo lusearch test on EC2 host with 96 CPU cores: >> Openjdk TIP: >> >> [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" >> ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: > > - Merge branch 'openjdk:master' into cas-alloc-1 > - Merge branch 'openjdk:master' into cas-alloc-1 > - format > - Merge branch 'openjdk:master' into cas-alloc-1 > - Merge branch 'openjdk:master' into cas-alloc-1 > - Merge branch 'master' into cas-alloc-1 > - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp > - Merge branch 'openjdk:master' into cas-alloc-1 > - Fix errors caused by renaming ofAtomic to AtomicAccess > - Merge branch 'openjdk:master' into cas-alloc-1 > - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 762: > 760: #endif > 761: > 762: PaddedEnd* ShenandoahDirectlyAllocatableRegionAffinity::_affinity = nullptr; IIUC, each of the DirectlyAllocatableRegions has an affinity to a particular thread. 
If some other thread randomly selects to allocate from this region, we change the region's affinity. Now, if the original thread tries another allocation, it will be redirected to a different randomly selected region. It feels to me like this introduces more churn than necessary. It should be ok for multiple threads to have affinity to the same directly allocatable region so they can preserve "locality". The locality to be preserved is the value of region->top() and region->end(), in local caches. Sometimes, the multiple threads that are affiliated with a particular region will be running on the same core, which helps improve cache performance. If they are on different cores, that's no worse than the current implementation. So my thinking is that each thread should maintain an affinity to a particular index within the directly allocatable region array. It should reaffiliate with a different region only if that region entry becomes nullptr, or in case its attempt to CAS allocate fails more than twice in a row (because of heavy contention on the value of top for that region). Please clarify if I've misunderstood. Please explain the rationale if there is good reason for preferring the design as it is currently implemented. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 848: > 846: HeapWord* ShenandoahFreeSet::allocate_with_affiliation(Iter& iterator, ShenandoahAffiliation affiliation, ShenandoahAllocRequest& req, bool& in_new_region) { > 847: for (idx_t idx = iterator.current(); iterator.has_next(); idx = iterator.next()) { > 848: ShenandoahHeapRegion* r = _heap->get_region(idx); I wonder if we could refine this a little bit. When the region is moved into the "directly allocatable" set, wouldn't we remove it from its partition? Then, we wouldn't have to test for !r->reserved_for_direct_allocation() here because the iterator wouldn't produce it. We could maybe replace this test with an assert that !r->reserved_for_direct_allocation().
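The sticky-affinity suggestion in the first comment above can be made concrete with a small model. This is a minimal sketch under stated assumptions, not HotSpot code: `Region`, `SharedRegions`, `ThreadAffinity`, `allocate`, `kSlots`, `kMaxLostRaces` and `kNoSpace` are all hypothetical names, byte offsets stand in for `HeapWord*`, and the slow path under the heap lock is left out. A thread keeps its preferred slot index and moves on only when the slot is empty, the region cannot fit the request, or it loses the CAS race more than twice in a row.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr size_t kSlots = 4;             // hypothetical size of the shared region array
constexpr int kMaxLostRaces = 2;         // leave a slot after more than 2 lost CAS races
constexpr size_t kNoSpace = ~size_t{0};  // sentinel: no allocation was made

// Hypothetical stand-in for a directly allocatable region: a bump pointer
// (byte offset from the region start) that is advanced with CAS.
struct Region {
  std::atomic<size_t> top{0};
  const size_t end;
  explicit Region(size_t capacity) : end(capacity) {}

  // Returns the offset of the new object, or kNoSpace if the request does
  // not fit or the region is too contended right now.
  size_t cas_allocate(size_t size) {
    int lost_races = 0;
    size_t old_top = top.load(std::memory_order_relaxed);
    while (size <= end - old_top) {
      if (top.compare_exchange_strong(old_top, old_top + size)) {
        return old_top;
      }
      // compare_exchange_strong refreshed old_top with the current value.
      if (++lost_races > kMaxLostRaces) {
        break;
      }
    }
    return kNoSpace;
  }
};

struct SharedRegions {
  std::atomic<Region*> slots[kSlots] = {};  // nullptr means "slot retired"
};

// Per-thread state: the slot this thread prefers. Several threads may share
// the same index; nothing here changes another thread's preference.
struct ThreadAffinity {
  size_t index = 0;
};

size_t allocate(SharedRegions& shared, ThreadAffinity& self, size_t size) {
  assert(size > 0);
  for (size_t probe = 0; probe < kSlots; ++probe) {
    const size_t i = (self.index + probe) % kSlots;
    Region* r = shared.slots[i].load(std::memory_order_acquire);
    if (r == nullptr) {
      continue;  // slot is empty, look at the next one
    }
    const size_t result = r->cas_allocate(size);
    if (result != kNoSpace) {
      self.index = i;  // reaffiliate only after a successful allocation
      return result;
    }
  }
  return kNoSpace;  // a real caller would now take the slow path under the heap lock
}
```

In this model the only writer of `self.index` is the owning thread, so a thread that picks a busy region does not push the region's other users elsewhere.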
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1268: > 1266: // If region is not completely free, the current [beg; end] is useless, and we may fast-forward. If we can extend > 1267: // the existing range, we can exploit that certain regions are already known to be in the Mutator free set. > 1268: ShenandoahHeapRegion* region = _heap->get_region(end); Here also, if we remove the region from the partition when we make it directly allocatable, we would not need to rewrite this loop. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2070: > 2068: // retired, the sum of used and capacities within regions that are still in the Mutator free partition may not match > 2069: // my internally tracked values of used() and free(). > 2070: //TODO remove assert, it is not possible to mach since mutators may allocate on region w/o acquiring lock It seems that if we are properly adjusting used() when we make a region directly allocatable, then this assert would still be valid. Why not? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2197: > 2195: // Intentionally not using AtomicAccess::load, if a mutator see a stale region it will fail to allocate anyway. > 2196: if ((r = shared_region._address) != nullptr && r->reserved_for_direct_allocation()) { > 2197: obj = cas_allocate_in_for_mutator(r, req, in_new_region); There are multiple things that can go wrong here, and it's not clear how we distinguish between them: 1. Region r may be retirable and not have enough memory to satisfy req. 2. Region r may not be retirable and not have enough memory to satisfy req. 3. Region r may have sufficient memory to satisfy req but we experience heavy contention (multiple CAS failures) while we attempt to allocate from r. In my mental model, I think I might be inclined to behave as follows: 1. Inside cas_allocate_in_for_mutator(), if we satisfy our allocation request, but only after "too many" (more than 2) CAS retries, do not update my thread-local _index variable.
Leave it at its previously selected value. Otherwise, on successful allocation, update _index to represent current slot. (might have to pass current slot in as an argument) 2. If cas_allocate_in_for_mutator() returns nullptr, ask the question "is this region retirable?" If so, increment a local count of retirable_regions. If the incremented count of retirable_regions exceeds some threshold value (DirectlyAllocatableRegionCount / 4?) (and there may be more than this number of retirable regions, but we've seen at least this many), grab the heap lock to retire and replenish all retirable regions. Then restart this loop with i = 0; src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2204: > 2202: i++; > 2203: } > 2204: return obj; I think obj always equals nullptr at this point. Seems the code would be easier to understand (and would depend less on effective compiler optimization) if we just made that explicit. Can we just say: return nullptr? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2206: > 2204: return obj; > 2205: } > 2206: template space above this line? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2219: > 2217: const uint max_probes = ShenandoahDirectAllocationMaxProbes; > 2218: for (;;) { > 2219: HeapWord* obj = nullptr; This is written as an infinite loop, but I'm not sure it intends to iterate? It looks like there's no way for the loop body to get to a second iteration. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2235: > 2233: start_idx = next_start_index; > 2234: } else { > 2235: // try to steal from other directly allocatable regions Why not just let the enclosing infinite loop (for(;;)) iterate, which will call cas_allocate_single_for_mutator()? You could overwrite start_idx with a new value if you want. I'm not sure I agree with choosing start_idx + max_probes. It seems a more likely new value would be next_start_index. 
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2256: > 2254: } > 2255: > 2256: // Explicit specializations I'm sorry. I don't understand what is happening here with "explicit specializations". Maybe a comment to explain this less common C++ syntax would help here. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2280: > 2278: } > 2279: > 2280: class DirectAllocatableRegionRefillClosure final : public ShenandoahHeapRegionIterationClosure { I don't think we want to subclass ShenandoahHeapRegionIterationClosure here. That iterates over all 2000 regions. We only want to iterate over the 13 directly allocatable regions. Maybe we don't even need/want a closure iterator here. We could just write a loop. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2347: > 2345: } > 2346: > 2347: bool heap_region_do(ShenandoahHeapRegion *r) { Need a comment to explain what heap_region_do() is doing. I think it is doing: 1. Returns true if there is no need to iterate over additional regions, otherwise returns false 2. If we've already found a region to hold the requested allocation and there are no more regions to retire, we return true. 3. If the region is trash or is free and we're not doing concurrent_weak_roots, we try to recycle it, setting its affiliation to young. (I'm not understanding how this works. What if the region had been placed in the Collector or OldCollector partitions?) 4. If the requested object has not yet been allocated and this region has sufficient memory to represent the object, allocate it here. (We do all this while holding the heap lock, so we can check available memory at entry to the function, and use the available memory further below within the function.) 5. If this region has more than min_tlab_size memory (after allocating _obj) and there's another region to be retired, we replace the next retirable region with this region.
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2383: > 2381: ShenandoahDirectAllocationRegion& shared_region = _direct_allocation_regions[_next_retire_eligible_region]; > 2382: assert(AtomicAccess::load(&shared_region._address) == nullptr, "Must have been released."); > 2383: r->reserve_for_direct_allocation(); Here also, I wonder if we can simplify the protocol for assigning this region into the directly allocatable array. Maybe the key idea could be to introduce a new volatile variable into ShenandoahHeapRegion known as top_for_cas. cas allocations allocate by incrementing top_for_cas. Before removing a region from the directly allocatable set, we use cas to increase top_for_cas to end(). Before placing a region into the directly allocatable set (and while holding the heap lock), we copy top() to top_for_cas. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2418: > 2416: iterate_regions_for_alloc(&cl, false); > 2417: if (cl._next_region_with_sufficient_mem != ShenandoahDirectlyAllocatableRegionCount && obj == nullptr) { > 2418: new_start_index = cl._next_region_with_sufficient_mem; I think an accessor method is preferred over direct access to a "private" variable. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 263: > 261: _capacity[int(which_partition)] = value; > 262: _available[int(which_partition)] = value - _used[int(which_partition)]; > 263: AtomicAccess::store(_capacity + int(which_partition), value); Shouldn't require AtomicAccess here, because we hold heap lock. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 271: > 269: _used[int(which_partition)] = value; > 270: _available[int(which_partition)] = _capacity[int(which_partition)] - value; > 271: AtomicAccess::store(_used + int(which_partition), value); Also here, should not require AtomicAccess. 
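The `top_for_cas` protocol proposed in the comment on line 2383 above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the ShenandoahHeapRegion implementation: the `Region` struct, the method names and `kNoSpace` are hypothetical, byte offsets stand in for `HeapWord*`, and the heap lock is only implied by comments. Writing the final `top_for_cas` value back into `top` on retirement is also an assumption of this sketch; the comment does not say where that value should go.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr size_t kNoSpace = ~size_t{0};  // sentinel: no allocation was made

struct Region {
  size_t top = 0;                   // only touched while holding the heap lock
  const size_t end;
  std::atomic<size_t> top_for_cas;  // bump pointer used by lock-free allocators

  // A region that is not directly allocatable keeps top_for_cas == end,
  // so every CAS allocation against it fails.
  explicit Region(size_t capacity) : end(capacity), top_for_cas(capacity) {}

  // Caller holds the heap lock: copy top to top_for_cas before the region
  // is placed into the directly allocatable set.
  void publish_for_direct_allocation() {
    assert(top <= end);
    top_for_cas.store(top, std::memory_order_release);
  }

  // Lock-free path: allocate by incrementing top_for_cas.
  size_t cas_allocate(size_t size) {
    size_t old_top = top_for_cas.load(std::memory_order_relaxed);
    while (size <= end - old_top) {
      if (top_for_cas.compare_exchange_weak(old_top, old_top + size)) {
        return old_top;
      }
      // old_top now holds the current value; re-check the space and retry.
    }
    return kNoSpace;
  }

  // Before the region leaves the directly allocatable set: use CAS to raise
  // top_for_cas to end, so any late allocator sees a full region. The value
  // observed by the winning CAS is the final allocation boundary.
  void retire_from_direct_allocation() {
    size_t observed = top_for_cas.load(std::memory_order_relaxed);
    while (!top_for_cas.compare_exchange_weak(observed, end)) {
      // A racing allocation moved top_for_cas; observed was refreshed.
    }
    top = observed;  // assumption: fold lock-free allocations back into top
  }
};
```

One property of this shape is that a mutator holding a stale pointer to a retired region needs no separate "reserved" flag check: its CAS allocation simply fails because `top_for_cas` already equals `end`.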
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 464:

> 462: HeapWord* cas_allocate_in_for_mutator(ShenandoahHeapRegion* region, ShenandoahAllocRequest &req, bool &in_new_region);
> 463:
> 464: bool try_allocate_directly_allocatable_regions(uint start_index,

Can we have a block comment description of what this function does, including its impact on var parameters?

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 363:

> 361: }
> 362:
> 363: void ShenandoahHeapRegion::reset_alloc_metadata() {

Do we need to make these atomic because we now increment asynchronously from within mutator CAS allocations? Before, they were only adjusted while holding the heap lock. I'm wondering if add-with-fetch() or CAS() would be more/less efficient than AtomicAccess::stores. Can we test the tradeoffs?

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 131:

> 129: }
> 130:
> 131: HeapWord* ShenandoahHeapRegion::allocate_atomic(size_t size, const ShenandoahAllocRequest& req) {

As mentioned above, this is maybe where we would want to set the thread-local _index variable, and this is also where we might want to count how many times we retry the try_allocate() request before we succeed. The point is that if we have multiple try_allocate() CAS failures, that means this region is heavily contended, and we don't want to make this our new "starting index".

src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 559:

> 557: range(1, 128) \
> 558: \
> 559: product(uintx, ShenandoahDirectAllocationMaxProbes, 3, EXPERIMENTAL, \

I think we found that setting DirectAllocationMaxProbes to equal ShenandoahDirectlyAllocatableRegionCount works "best". I'm inclined to remove this parameter entirely as it somewhat simplifies the implementation. If you think we want to keep it, can you explain the rationale? Would we change the default value?
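
To make the retry-counting suggestion on allocate_atomic() concrete, here is a rough sketch under stated assumptions. `CasAllocResult`, `allocate_counting_retries()`, `should_become_start_index()` and the `max_failures` threshold are all hypothetical names; nothing here is taken from the PR, and `std::atomic` again stands in for `AtomicAccess`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Outcome of one lock-free allocation attempt, plus how many CAS races were
// lost on the way. All names in this sketch are illustrative.
struct CasAllocResult {
  bool        succeeded;
  std::size_t offset;        // meaningful only when succeeded is true
  unsigned    cas_failures;  // number of failed compare-exchange attempts
};

// Bump-allocate `size` units from [top, end), counting lost CAS races.
inline CasAllocResult allocate_counting_retries(std::atomic<std::size_t>& top,
                                                std::size_t end,
                                                std::size_t size) {
  unsigned failures = 0;
  std::size_t cur = top.load(std::memory_order_acquire);
  while (cur <= end && size <= end - cur) {
    if (top.compare_exchange_strong(cur, cur + size,
                                    std::memory_order_acq_rel,
                                    std::memory_order_acquire)) {
      return {true, cur, failures};
    }
    ++failures;  // another thread moved top first; cur now holds its value
  }
  return {false, 0, failures};
}

// A thread would adopt this region as its new starting index only when the
// allocation succeeded without too much contention.
inline bool should_become_start_index(const CasAllocResult& r,
                                      unsigned max_failures) {
  return r.succeeded && r.cas_failures <= max_failures;
}
```

A caller could then update its thread-local index only when `should_become_start_index()` returns true and otherwise keep probing from its previous start. What threshold is reasonable would have to come from measurement.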
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495713187 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495759091 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495768272 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495789372 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495974976 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495837037 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495793783 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496020674 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496020913 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496084186 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496124916 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496344767 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496359448 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496109695 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496045832 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496047006 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496050940 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496384148 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496395761 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496399061 From kdnilsen at openjdk.org Wed Nov 5 22:31:23 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 5 Nov 2025 22:31:23 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: 
<5YSA3F88CmDDv09M2KOm_EFNDh_09LPO2WMrgETfupI=.cc658dc9-829e-41a5-ad76-393d3eb0f75a@github.com> On Wed, 5 Nov 2025 19:00:03 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - format >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix errors caused by renaming ofAtomic to AtomicAccess >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 848: > >> 846: HeapWord* ShenandoahFreeSet::allocate_with_affiliation(Iter& iterator, ShenandoahAffiliation affiliation, ShenandoahAllocRequest& req, bool& in_new_region) { >> 847: for (idx_t idx = iterator.current(); iterator.has_next(); idx = iterator.next()) { >> 848: ShenandoahHeapRegion* r = _heap->get_region(idx); > > I wonder if we could refine this a little bit. When the region is moved into the "directly allocatable" set, wouldn't we remove it from its partition? Then, we wouldn't have to test for !r->reserved_for_direct_allocation() here because the iterator wouldn't produce it. > > We could maybe replace this test with an assert that !r->reserved_for_direct_allocation(). Same issue in other uses of the allocation iterator. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2280: > >> 2278: } >> 2279: >> 2280: class DirectAllocatableRegionRefillClosure final : public ShenandoahHeapRegionIterationClosure { > > I don't think we want to subclass ShenandoahHeapRegionIterationClosure here. That iterates over all 2000 regions. 
We only want to iterate over the 13 Directly allocatable regions. Maybe we don't even need/want a closure iterator here. We could just write a loop.

I think we should be borrowing from this code when replenishing the regions that are ready to be retired:

  if (_partitions.alloc_from_left_bias(ShenandoahFreeSetPartitionId::Mutator)) {
    // Allocate from low to high memory. This keeps the range of fully empty regions more tightly packed.
    // Note that the most recently allocated regions tend not to be evacuated in a given GC cycle. So this
    // tends to accumulate "fragmented" uncollected regions in high memory.
    ShenandoahLeftRightIterator iterator(&_partitions, ShenandoahFreeSetPartitionId::Mutator);
    return allocate_from_regions(iterator, req, in_new_region);
  }

  // Allocate from high to low memory. This preserves low memory for humongous allocations.
  ShenandoahRightLeftIterator iterator(&_partitions, ShenandoahFreeSetPartitionId::Mutator);
  return allocate_from_regions(iterator, req, in_new_region);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2495761580
PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2496372013

From duke at openjdk.org Thu Nov 6 00:29:26 2025
From: duke at openjdk.org (Rui Li)
Date: Thu, 6 Nov 2025 00:29:26 GMT
Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space
Message-ID: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com>

Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that the jtreg default heap size on the reporter's machines is too small, and the randomness in the test sometimes created more live data than the heap the test had could hold.

Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g.
Starting with a 3g heap for safety to reduce the noise.

Also used the test to compare Shenandoah with GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m, so GenShen isn't worse. The reported GenShen failure observation probably came from the Random.

import java.util.ArrayList;
import java.util.List;

public class TestLargeObjectAlignmentDeterministic {

    static final int SLABS_COUNT = Integer.getInteger("slabs", 10000);
    static final int NODE_COUNT = Integer.getInteger("nodes", 10000);
    static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000);

    static Object[] objects;

    public static void main(String[] args) throws Exception {
        objects = new Object[SLABS_COUNT];

        // Worst case: fill every slab, with no Random involved.
        for (int i = 0; i < SLABS_COUNT; i++) {
            objects[i] = createSome();
        }
    }

    public static Object createSome() {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c < NODE_COUNT; c++) {
            // new Integer (unlike Integer.valueOf) allocates a fresh object for every node.
            result.add(new Integer(c));
        }
        return result;
    }

}

-------------

Commit messages:
 - 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space

Changes: https://git.openjdk.org/jdk/pull/28167/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8361339
  Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/28167.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28167/head:pull/28167

PR: https://git.openjdk.org/jdk/pull/28167

From ysr at openjdk.org Thu Nov 6 01:25:11 2025
From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Thu, 6 Nov 2025 01:25:11 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> Message-ID: On Mon, 3 Nov 2025 18:01:31 GMT, William Kemper wrote: >> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'. >> >> # Background >> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more inline with the natural way of things. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Flush SATB buffers upon entering degenerated cycle when old marking is in progress > > This has to happen at least once during the degenerated cycle. 
Doing it at the start, rather than the end, simplifies the verifier. > - Fix typo in comment > - Remove duplicate satb flush closure > - Only flush satb once during degenerated cycle > - Cleanup and comments > - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots > - Fix assertion > - Oops, move inline definition out of ifdef ASSERT > - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de Changes look ok. I found some of the comments confusing -- I have left some remarks at those places, please have a look to see if they can be made clearer. The improvement in performance looks good. Do we track the number of SATB pointers processed by the old marking (to compare between before and after your changes here)? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1143: > 1141: // be in the collection set. If this happens, the pointer will be preserved, essentially > 1142: // becoming part of the old snapshot. > 1143: // 2. The region is allocated during evacuation of old. This is also not a concern because One related question. In both these cases, I assume the reference will look "marked" because it's above TAMS for the purposes of the old marking? src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 234: > 232: // marking phase. In some cases, this can cause a write to a perfectly > 233: // reachable oop to enqueue a pointer that later becomes garbage (because > 234: // it points at an object that is later chosen for the collection set). There are > ``` > // ... In some cases, this can cause a write to a perfectly > // reachable oop to enqueue a pointer that later becomes garbage (because > // it points at an object that is later chosen for the collection set). > ``` I don't understand this statement. The SATB is supposed to be pointers to objects that we will preserve because they were reachable when the snapshot (marking) was started. Can you elaborate what you mean here? 
Did you mean that the filtering of the SATB didn't filter a (sometime) young reference which was then processed by the old marking? src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 237: > 235: // also cases where the referent of a weak reference ends up in the SATB > 236: // and is later collected. In these cases the oop in the SATB buffer becomes > 237: // invalid and the _next_ cycle will crash during its marking phase. To Again I don't understand the concept of an SATB pointer to an object that was later collected? Are we talking about young objects that are subsequently processed by old marking because they weren't filtered out when they should be? I think that is probably the case here, but it would be good to clean up these comments to avoid this confusion. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27983#pullrequestreview-3425195561 PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2496682448 PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2496698952 PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2496702500 From duke at openjdk.org Thu Nov 6 01:45:26 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 6 Nov 2025 01:45:26 GMT Subject: RFR: 8261743: Shenandoah: enable String deduplication with compact heuristics Message-ID: Enable `UseStringDeduplication` when using compact heuristics. Testing: ./build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact -XX:+PrintFlagsFinal --version | grep UseStringDeduplication bool UseStringDeduplication = true {product} {default} Note: The label should be `{product} {ergonomic}` in theory. 
Pending on a separate issue: [JDK-8371381](https://bugs.openjdk.org/browse/JDK-8371381) ------------- Commit messages: - 8261743: Shenandoah: enable String deduplication with compact heuristics Changes: https://git.openjdk.org/jdk/pull/28170/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28170&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8261743 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28170.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28170/head:pull/28170 PR: https://git.openjdk.org/jdk/pull/28170 From shade at openjdk.org Thu Nov 6 09:39:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Nov 2025 09:39:10 GMT Subject: RFR: 8261743: Shenandoah: enable String deduplication with compact heuristics In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 01:38:36 GMT, Rui Li wrote: > Enable `UseStringDeduplication` when using compact heuristics. > > Testing: > > ./build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact -XX:+PrintFlagsFinal --version | grep UseStringDeduplication > bool UseStringDeduplication = true {product} {default} > > > Note: The labels should be `{product} {ergonomic}` ideally. Pending on a separate issue: [JDK-8371381](https://bugs.openjdk.org/browse/JDK-8371381) Looks good. Have you measured any impact on pauses / GC durations in testing? ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28170#pullrequestreview-3427178962 From shade at openjdk.org Thu Nov 6 09:47:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Nov 2025 09:47:04 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Thu, 6 Nov 2025 00:21:55 GMT, Rui Li wrote: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. > > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. > > Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
>
>
>
> public class TestLargeObjectAlignmentDeterministic {
>
>     static final int SLABS_COUNT = Integer.getInteger("slabs", 10000);
>     static final int NODE_COUNT = Integer.getInteger("nodes", 10000);
>     static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000);
>
>     static Object[] objects;
>
>     public static void main(String[] args) throws Exception {
>         objects = new Object[SLABS_COUNT];
>
>         for (int i = 0; i < SLABS_COUNT; i++) {
>             objects[i] = createSome();
>         }
>     }
>
>     public static Object createSome() {
>         List result = new ArrayList();
>         for (int c = 0; c < NODE_COUNT; c++) {
>             result.add(new Integer(c));
>         }
>         return result;
>     }
>
> }

Right. One node is basically an `Integer` and the reference array slot. So about 20 bytes, give or take. There are 10K nodes per slab, meaning there is 200KB per slab. There are 10K slabs, meaning they take 2GB memory. So bumping the limit to 3G makes sense.

I think you want to override `Xmx`, though -- that is the decisive factor for sizing heap regions. Sticking with `Xmx` means the test runs in the same conditions everywhere, helping reproducibility. Unless there is a strong reason to explore different heap region sizes, which I think there is none: the test was about testing `ObjectAlignmentInBytes` first and foremost.
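
The back-of-envelope estimate above can be written out as a small calculation. This is only a sketch: the 20 bytes per node is the rough figure from this message, not a measurement, and the helper names `estimated_live_bytes()`, `mib()` and `fits_in_heap()` are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Live data if every slab is filled completely. bytes_per_node is an
// estimate (an Integer plus its reference slot), not a measured value.
constexpr std::uint64_t estimated_live_bytes(std::uint64_t nodes_per_slab,
                                             std::uint64_t slabs,
                                             std::uint64_t bytes_per_node) {
  return nodes_per_slab * slabs * bytes_per_node;
}

// Convert a size in MiB (as used by -Xmx<n>m) to bytes.
constexpr std::uint64_t mib(std::uint64_t n) {
  return n * 1024 * 1024;
}

// True when the estimated live data fits in a heap of heap_bytes.
constexpr bool fits_in_heap(std::uint64_t live_bytes, std::uint64_t heap_bytes) {
  return live_bytes <= heap_bytes;
}
```

With 10,000 nodes per slab, 10,000 slabs and 20 bytes per node this gives 2,000,000,000 bytes, or about 1907 MiB. The estimate leaves out the ArrayList backing arrays' growth and other overhead, which may be why the observed pass/fail boundary sat somewhat higher, around 2050m to 2150m. A 3g heap (3072 MiB) leaves headroom over both figures.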
------------- PR Review: https://git.openjdk.org/jdk/pull/28167#pullrequestreview-3427215034 From syan at openjdk.org Thu Nov 6 11:20:02 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 6 Nov 2025 11:20:02 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Thu, 6 Nov 2025 00:21:55 GMT, Rui Li wrote: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. > > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. > > Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
> > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } test/hotspot/jtreg/gc/shenandoah/TestLargeObjectAlignment.java line 34: > 32: * @library /test/lib > 33: * > 34: * @run main/othervm -Xms3g -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ObjectAlignmentInBytes=16 -Xint TestLargeObjectAlignment Since the initial heap memory set to 3G, maybe we should add '@requires os.maxMemory > 4g' to skip this test when the physical memory of test machine is less than 4g. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28167#discussion_r2498564622 From duke at openjdk.org Thu Nov 6 13:43:33 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 6 Nov 2025 13:43:33 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v9] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! 
Zihao Lin has updated the pull request incrementally with one additional commit since the last revision: remove C2AccessValuePtr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/6d122039..e89910c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=07-08 Stats: 58 lines in 8 files changed: 0 ins; 21 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Thu Nov 6 13:58:53 2025 From: duke at openjdk.org (Zihao Lin) Date: Thu, 6 Nov 2025 13:58:53 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v10] In-Reply-To: References: Message-ID: <1zyQq98OPsZ-2nzYz21X_5v2RgKhWaZrZaJQevDMzo4=.138599b1-4797-42b0-a48a-829a112dfbe7@github.com> > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - ... 
and 2 more: https://git.openjdk.org/jdk/compare/c173d416...36e024db ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=09 Stats: 230 lines in 18 files changed: 33 ins; 55 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From kdnilsen at openjdk.org Thu Nov 6 14:22:16 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 6 Nov 2025 14:22:16 GMT Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v6] In-Reply-To: References: Message-ID: <6QqmpWAvag906JW5UZj7T1tnWsDKuNNgVArEYv2pQ5g=.69b21a94-afc3-4812-ba9b-7217df4b6704@github.com> On Mon, 29 Sep 2025 16:40:51 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Fixup handling of weakly marked objects in remembered set" >> >> This reverts commit 80198abe5d06c3532d9a43a53691376e990ed45f. > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 132: > >> 130: >> 131: // Search for last one in the range [l_index, r_index). Return r_index if not found. >> 132: inline idx_t get_prev_one_offset (idx_t l_index, idx_t r_index) const; > > Nit: Some idiosyncratic formatting here (space before opening parenthesis). Thanks for catching this. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2499139532 From wkemper at openjdk.org Thu Nov 6 14:31:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 14:31:34 GMT Subject: RFR: Merge openjdk/jdk21u:master Message-ID: <3mTSioNe6lIuX9XiTKFwtTGHZYutyfFp2Xch97_1Ymg=.6a364e89-1c4f-4496-b995-dd1a38a73d42@github.com> Merges tag jdk-21.0.10+1 ------------- Commit messages: - 8369947: Bytecode rewriting causes Java heap corruption on RISC-V - 8353832: Opensource FontClass, Selection and Icon tests - 8346753: Test javax/swing/JMenuItem/RightLeftOrientation/RightLeftOrientation.java fails on Windows Server 2025 x64 because the icons of RBMenuItem and CBMenuItem are not visible in Nimbus LookAndFeel - 8369506: Bytecode rewriting causes Java heap corruption on AArch64 - 8325766: Extend CertificateBuilder to create trust and end entity certificates programmatically - 8358813: JPasswordField identifies spaces in password via delete shortcuts - 8355077: Compiler error at splashscreen_gif.c due to unterminated string initialization - 8355241: Move NativeDialogToFrontBackTest.java PL test to manual category - 8334509: Cancelling PageDialog does not return the same PageFormat object - 8328377: Convert java/awt/Cursor/MultiResolutionCursorTest test to main - ... 
and 263 more: https://git.openjdk.org/shenandoah-jdk21u/compare/7f146482...485ced0d The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=226&range=00.conflicts Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/226/files Stats: 52671 lines in 2989 files changed: 33410 ins; 10089 del; 9172 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/226.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/226/head:pull/226 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/226 From wkemper at openjdk.org Thu Nov 6 17:04:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 17:04:58 GMT Subject: RFR: 8261743: Shenandoah: enable String deduplication with compact heuristics In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 01:38:36 GMT, Rui Li wrote: > Enable `UseStringDeduplication` when using compact heuristics. > > Testing: > > ./build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact -XX:+PrintFlagsFinal --version | grep UseStringDeduplication > bool UseStringDeduplication = true {product} {default} > > > Note: The labels should be `{product} {ergonomic}` ideally. Pending on a separate issue: [JDK-8371381](https://bugs.openjdk.org/browse/JDK-8371381) Marked as reviewed by wkemper (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/28170#pullrequestreview-3429408699

From xpeng at openjdk.org Thu Nov 6 17:27:34 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Thu, 6 Nov 2025 17:27:34 GMT
Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3]
In-Reply-To:
References: <7KjxoZ7UQ9LugnCVgBT6moBrQEfxmPNBXrvyWXD-MQ8=.37907346-c252-4daf-b7bb-9d0f89291753@github.com>
Message-ID:

On Tue, 4 Nov 2025 23:08:11 GMT, William Kemper wrote:

>> I have removed can_allocate_in_new_region again after merging @kdnilsen's change unifying accountings, since we don't have consistency in accountings anymore.
>
> We still need to enforce that requests for a young evacuation don't take memory that was reserved for old evacuations. `can_allocate_in_new_region` isn't about consistency between freeset and generation accounting, it's used to maintain old/young collector reserves.

I am confused again. My understanding is that we won't: this is already enforced by the partitions. For a young evacuation we only look for regions in the Collector partition; if a region is FREE and reserved for old evacuations, it will be in the OldCollector partition.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28036#discussion_r2500016821

From wkemper at openjdk.org Thu Nov 6 17:43:52 2025
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 6 Nov 2025 17:43:52 GMT
Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3]
In-Reply-To:
References: <7KjxoZ7UQ9LugnCVgBT6moBrQEfxmPNBXrvyWXD-MQ8=.37907346-c252-4daf-b7bb-9d0f89291753@github.com>
Message-ID:

On Thu, 6 Nov 2025 17:24:21 GMT, Xiaolong Peng wrote:

>> We still need to enforce that requests for a young evacuation don't take memory that was reserved for old evacuations.
`can_allocate_in_new_region` isn't about consistency between freeset and generation accounting, it's used to maintain old/young collector reserves. > > I am confused again. My understanding is: we won't, that is is done by the partition, for young evacuation we will only look for regions in the Collector partition, if a region is FREE and reserved for old evacuations, the region will be in OldCollector partition. My apologies, you're right. The `FREE` region we remember when looking for the region with the same affiliation as the request does come from the same partition as the second loop would have searched. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28036#discussion_r2500097577 From wkemper at openjdk.org Thu Nov 6 17:47:42 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 17:47:42 GMT Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3] In-Reply-To: References: Message-ID: <2wIxZ0sslT7_GkFAr5yccBB7zSzfUOZAeaOsf9Ljnqg=.851fb239-04f8-4a85-adf0-42879d646722@github.com> On Fri, 31 Oct 2025 22:09:18 GMT, Xiaolong Peng wrote: >> To allocate an object in Collector/OldCollector partition, current implementation may traverse the regions in the partition twice: >> 1. fast path: traverse regions between left most and right most in the partition, and try to allocate in an affiliated region in the partition; >> 2. if fails in fast path, traverse regions between left most empty and right most empty in the partition, and try try to allocate in a FREE region. >> >> 2 can be saved if we also remember the first FREE region seem in 1. >> >> The PR makes the code much cleaner, and more efficient(although the performance impact may not be measurable, I have run some dacapo benchmarks and didn't see meaningful difference) >> >> >> Test: >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: > > - Remove can_allocate_in_new_region > - Merge remote-tracking branch 'origin/master' into collector-allocation > - Remove condition check for trash region > - Address the PR review comments > - Merge branch 'openjdk:master' into collector-allocation > - Touch up > - Remove test 'req.is_old()' when steal an empty region from the mutator view > - Update comment > - Fix wrong condition when steal an empty region from the mutator view > - Fix potential failure in young evac > - ... and 3 more: https://git.openjdk.org/jdk/compare/ec059c0e...8e01d691 Looks good, sorry for the mix up. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28036#pullrequestreview-3429628956 From wkemper at openjdk.org Thu Nov 6 17:53:12 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 17:53:12 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> Message-ID: <5J3gg2vmNAxe3JuZb88sM_CYm6DPwk9z8-o3ml3_l28=.38cf585c-c2ab-44da-9321-b448da98b4a7@github.com> On Thu, 6 Nov 2025 01:04:04 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Flush SATB buffers upon entering degenerated cycle when old marking is in progress >> >> This has to happen at least once during the degenerated cycle. Doing it at the start, rather than the end, simplifies the verifier. 
>> - Fix typo in comment >> - Remove duplicate satb flush closure >> - Only flush satb once during degenerated cycle >> - Cleanup and comments >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Fix assertion >> - Oops, move inline definition out of ifdef ASSERT >> - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1143: > >> 1141: // be in the collection set. If this happens, the pointer will be preserved, essentially >> 1142: // becoming part of the old snapshot. >> 1143: // 2. The region is allocated during evacuation of old. This is also not a concern because > > One related question. In both these cases, I assume the reference will look "marked" because it's above TAMS for the purposes of the old marking? In the first case (in-place promotion) the pointer wouldn't necessarily be above TAMS. In this case, we would leave the object in the 'complete' buffer and it would become marked by the old marking threads (it becomes part of the old snapshot). The second case is not an issue because none of the cset regions will become trash until _after_ `final-update-refs` (so no regions could become old until after we filter the SATB buffers). For regions that are already old, then the TAMS and mark bit map work as usual. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2500141016 From wkemper at openjdk.org Thu Nov 6 18:06:03 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 18:06:03 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> Message-ID: On Thu, 6 Nov 2025 01:18:52 GMT, Y. 
Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Flush SATB buffers upon entering degenerated cycle when old marking is in progress >> >> This has to happen at least once during the degenerated cycle. Doing it at the start, rather than the end, simplifies the verifier. >> - Fix typo in comment >> - Remove duplicate satb flush closure >> - Only flush satb once during degenerated cycle >> - Cleanup and comments >> - Merge remote-tracking branch 'jdk/master' into piggyback-satb-flush-on-update-roots >> - Fix assertion >> - Oops, move inline definition out of ifdef ASSERT >> - ... and 11 more: https://git.openjdk.org/jdk/compare/1922c4fd...4bd602de > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.hpp line 237: > >> 235: // also cases where the referent of a weak reference ends up in the SATB >> 236: // and is later collected. In these cases the oop in the SATB buffer becomes >> 237: // invalid and the _next_ cycle will crash during its marking phase. To > > Again I don't understand the concept of an SATB pointer to an object that was later collected? Are we talking about young objects that are subsequently processed by old marking because they weren't filtered out when they should be? > > I think that is probably the case here, but it would be good to clean up these comments to avoid this confusion. Suppose we have a young collection running while the SATB barrier is active for old marking. The barrier will be enabled for the entirety of the young collection. 
Now, suppose we have a situation like this:

+--Young, CSet------+    +--Young, Regular----+
|                   |    |                    |
|                   |    |                    |
|        A <--------------------+ B           |
|                   |    |                    |
|                   |    |                    |
|                   |    |                    |
|                   |    |                    |
|                   |    |                    |
+-------------------+    +--------------------+

If a mutator overwrites the pointer in `B`, the SATB barrier will enqueue object `A`. These are the objects we need to filter out.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2500196026

From wkemper at openjdk.org Thu Nov 6 18:36:48 2025
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 6 Nov 2025 18:36:48 GMT
Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v4]
In-Reply-To:
References:
Message-ID:

> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'.
>
> # Background
> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more in line with the natural way of things.
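
As a rough illustration of the filtering described above (an editorial sketch, not the HotSpot code): the Java model below tags each enqueued pointer with the affiliation of its region and drops young pointers while a thread-local buffer is compacted, before the buffer is handed over as completed. The names `SatbFilterSketch`, `Affiliation`, `Ref`, and `compact` are invented for this sketch and are assumptions, as is the simplification that affiliation alone decides what is filtered.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: these names are invented and do not exist in HotSpot.
class SatbFilterSketch {
    enum Affiliation { YOUNG, OLD }

    // Stand-in for an enqueued pointer: an address plus its region's affiliation.
    static final class Ref {
        final long address;
        final Affiliation affiliation;

        Ref(long address, Affiliation affiliation) {
            this.address = address;
            this.affiliation = affiliation;
        }
    }

    // Compacts a thread-local buffer before it is handed over as completed.
    // While only old is being marked, young pointers are dropped here.
    static List<Ref> compact(List<Ref> buffer, boolean onlyMarkingOld) {
        List<Ref> kept = new ArrayList<>();
        for (Ref ref : buffer) {
            boolean filtered = onlyMarkingOld && ref.affiliation == Affiliation.YOUNG;
            if (!filtered) {
                kept.add(ref);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Ref> buffer = new ArrayList<>();
        buffer.add(new Ref(0x1000L, Affiliation.YOUNG)); // e.g. an object in a cset region
        buffer.add(new Ref(0x2000L, Affiliation.OLD));
        System.out.println("kept " + compact(buffer, true).size() + " of " + buffer.size());
    }
}
```

In this model the young pointer never reaches the completed buffer, so a marking thread that only handles old objects would never see it.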
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve comment describing the need for a method to filter SATB buffers for degenerated cycles ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27983/files - new: https://git.openjdk.org/jdk/pull/27983/files/4bd602de..c741bf6b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27983&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27983&range=02-03 Stats: 24 lines in 1 file changed: 6 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/27983.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27983/head:pull/27983 PR: https://git.openjdk.org/jdk/pull/27983 From wkemper at openjdk.org Thu Nov 6 18:38:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 18:38:08 GMT Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v6] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 02:24:11 GMT, Kelvin Nilsen wrote: >> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. >> >> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Fixup handling of weakly marked objects in remembered set" > > This reverts commit 80198abe5d06c3532d9a43a53691376e990ed45f. Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27353#pullrequestreview-3429901677 From ysr at openjdk.org Thu Nov 6 18:57:04 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 6 Nov 2025 18:57:04 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v4] In-Reply-To: References: Message-ID: <3fNre-PPou9riBh4OJnCKeMgVIisuncArHrA44Ao4GQ=.4785a481-4280-414e-8a29-b9f7c654accb@github.com> On Thu, 6 Nov 2025 18:36:48 GMT, William Kemper wrote: >> When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'. >> >> # Background >> When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more inline with the natural way of things. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment describing the need for a method to filter SATB buffers for degenerated cycles I approved this, but will follow up with you offline to more properly understand this. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27983#pullrequestreview-3430002460 From ysr at openjdk.org Thu Nov 6 18:57:08 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 6 Nov 2025 18:57:08 GMT Subject: RFR: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old [v3] In-Reply-To: <5J3gg2vmNAxe3JuZb88sM_CYm6DPwk9z8-o3ml3_l28=.38cf585c-c2ab-44da-9321-b448da98b4a7@github.com> References: <-MlfBVpHD57gSd_4_0iIDHI0Cv6HIpTj9H9mX-UHS7g=.54423f43-32dd-49c4-8e7e-705dbbc8c825@github.com> <5J3gg2vmNAxe3JuZb88sM_CYm6DPwk9z8-o3ml3_l28=.38cf585c-c2ab-44da-9321-b448da98b4a7@github.com> Message-ID: On Thu, 6 Nov 2025 17:50:50 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1143: >> >>> 1141: // be in the collection set. If this happens, the pointer will be preserved, essentially >>> 1142: // becoming part of the old snapshot. >>> 1143: // 2. The region is allocated during evacuation of old. This is also not a concern because >> >> One related question. In both these cases, I assume the reference will look "marked" because it's above TAMS for the purposes of the old marking? > > In the first case (in-place promotion) the pointer wouldn't necessarily be above TAMS. In this case, we would leave the object in the 'complete' buffer and it would become marked by the old marking threads (it becomes part of the old snapshot). > > The second case is not an issue because none of the cset regions will become trash until _after_ `final-update-refs` (so no regions could become old until after we filter the SATB buffers). For regions that are already old, then the TAMS and mark bit map work as usual. Shouldn't promoted objects look black to the old collection, because that's what SATB marking would want? (In other words, the TAMS for the region should be bottom for that promoted in place region for the purposes of old gen marking.) I'll follow up off-line with you so I understand what's happening here better. Meanwhile, I'm going to re-approve this PR because I don't want to hold it hostage to my misunderstanding at this time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27983#discussion_r2500408183 From xpeng at openjdk.org Thu Nov 6 19:01:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Nov 2025 19:01:22 GMT Subject: RFR: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration [v3] In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 22:09:18 GMT, Xiaolong Peng wrote: >> To allocate an object in Collector/OldCollector partition, current implementation may traverse the regions in the partition twice: >> 1. fast path: traverse regions between left most and right most in the partition, and try to allocate in an affiliated region in the partition; >> 2. if fails in fast path, traverse regions between left most empty and right most empty in the partition, and try try to allocate in a FREE region. >> >> 2 can be saved if we also remember the first FREE region seem in 1. >> >> The PR makes the code much cleaner, and more efficient(although the performance impact may not be measurable, I have run some dacapo benchmarks and didn't see meaningful difference) >> >> >> Test: >> - [x] hotspot_gc_shenandoah > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Remove can_allocate_in_new_region > - Merge remote-tracking branch 'origin/master' into collector-allocation > - Remove condition check for trash region > - Address the PR review comments > - Merge branch 'openjdk:master' into collector-allocation > - Touch up > - Remove test 'req.is_old()' when steal an empty region from the mutator view > - Update comment > - Fix wrong condition when steal an empty region from the mutator view > - Fix potential failure in young evac > - ... and 3 more: https://git.openjdk.org/jdk/compare/ec059c0e...8e01d691 Thanks a lot for the review. 
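
For readers following the thread, the single-pass idea from the quoted description can be sketched roughly as below. This is an editorial model in Java, not the Shenandoah implementation: `Region`, `affiliated`, `freeWords`, and `allocate` are hypothetical names, and the sketch ignores that the original two loops walked different index ranges within the partition.

```java
// Illustrative model only: names are invented and do not match the Shenandoah sources.
class SinglePassAllocSketch {
    static final class Region {
        boolean affiliated; // false models a FREE region
        int freeWords;

        Region(boolean affiliated, int freeWords) {
            this.affiliated = affiliated;
            this.freeWords = freeWords;
        }
    }

    // Returns the index of the region that satisfied the request, or -1 if none did.
    static int allocate(Region[] partition, int words) {
        int firstFree = -1;
        for (int i = 0; i < partition.length; i++) {
            Region r = partition[i];
            if (r.affiliated) {
                if (r.freeWords >= words) {
                    r.freeWords -= words;
                    return i; // fast path: an affiliated region fits the request
                }
            } else if (firstFree < 0 && r.freeWords >= words) {
                firstFree = i; // remember the first usable FREE region, keep looking
            }
        }
        if (firstFree >= 0) {
            // No affiliated region fit: use the remembered FREE region
            // instead of traversing the partition a second time.
            Region r = partition[firstFree];
            r.affiliated = true;
            r.freeWords -= words;
        }
        return firstFree;
    }

    public static void main(String[] args) {
        Region[] partition = {
            new Region(true, 2), new Region(false, 100), new Region(true, 50)
        };
        System.out.println(allocate(partition, 10));
        System.out.println(allocate(partition, 60));
    }
}
```

The affiliated region is still preferred even when a FREE region appears earlier in the traversal; the FREE region is only a fallback.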
------------- PR Comment: https://git.openjdk.org/jdk/pull/28036#issuecomment-3498932713 From xpeng at openjdk.org Thu Nov 6 19:01:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Nov 2025 19:01:23 GMT Subject: Integrated: 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 05:29:14 GMT, Xiaolong Peng wrote: > To allocate an object in Collector/OldCollector partition, current implementation may traverse the regions in the partition twice: > 1. fast path: traverse regions between left most and right most in the partition, and try to allocate in an affiliated region in the partition; > 2. if fails in fast path, traverse regions between left most empty and right most empty in the partition, and try try to allocate in a FREE region. > > 2 can be saved if we also remember the first FREE region seem in 1. > > The PR makes the code much cleaner, and more efficient(although the performance impact may not be measurable, I have run some dacapo benchmarks and didn't see meaningful difference) > > > Test: > - [x] hotspot_gc_shenandoah This pull request has now been integrated. 
Changeset: 9cc542eb Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/9cc542ebcb81552fe8c32a8cc3c63332853e5127 Stats: 77 lines in 2 files changed: 18 ins; 45 del; 14 mod 8370850: Shenandoah: Simplify collector allocation to save unnecessary region iteration Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/28036 From azeller at openjdk.org Thu Nov 6 19:36:06 2025 From: azeller at openjdk.org (Arno Zeller) Date: Thu, 6 Nov 2025 19:36:06 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space In-Reply-To: References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Thu, 6 Nov 2025 11:17:01 GMT, SendaoYan wrote: >> Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. >> >> Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. >> >> Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
>> >> >> >> public class TestLargeObjectAlignmentDeterministic { >> >> static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); >> static final int NODE_COUNT = Integer.getInteger("nodes", 10000); >> static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); >> >> static Object[] objects; >> >> public static void main(String[] args) throws Exception { >> objects = new Object[SLABS_COUNT]; >> >> for (int i = 0; i < SLABS_COUNT; i++) { >> objects[i] = createSome(); >> } >> } >> >> public static Object createSome() { >> List result = new ArrayList(); >> for (int c = 0; c < NODE_COUNT; c++) { >> result.add(new Integer(c)); >> } >> return result; >> } >> >> } > > test/hotspot/jtreg/gc/shenandoah/TestLargeObjectAlignment.java line 34: > >> 32: * @library /test/lib >> 33: * >> 34: * @run main/othervm -Xms3g -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ObjectAlignmentInBytes=16 -Xint TestLargeObjectAlignment > > Since the initial heap memory set to 3G, maybe we should add '@requires os.maxMemory > 4g' to skip this test when the physical memory of test machine is less than 4g. I suggest to also set -Xmx3g - otherwise the test will fail in case an -Xmx value of less than 3GB is set by a jtreg -vmoption. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28167#discussion_r2500542446 From wkemper at openjdk.org Thu Nov 6 19:40:14 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 19:40:14 GMT Subject: Integrated: 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old In-Reply-To: References: Message-ID: On Fri, 24 Oct 2025 21:00:40 GMT, William Kemper wrote: > When GenShen is only marking the old generation, we do not need the SATB mechanism to preserve young pointers. We currently filter these out of the SATB buffers during the final-update-refs and init-mark safepoints. This increases latency and introduces no small amount of complexity. 
It should be possible to instead filter out these pointers when the SATB buffers are 'compacted' before being 'completed'. > > # Background > When GenShen is marking the old generation it leaves the SATB barrier enabled. When a young collection interrupts old marking, it creates a situation where a mutator thread could overwrite a field holding a pointer into a collection set region. The SATB barrier will dutifully place this object in the SATB queue. If this pointer makes it into a mark queue, the marking thread will crash. Prior to this change, GenShen filtered out such pointers _after_ the thread local SATB buffers were completed. After this change, such pointers are filtered out _before_ the buffers are completed. This is more inline with the natural way of things. This pull request has now been integrated. Changeset: cad73d39 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/cad73d39762974776dd6fda5efe4e2a271d69f14 Stats: 331 lines in 11 files changed: 108 ins; 198 del; 25 mod 8370041: GenShen: Filter young pointers from thread local SATB buffers when only marking old Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/27983 From duke at openjdk.org Thu Nov 6 20:46:13 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 6 Nov 2025 20:46:13 GMT Subject: RFR: 8261743: Shenandoah: enable String deduplication with compact heuristics In-Reply-To: References: Message-ID: <2pqCwlRDMHj8qRBGXssro17pUf9sGnMLaeb4J_mY_iQ=.99c31242-76e5-4f08-b818-e7dfd0a38ed1@github.com> On Thu, 6 Nov 2025 09:36:06 GMT, Aleksey Shipilev wrote: > Looks good. Have you measured any impact on pauses / GC durations in testing? Yeah. Had a simple benchmark below (credit to [here](https://muratakkan.medium.com/understanding-string-deduplication-in-java-how-it-works-and-when-to-use-it-fbda71711435)): @Benchmark public void testMethod(Blackhole bh) { // This is a demo/sample template for building your JMH benchmarks. Edit as needed. 
    // Put your benchmark code here.
    String[] strings = new String[1000000];
    for (int i = 0; i < strings.length; i++) {
        strings[i] = "This is a test string";
    }
    bh.consume(strings);
}

Results:

######## java -jar target/benchmarks.jar --jvmArgs "-XX:-UseStringDeduplication -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc ########

Benchmark                                   Mode  Cnt        Score      Error   Units
MyBenchmark.testMethod                     thrpt    3     3181.651 ±  248.835   ops/s
MyBenchmark.testMethod:gc.alloc.rate       thrpt    3    12136.888 ±  949.185  MB/sec
MyBenchmark.testMethod:gc.alloc.rate.norm  thrpt    3  4000016.188 ±    0.016    B/op
MyBenchmark.testMethod:gc.count            thrpt    3     1568.000             counts
MyBenchmark.testMethod:gc.time             thrpt    3     1882.000                 ms

######## java -jar target/benchmarks.jar --jvmArgs "-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc ########

Benchmark                                   Mode  Cnt        Score      Error   Units
MyBenchmark.testMethod                     thrpt    3     3155.961 ±  365.174   ops/s  # throughput decreased by 0.8%
MyBenchmark.testMethod:gc.alloc.rate       thrpt    3    12038.882 ± 1394.186  MB/sec  # decreased by 0.8%
MyBenchmark.testMethod:gc.alloc.rate.norm  thrpt    3  4000016.190 ±    0.022    B/op  # same
MyBenchmark.testMethod:gc.count            thrpt    3     1172.000             counts  # decreased by 25%
MyBenchmark.testMethod:gc.time             thrpt    3      726.000                 ms  # decreased by 61.4%

The allocation rate and throughput didn't change much, but the GC count and GC time dropped by about 25% and 61.4% respectively (GC time fell from 1882 ms to 726 ms, i.e. to 38.6% of its previous value). I'm not that familiar with string deduplication, but according to https://openjdk.org/jeps/192 the results seem to match the implementation: it does not deduplicate at allocation time, but deduplicates the strings' internal char arrays at GC time, which would make GC more efficient.
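
As an editorial aside, the relative changes between the two runs can be recomputed from the scores reported above. The small Java sketch below does only that arithmetic; `RelativeChange` and `percentDecrease` are names made up for this example, and the inputs are the scores copied from the benchmark output.

```java
import java.util.Locale;

// Recomputes relative changes from the benchmark scores quoted in this message.
class RelativeChange {
    // Percentage by which `after` is lower than `before`.
    static double percentDecrease(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "throughput: %.1f%%", percentDecrease(3181.651, 3155.961)));
        System.out.println(String.format(Locale.ROOT, "gc.count:   %.1f%%", percentDecrease(1568.0, 1172.0)));
        System.out.println(String.format(Locale.ROOT, "gc.time:    %.1f%%", percentDecrease(1882.0, 726.0)));
    }
}
```

With these inputs, GC time after the change is roughly 38.6% of the earlier value, which corresponds to a decrease of roughly 61.4%.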
------------- PR Comment: https://git.openjdk.org/jdk/pull/28170#issuecomment-3499316594 From wkemper at openjdk.org Thu Nov 6 21:00:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Nov 2025 21:00:34 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements Message-ID: When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. ------------- Commit messages: - More simplification of arraycopy mark barrier - Merge remote-tracking branch 'jdk/master' into satb-fixes - Merge remote-tracking branch 'jdk/master' into satb-fixes - Why does this break? - Simplify arraycopy barrier case for marking - Merge remote-tracking branch 'jdk/master' into satb-fixes - Do not unconditionally enqueue old objects during marking Changes: https://git.openjdk.org/jdk/pull/28183/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28183&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370039 Stats: 53 lines in 2 files changed: 0 ins; 46 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28183/head:pull/28183 PR: https://git.openjdk.org/jdk/pull/28183 From ysr at openjdk.org Thu Nov 6 21:12:03 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 6 Nov 2025 21:12:03 GMT Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v6] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 02:24:11 GMT, Kelvin Nilsen wrote: >> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. >> >> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Fixup handling of weakly marked objects in remembered set" > > This reverts commit 80198abe5d06c3532d9a43a53691376e990ed45f. It looks like a few older comments had not been published previously. I am flushing those and will take a fresh look at the review. ------------- PR Review: https://git.openjdk.org/jdk/pull/27353#pullrequestreview-3300762300 From ysr at openjdk.org Thu Nov 6 21:12:12 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 6 Nov 2025 21:12:12 GMT Subject: RFR: 8358735: GenShen: bug in #undef'd code in block_start() [v2] In-Reply-To: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: On Fri, 3 Oct 2025 00:21:11 GMT, Kelvin Nilsen wrote: >> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. >> >> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. 
Above TAMS, all objects are presumed live and parsable. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > fix idiosyncratic formatting src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 129: > 127: inline idx_t get_prev_bit_impl(idx_t l_index, idx_t r_index) const; > 128: > 129: inline idx_t get_next_one_offset(idx_t l_index, idx_t r_index) const; Please document analogous to line 131. src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 131: > 129: inline idx_t get_next_one_offset(idx_t l_index, idx_t r_index) const; > 130: > 131: // Search for last one in the range [l_index, r_index). Return r_index if not found. Symmetry arguments wrt spec for `get_next_one_offset` may have preferred range `(l_index, r_index]`, returning `l_index` if none found. May be its (transitive) usage prefers this shape? (See similar comment at line 180.) src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 134: > 132: inline idx_t get_prev_one_offset(idx_t l_index, idx_t r_index) const; > 133: > 134: void clear_large_range(idx_t beg, idx_t end); documentation comment. src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 180: > 178: const HeapWord* limit) const; > 179: > 180: // Return the last marked address in the range [limit, addr], or addr+1 if none found. Symmetry would have preferred `(limit, addr]` as the range with `limit` if none found. However, may be usage of this method prefers the present shape? src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 251: > 249: // if marking context is valid and we are below tams, we use the marking bit map to find the first marked object that > 250: // intersects with this card, and if no such object exists, we return null > 251: if ((ctx != nullptr) && (left < tams)) { It seems like the caller should check if `left >= tams` and short-circuit rather than have this method do that work. 
src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 663: > 661: // we expect that the marking context isn't available and the crossing maps are valid. > 662: // Note that crossing maps may be invalid following class unloading and before dead > 663: // or unloaded objects have been coalesced and filled (updating the crossing maps). Good comment! What's still not clear is why `tams` and `last_relevant_card_index` are passed here. Does it reduce the work in the caller? I'd expect this to just return the first object on the card index or null if no such object exists. I realize `ctx` is used when one must consult the marking context in preference to the "crossing maps". The relevance of the last 2 arguments isn't clear from this documentation comment. May be I'll see why these are passed in when I look at the method definition, but I suspect there may be some leakage of abstraction & functionality here between caller and callee. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp line 177: > 175: // common. > 176: assert(ctx != nullptr || heap->old_generation()->is_parsable(), "Error"); > 177: HeapWord* p = _scc->first_object_start(dirty_l, ctx, tams, dirty_r); Passing `ctx`, `tams`, and `dirty_r` into this method seems interesting. Let's see how they are used. 
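
To make the range conventions in the review comments above concrete, here is a small editorial sketch in Java over a plain `long[]` bitmap. It is not the `ShenandoahMarkBitMap` code: `PrevOneSketch`, `prevOneHalfOpen`, and `prevOneSymmetric` are hypothetical names, and a simple bit-by-bit loop stands in for whatever word-wise search the real implementation uses. The first method follows the documented shape (search `[l, r)`, return `r` if nothing is found); the second follows the shape the reviewer floats as the symmetric alternative (search `(l, r]`, return `l` if nothing is found).

```java
// Illustrative model only: not the ShenandoahMarkBitMap implementation.
class PrevOneSketch {
    static boolean isSet(long[] words, int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    // Last set bit in [l, r); returns r if none is found.
    static int prevOneHalfOpen(long[] words, int l, int r) {
        for (int i = r - 1; i >= l; i--) {
            if (isSet(words, i)) {
                return i;
            }
        }
        return r;
    }

    // Last set bit in (l, r]; returns l if none is found.
    static int prevOneSymmetric(long[] words, int l, int r) {
        for (int i = r; i > l; i--) {
            if (isSet(words, i)) {
                return i;
            }
        }
        return l;
    }

    public static void main(String[] args) {
        long[] words = { 0b10100L }; // bits 2 and 4 are set
        System.out.println(prevOneHalfOpen(words, 0, 8));
        System.out.println(prevOneSymmetric(words, 4, 7));
    }
}
```

Either shape lets the caller detect "not found" by comparing the result with one of the bounds; which one is more convenient depends on how callers use the result, which is the question the review raises.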
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403250259 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403253570 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403255239 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403247074 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403311952 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403309508 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2403300158 From duke at openjdk.org Thu Nov 6 21:54:00 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 6 Nov 2025 21:54:00 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Thu, 6 Nov 2025 00:21:55 GMT, Rui Li wrote: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. > > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. > > Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
> > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } Will set Xmx to 3g instead and add `@requires os.maxMemory > 3g` Initially set Xms to 3g was purely to satisfy the "minimal heap size needed" condition for the test. `@requires os.maxMemory` is apparently more close to what I was thinking and I didn't think about reproducibility. Thanks for the suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28167#issuecomment-3499525127 From duke at openjdk.org Thu Nov 6 22:25:32 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 6 Nov 2025 22:25:32 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v2] In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. 
> > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. > > Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. > > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } Rui Li has updated the pull request incrementally with one additional commit since the last revision: Set Xmx to 3g ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28167/files - new: https://git.openjdk.org/jdk/pull/28167/files/31e4adfc..6c442aa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=00-01 Stats: 10 lines in 1 file changed: 2 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28167.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28167/head:pull/28167 PR: https://git.openjdk.org/jdk/pull/28167 From duke at openjdk.org Thu Nov 6 22:35:09 2025 From: duke at openjdk.org (duke) Date: Thu, 6 Nov 2025 22:35:09 GMT Subject: RFR: 8261743: Shenandoah: enable String 
deduplication with compact heuristics In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 01:38:36 GMT, Rui Li wrote: > Enable `UseStringDeduplication` when using compact heuristics. > > Testing: > > ./build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact -XX:+PrintFlagsFinal --version | grep UseStringDeduplication > bool UseStringDeduplication = true {product} {default} > > > Note: The labels should be `{product} {ergonomic}` ideally. Pending on a separate issue: [JDK-8371381](https://bugs.openjdk.org/browse/JDK-8371381) > > > ------- > > Edit: add benchmark results: > > Had a simple benchmark below (credit to [here](https://muratakkan.medium.com/understanding-string-deduplication-in-java-how-it-works-and-when-to-use-it-fbda71711435)): > > > > @Benchmark > public void testMethod(Blackhole bh) { > String[] strings = new String[1000000]; > for (int i = 0; i < strings.length; i++) { > strings[i] = "This is a test string"; > } > bh.consume(strings); > } > > > Results: > > ######## > java -jar target/benchmarks.jar --jvmArgs "-XX:-UseStringDeduplication -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc > ######## > Benchmark Mode Cnt Score Error Units > MyBenchmark.testMethod thrpt 3 3181.651 ? 248.835 ops/s > MyBenchmark.testMethod:gc.alloc.rate thrpt 3 12136.888 ? 949.185 MB/sec > MyBenchmark.testMethod:gc.alloc.rate.norm thrpt 3 4000016.188 ? 0.016 B/op > MyBenchmark.testMethod:gc.count thrpt 3 1568.000 counts > MyBenchmark.testMethod:gc.time thrpt 3 1882.000 ms > > ######## > java -jar target/benchmarks.jar --jvmArgs "-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc > ######## > Benchmark Mode Cnt Score Error Units > MyBenchmark.testMethod thrpt 3 3155.961 ? 365.174 ops/s # throuput decreased by 0.8% > MyBenchmark.testMethod:gc.alloc.rate thrpt 3 12038.882 ? 
1394.186 MB/sec # decreased by 0.8% > MyBenchmark.testMethod:gc.alloc.rate.norm thrpt 3 4000016.190 ? 0.022 B/op # same > MyBenchmark.testMethod:gc.count thrpt 3 1172.000 counts # decreased by 25% > MyBenchmark.testMethod:gc.time thrpt 3 726.000 ms # decrea... @rgithubli Your change (at version 2407976e137e42f3f831690c14501b0e338a4c4e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28170#issuecomment-3499636944 From duke at openjdk.org Thu Nov 6 22:41:20 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 6 Nov 2025 22:41:20 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v3] In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: <8F4jfHS1_VpRu9qzZdgO85PMEZRvL3FVMHMQDdmr7MU=.da7b2e1c-282b-47d1-9c46-aa684520acbb@github.com> > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. > > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. > > Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
> > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } Rui Li has updated the pull request incrementally with one additional commit since the last revision: Adjust tag order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28167/files - new: https://git.openjdk.org/jdk/pull/28167/files/6c442aa0..3dbe9a10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28167.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28167/head:pull/28167 PR: https://git.openjdk.org/jdk/pull/28167 From xpeng at openjdk.org Thu Nov 6 23:16:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Nov 2025 23:16:23 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v6] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change is to propose an optimization for the code path of mutator memory allocation to improve heap lock contention, at vey high level, here is how it works: > * ShenandoahFreeSet holds a N (default to 13) number of ShenandoahHeapRegion* which are used by mutator threads for regular object allocations, they are called shared regions/directly allocatable regions, which are stored in PaddedEnd data structure(padded array). > * Each mutator thread will be assigned one of the directly allocatable regions, the thread will try to allocate in the directly allocatable region with CAS atomic operation, if fails will try 2 more consecutive directly allocatable regions in the array storing directly allocatable region. > * If mutator thread fails after trying 3 directly allocatable regions, it will: > * Take heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterator mutator partition and allocate directly allocatable regions and store to the padded array if any need to be retired. > * Satisfy mutator allocation request if possible. > > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, I have done many tests, here are some of them: > > 1. 
Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 138 commits: - Add assert back - Address test failures after merging the change from master which unify the accounting in FreeSet and ShenandoahGeneration - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - format - Merge branch 'openjdk:master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - Merge branch 'master' into cas-alloc-1 - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp - ... 
and 128 more: https://git.openjdk.org/jdk/compare/ec059c0e...dfb9c415 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=05 Stats: 780 lines in 17 files changed: 690 ins; 23 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Thu Nov 6 23:29:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Nov 2025 23:29:10 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 22:28:06 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - format >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix errors caused by renaming ofAtomic to AtomicAccess >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 559: > >> 557: range(1, 128) \ >> 558: \ >> 559: product(uintx, ShenandoahDirectAllocationMaxProbes, 3, EXPERIMENTAL, \ > > I think we found that setting DirectAllocationMaxProbes to equal ShenandoahDirectlyAlloctableRegionCount works "best". I'm inclined to remove this parameter entirely as it somewhat simplifies the implementation. If you think we want to keep it, can you explain the rationale? Would we change the default value? 
I just fixed all the test failures after merging your changes from tip.

Totally agreed. I do plan to remove DirectAllocationMaxProbes; based on the test results we don't need to keep it. I'll update the PR to remove it shortly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2501197258

From duke at openjdk.org Thu Nov 6 23:50:13 2025
From: duke at openjdk.org (Rui Li)
Date: Thu, 6 Nov 2025 23:50:13 GMT
Subject: Integrated: 8261743: Shenandoah: enable String deduplication with compact heuristics
In-Reply-To:
References:
Message-ID: <3KIq-OuvFuL4npIJZcS5IonVljwX8UMjOCfZNv4Pl-I=.ec7ead22-0fae-4993-ac22-6fa0b3a2c824@github.com>

On Thu, 6 Nov 2025 01:38:36 GMT, Rui Li wrote:

> Enable `UseStringDeduplication` when using compact heuristics.
>
> Testing:
>
> ./build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact -XX:+PrintFlagsFinal --version | grep UseStringDeduplication
>      bool UseStringDeduplication                   = true                                      {product} {default}
>
>
> Note: The labels should be `{product} {ergonomic}` ideally. Pending on a separate issue: [JDK-8371381](https://bugs.openjdk.org/browse/JDK-8371381)
>
>
> -------
>
> Edit: add benchmark results:
>
> Had a simple benchmark below (credit to [here](https://muratakkan.medium.com/understanding-string-deduplication-in-java-how-it-works-and-when-to-use-it-fbda71711435)):
>
>
>
> @Benchmark
> public void testMethod(Blackhole bh) {
>     String[] strings = new String[1000000];
>     for (int i = 0; i < strings.length; i++) {
>         strings[i] = "This is a test string";
>     }
>     bh.consume(strings);
> }
>
>
> Results:
>
> ########
> java -jar target/benchmarks.jar --jvmArgs "-XX:-UseStringDeduplication -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc
> ########
> Benchmark                                    Mode  Cnt        Score      Error   Units
> MyBenchmark.testMethod                      thrpt    3     3181.651 ±  248.835   ops/s
> MyBenchmark.testMethod:gc.alloc.rate        thrpt    3    12136.888 ±  949.185  MB/sec
> MyBenchmark.testMethod:gc.alloc.rate.norm   thrpt    3  4000016.188 ±    0.016    B/op
> MyBenchmark.testMethod:gc.count             thrpt    3     1568.000             counts
> MyBenchmark.testMethod:gc.time              thrpt    3     1882.000                 ms
>
> ########
> java -jar target/benchmarks.jar --jvmArgs "-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=compact" -prof gc
> ########
> Benchmark                                    Mode  Cnt        Score      Error   Units
> MyBenchmark.testMethod                      thrpt    3     3155.961 ±  365.174   ops/s  # throughput decreased by 0.8%
> MyBenchmark.testMethod:gc.alloc.rate        thrpt    3    12038.882 ± 1394.186  MB/sec  # decreased by 0.8%
> MyBenchmark.testMethod:gc.alloc.rate.norm   thrpt    3  4000016.190 ±    0.022    B/op  # same
> MyBenchmark.testMethod:gc.count             thrpt    3     1172.000             counts  # decreased by 25%
> MyBenchmark.testMethod:gc.time              thrpt    3      726.000                 ms  # decrea...

This pull request has now been integrated.

Changeset: e34a8318
Author: Rui Li
Committer: Xiaolong Peng
URL: https://git.openjdk.org/jdk/commit/e34a831814996be3e0a2df86b11b1718a76ea558
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8261743: Shenandoah: enable String deduplication with compact heuristics

Reviewed-by: shade, wkemper

-------------

PR: https://git.openjdk.org/jdk/pull/28170

From kdnilsen at openjdk.org Thu Nov 6 23:52:05 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 6 Nov 2025 23:52:05 GMT
Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2]
In-Reply-To:
References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com>
Message-ID:

On Fri, 3 Oct 2025 20:47:16 GMT, Y.
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 663: > >> 661: // we expect that the marking context isn't available and the crossing maps are valid. >> 662: // Note that crossing maps may be invalid following class unloading and before dead >> 663: // or unloaded objects have been coalesced and filled (updating the crossing maps). > > Good comment! > > What's still not clear is why `tams` and `last_relevant_card_index` are passed here. Does it reduce the work in the caller? I'd expect this to just return the first object on the card index or null if no such object exists. I realize `ctx` is used when one must consult the marking context in preference to the "crossing maps". The relevance of the last 2 arguments isn't clear from this documentation comment. > > May be I'll see why these are passed in when I look at the method definition, but I suspect there may be some leakage of abstraction & functionality here between caller and callee. Thanks for identifying this "confusion". I'm making an attempt to improve documentation for this comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2501237330 From xpeng at openjdk.org Fri Nov 7 00:12:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 00:12:25 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v7] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change is to propose an optimization for the code path of mutator memory allocation to improve heap lock contention, at vey high level, here is how it works: > * ShenandoahFreeSet holds a N (default to 13) number of ShenandoahHeapRegion* which are used by mutator threads for regular object allocations, they are called shared regions/directly allocatable regions, which are stored in PaddedEnd data structure(padded array). > * Each mutator thread will be assigned one of the directly allocatable regions, the thread will try to allocate in the directly allocatable region with CAS atomic operation, if fails will try 2 more consecutive directly allocatable regions in the array storing directly allocatable region. > * If mutator thread fails after trying 3 directly allocatable regions, it will: > * Take heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterator mutator partition and allocate directly allocatable regions and store to the padded array if any need to be retired. > * Satisfy mutator allocation request if possible. > > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, I have done many tests, here are some of them: > > 1. 
Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... 
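The allocation scheme in the quoted description can be sketched in Java as follows. This is a simplified model under stated assumptions, not the code in the pull request: `CasAllocSketch`, `Region`, `probe` and the slot-to-thread mapping are hypothetical names, addresses are plain `long` offsets, and "ready to retire" is approximated as "cannot fit the current request". Only the overall shape comes from the description: a fixed array of shared regions, a CAS bump-pointer fast path that probes a few consecutive regions, and a slow path that takes the heap lock to replace regions before retrying.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantLock;

public class CasAllocSketch {
    /** Hypothetical stand-in for a heap region: a bump pointer advanced with CAS. */
    static final class Region {
        final long end;
        final AtomicLong top;

        Region(long start, long size) {
            this.top = new AtomicLong(start);
            this.end = start + size;
        }

        /** Returns the start address of the new block, or -1 if it does not fit. */
        long tryAllocate(long size) {
            while (true) {
                long cur = top.get();
                if (cur + size > end) {
                    return -1;
                }
                if (top.compareAndSet(cur, cur + size)) {
                    return cur;
                }
            }
        }

        long free() {
            return end - top.get();
        }
    }

    static final int SHARED_REGIONS = 13; // "N (default to 13)" in the description
    static final int MAX_PROBES = 3;      // the assigned region plus 2 consecutive ones

    private final AtomicReferenceArray<Region> shared =
            new AtomicReferenceArray<>(SHARED_REGIONS);
    private final ReentrantLock heapLock = new ReentrantLock();
    private final long regionSize;
    private long nextRegionStart = 0; // only touched in the constructor or under heapLock

    CasAllocSketch(long regionSize) {
        this.regionSize = regionSize;
        for (int i = 0; i < SHARED_REGIONS; i++) {
            shared.set(i, newRegion());
        }
    }

    private Region newRegion() {
        Region r = new Region(nextRegionStart, regionSize);
        nextRegionStart += regionSize;
        return r;
    }

    /** Lock-free fast path: probe up to MAX_PROBES consecutive shared regions. */
    private long probe(int slot, long size) {
        for (int i = 0; i < MAX_PROBES; i++) {
            Region r = shared.get(Math.floorMod(slot + i, SHARED_REGIONS));
            long addr = r.tryAllocate(size);
            if (addr >= 0) {
                return addr;
            }
        }
        return -1;
    }

    /** Returns an address, or -1 if the request cannot be satisfied. */
    long allocate(int slot, long size) {
        long addr = probe(slot, size);
        if (addr >= 0) {
            return addr;
        }
        heapLock.lock();
        try {
            // Slow path: replace regions that cannot fit this request, then retry.
            for (int i = 0; i < SHARED_REGIONS; i++) {
                if (size <= regionSize && shared.get(i).free() < size) {
                    shared.set(i, newRegion());
                }
            }
            return probe(slot, size);
        } finally {
            heapLock.unlock();
        }
    }

    public static void main(String[] args) {
        CasAllocSketch heap = new CasAllocSketch(1024);
        System.out.println(heap.allocate(0, 100));
        System.out.println(heap.allocate(0, 100));
    }
}
```

The point of the design is that the common case touches only one atomic word per allocation; the lock is taken only when a thread has exhausted its probes, which is where the description expects contention to drop.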
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Set region back to empty and unaffiliated when release a directly allocatable region(may happen in full GC) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/dfb9c415..3fdb0bc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=05-06 Stats: 12 lines in 2 files changed: 10 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Nov 7 00:49:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 00:49:53 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v8] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of mutator memory allocation to improve heap lock contention, at vey high level, here is how it works: > * ShenandoahFreeSet holds a N (default to 13) number of ShenandoahHeapRegion* which are used by mutator threads for regular object allocations, they are called shared regions/directly allocatable regions, which are stored in PaddedEnd data structure(padded array). > * Each mutator thread will be assigned one of the directly allocatable regions, the thread will try to allocate in the directly allocatable region with CAS atomic operation, if fails will try 2 more consecutive directly allocatable regions in the array storing directly allocatable region. 
> * If mutator thread fails after trying 3 directly allocatable regions, it will: > * Take heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterator mutator partition and allocate directly allocatable regions and store to the padded array if any need to be retired. > * Satisfy mutator allocation request if possible. > > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, I have done many tests, here are some of them: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... 
Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - Remove unnecessary fences - tidy up - Remove ShenandoahDirectAllocationMaxProbes, simplifications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/3fdb0bc0..68b4673a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=06-07 Stats: 55 lines in 3 files changed: 12 ins; 34 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Nov 7 01:07:51 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 01:07:51 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v9] In-Reply-To: References: Message-ID: <_31UGErPvJMV2V7F92kffihoUTtJ7L3HrX-SGgTXj_M=.030a9b2a-8ef0-4d23-a2dd-69ad70e17656@github.com> > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. This change is to propose an optimization for the code path of mutator memory allocation to improve heap lock contention, at vey high level, here is how it works: > * ShenandoahFreeSet holds a N (default to 13) number of ShenandoahHeapRegion* which are used by mutator threads for regular object allocations, they are called shared regions/directly allocatable regions, which are stored in PaddedEnd data structure(padded array). 
> * Each mutator thread will be assigned one of the directly allocatable regions, the thread will try to allocate in the directly allocatable region with CAS atomic operation, if fails will try 2 more consecutive directly allocatable regions in the array storing directly allocatable region. > * If mutator thread fails after trying 3 directly allocatable regions, it will: > * Take heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterator mutator partition and allocate directly allocatable regions and store to the padded array if any need to be retired. > * Satisfy mutator allocation request if possible. > > > I'm not expecting significant performance impact for most of the cases since in most case the contention on heap lock it not high enough to cause performance issue, I have done many tests, here are some of them: > > 1. Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 143 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Remove unnecessary fences - tidy up - Remove ShenandoahDirectAllocationMaxProbes, simplifications - Set region back to empty and unaffiliated when release a directly allocatable region(may happen in full GC) - Add assert back - Address test failures after merging the change from master which unify the accounting in FreeSet and ShenandoahGeneration - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - ... and 133 more: https://git.openjdk.org/jdk/compare/e34a8318...e65457f6 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=08 Stats: 767 lines in 17 files changed: 677 ins; 23 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From syan at openjdk.org Fri Nov 7 02:12:06 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 7 Nov 2025 02:12:06 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v3] In-Reply-To: <8F4jfHS1_VpRu9qzZdgO85PMEZRvL3FVMHMQDdmr7MU=.da7b2e1c-282b-47d1-9c46-aa684520acbb@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> <8F4jfHS1_VpRu9qzZdgO85PMEZRvL3FVMHMQDdmr7MU=.da7b2e1c-282b-47d1-9c46-aa684520acbb@github.com> Message-ID: On Thu, 6 Nov 2025 22:41:20 GMT, Rui Li wrote: >> Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. 
>>
>> Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise.
>>
>> Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random.
>>
>>
>>
>> public class TestLargeObjectAlignmentDeterministic {
>>
>>     static final int SLABS_COUNT = Integer.getInteger("slabs", 10000);
>>     static final int NODE_COUNT = Integer.getInteger("nodes", 10000);
>>     static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000);
>>
>>     static Object[] objects;
>>
>>     public static void main(String[] args) throws Exception {
>>         objects = new Object[SLABS_COUNT];
>>
>>         for (int i = 0; i < SLABS_COUNT; i++) {
>>             objects[i] = createSome();
>>         }
>>     }
>>
>>     public static Object createSome() {
>>         List result = new ArrayList();
>>         for (int c = 0; c < NODE_COUNT; c++) {
>>             result.add(new Integer(c));
>>         }
>>         return result;
>>     }
>>
>> }
>
> Rui Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   Adjust tag order

test/hotspot/jtreg/gc/shenandoah/TestLargeObjectAlignment.java line 32:

> 30:  * @requires vm.gc.Shenandoah
> 31:  * @requires vm.bits == "64"
> 32:  * @requires os.maxMemory > 3G

I think os.maxMemory should be slightly larger than the max heap size, since the heap is not the only memory the JVM uses.
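For readers who want to see where a figure of "at least 2g" could come from, here is a back-of-the-envelope estimate of the live data held by the deterministic test quoted above. It is a sketch, not the calculation the author actually did. The object sizes are assumptions for 64-bit HotSpot with compressed oops and compressed class pointers and 8-byte alignment (16-byte `Integer`, 24-byte `ArrayList`, 16-byte array header, 4-byte references), and the `ArrayList` growth rule (initial capacity 10, then growing by half) is the commonly documented OpenJDK behavior. `HeapEstimate` and its methods are hypothetical names. Transient garbage from discarded backing arrays and any collector overhead are ignored.

```java
public class HeapEstimate {
    // Assumed layout: 64-bit HotSpot, compressed oops and class pointers, 8-byte alignment.
    static final long INTEGER_BYTES = 16;      // 12-byte header + 4-byte int
    static final long ARRAYLIST_BYTES = 24;    // header + modCount + size + elementData ref
    static final long ARRAY_HEADER_BYTES = 16; // 12-byte header + 4-byte length
    static final long REF_BYTES = 4;           // compressed reference

    static long align8(long n) {
        return (n + 7) & ~7L;
    }

    /** Capacity after adding n elements one by one: start at 10, grow by half. */
    static int finalCapacity(int n) {
        int cap = 10;
        while (cap < n) {
            cap += cap >> 1;
        }
        return cap;
    }

    /** Estimated live bytes for `slabs` lists of `nodes` Integer objects each. */
    static long estimateBytes(int slabs, int nodes) {
        long backing = align8(ARRAY_HEADER_BYTES + REF_BYTES * finalCapacity(nodes));
        long perSlab = nodes * INTEGER_BYTES + ARRAYLIST_BYTES + backing;
        long topLevel = align8(ARRAY_HEADER_BYTES + REF_BYTES * slabs);
        return topLevel + slabs * perSlab;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000, 10_000);
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MiB");
    }
}
```

Under these assumptions the estimate comes to about 2062 MiB of live data, which sits between the Xmx2050m failure and the Xmx2150m pass reported for GenShen in the quoted description. That agreement is suggestive rather than conclusive, since the estimate ignores JVM internals, but it does support leaving headroom between the heap size and `os.maxMemory` as suggested above.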
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28167#discussion_r2501446720 From kdnilsen at openjdk.org Fri Nov 7 06:14:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 7 Nov 2025 06:14:45 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v7] In-Reply-To: References: Message-ID: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: fix up comments and simplify API for ShenandoahScanRemembered::first_object_start() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/643cdfd6..637c1775 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=05-06 Stats: 43 lines in 3 files changed: 19 ins; 13 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From kdnilsen at openjdk.org Fri Nov 7 18:25:37 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 7 Nov 2025 18:25:37 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v8] In-Reply-To: References: Message-ID: <6QHIbCuNkwTlUX2e3kjvaQF0G6WjJSQbP0bmqLtoYXQ=.1b42521a-eff2-4eab-834b-97c9dc537f2b@github.com> > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. 
The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: consider last_relevant_card in determining right-most address ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/637c1775..0e2120b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From xpeng at openjdk.org Fri Nov 7 19:42:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 19:42:31 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v10] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory under the heap lock, and we have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the mutator memory allocation path to reduce heap lock contention. At a very high level, here is how it works: > * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* which are used by mutator threads for regular object allocations; they are called shared regions/directly allocatable regions and are stored in a PaddedEnd data structure (padded array).
> * Each mutator thread is assigned one of the directly allocatable regions. The thread tries to allocate in that region with an atomic CAS operation; if that fails, it tries 2 more consecutive directly allocatable regions in the array. > * If a mutator thread fails after trying 3 directly allocatable regions, it will: > * Take the heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterate over the mutator partition, allocate directly allocatable regions, and store them in the padded array if any need to be retired. > * Satisfy the mutator allocation request if possible. > > > I'm not expecting a significant performance impact in most cases, since in most cases the contention on the heap lock is not high enough to cause performance issues. I have done many tests; here are some of them: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: > OpenJDK TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 144 commits: - Merge branch 'openjdk:master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - Remove unnecessary fences - tidy up - Remove ShenandoahDirectAllocationMaxProbes, simplifications - Set region back to empty and unaffiliated when release a directly allocatable region(may happen in full GC) - Add assert back - Address test failures after merging the change from master which unify the accounting in FreeSet and ShenandoahGeneration - Merge remote-tracking branch 'origin/master' into cas-alloc-1 - Merge branch 'openjdk:master' into cas-alloc-1 - ... and 134 more: https://git.openjdk.org/jdk/compare/2c3c4707...69aec972 ------------- Changes: https://git.openjdk.org/jdk/pull/26171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=09 Stats: 767 lines in 17 files changed: 677 ins; 23 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From xpeng at openjdk.org Fri Nov 7 19:42:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 19:42:31 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: <5YSA3F88CmDDv09M2KOm_EFNDh_09LPO2WMrgETfupI=.cc658dc9-829e-41a5-ad76-393d3eb0f75a@github.com> References: <5YSA3F88CmDDv09M2KOm_EFNDh_09LPO2WMrgETfupI=.cc658dc9-829e-41a5-ad76-393d3eb0f75a@github.com> Message-ID: On Wed, 5 Nov 2025 19:00:37 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 848: >> >>> 846: HeapWord* ShenandoahFreeSet::allocate_with_affiliation(Iter& iterator, ShenandoahAffiliation affiliation, ShenandoahAllocRequest& req, bool& in_new_region) { >>> 847: for (idx_t idx = iterator.current(); iterator.has_next(); idx = iterator.next()) { >>> 848: ShenandoahHeapRegion* r = _heap->get_region(idx); >> >> I wonder if we could refine this a little bit. 
When the region is moved into the "directly allocatable" set, wouldn't we remove it from its partition? Then, we wouldn't have to test for !r->reserved_for_direct_allocation() here because the iterator wouldn't produce it. >> >> We could maybe replace this test with an assert that !r->reserved_for_direct_allocation(). > > Same issue in other uses of the allocation iterator. You are right, allocate_with_affiliation is only called from the collector, so it won't see any regions from the Mutator partition; we don't even need the assert here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2505191139 From xpeng at openjdk.org Fri Nov 7 19:48:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 19:48:13 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 19:02:59 GMT, Kelvin Nilsen wrote: >> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 135 commits: >> >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - format >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Merge branch 'master' into cas-alloc-1 >> - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - Fix errors caused by renaming ofAtomic to AtomicAccess >> - Merge branch 'openjdk:master' into cas-alloc-1 >> - ... and 125 more: https://git.openjdk.org/jdk/compare/2f613911...e6bfef05 > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1268: > >> 1266: // If region is not completely free, the current [beg; end] is useless, and we may fast-forward. If we can extend >> 1267: // the existing range, we can exploit that certain regions are already known to be in the Mutator free set.
>> 1268: ShenandoahHeapRegion* region = _heap->get_region(end); > > Here also, if we remove the region from the partition when we make it directly allocatable, we would not need to rewrite this loop. Yes, same as your comment above. Mutator thread shouldn't see any region which has been made directly allocatable in the Mutator partition here. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2204: > >> 2202: i++; >> 2203: } >> 2204: return obj; > > I think obj always equals nullptr at this point. Seems the code would be easier to understand (and would depend less on effective compiler optimization) if we just made that explicit. Can we just say: > > return nullptr? Yes, it is always `nullptr`, `return nullptr` will make the code more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2505212589 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2505221448 From duke at openjdk.org Fri Nov 7 21:02:34 2025 From: duke at openjdk.org (Nityanand Rai) Date: Fri, 7 Nov 2025 21:02:34 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking Message-ID: Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. 
------------- Commit messages: - clean whitespace - minor cleanup - Merge branch 'openjdk:master' into 8371284 - exclude young-yong, old-old and honor UseCondCardMark in dirty card marking Changes: https://git.openjdk.org/jdk/pull/28204/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28204&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371284 Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28204.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28204/head:pull/28204 PR: https://git.openjdk.org/jdk/pull/28204 From xpeng at openjdk.org Fri Nov 7 21:06:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Nov 2025 21:06:28 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v11] In-Reply-To: References: Message-ID: <6pTKMOkjQwuamgBRyOItAlgwlNQD18aupF0ykkYN3GA=.690dc564-2ce0-4684-9a25-0c7162857167@github.com> > Shenandoah always allocates memory under the heap lock, and we have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the mutator memory allocation path to reduce heap lock contention. At a very high level, here is how it works: > * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* which are used by mutator threads for regular object allocations; they are called shared regions/directly allocatable regions and are stored in a PaddedEnd data structure (padded array). > * Each mutator thread is assigned one of the directly allocatable regions. The thread tries to allocate in that region with an atomic CAS operation; if that fails, it tries 2 more consecutive directly allocatable regions in the array.
> * If a mutator thread fails after trying 3 directly allocatable regions, it will: > * Take the heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterate over the mutator partition, allocate directly allocatable regions, and store them in the padded array if any need to be retired. > * Satisfy the mutator allocation request if possible. > > > I'm not expecting a significant performance impact in most cases, since in most cases the contention on the heap lock is not high enough to cause performance issues. I have done many tests; here are some of them: > > 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: > OpenJDK TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====...
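The fast path in the description above (CAS into an assigned region, then probe two more) can be sketched in standalone C++. This is a minimal illustration and not the code in the pull request: Region, SharedRegions, try_allocate, kNoSpace and kProbes are hypothetical names, the wrap-around at the end of the array and the relaxed memory ordering are assumptions, and the locked slow path is represented only by a false return.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);
constexpr int kRegionCount = 13;  // the default N from the description
constexpr int kProbes = 3;        // assigned region plus 2 consecutive ones

// Hypothetical stand-in for a directly allocatable region: a bump pointer
// advanced with compare-and-swap instead of under the heap lock.
struct Region {
  std::atomic<std::size_t> top{0};  // offset of the first free byte
  std::size_t end = 0;              // capacity in bytes

  // Returns the offset of the new object, or kNoSpace if it does not fit.
  std::size_t cas_allocate(std::size_t size) {
    std::size_t cur = top.load(std::memory_order_relaxed);
    while (cur + size <= end) {
      // On failure compare_exchange_weak reloads cur, so the loop retries
      // against the value another thread has just published.
      if (top.compare_exchange_weak(cur, cur + size,
                                    std::memory_order_relaxed)) {
        return cur;
      }
    }
    return kNoSpace;
  }
};

struct SharedRegions {
  Region regions[kRegionCount];

  // Probes kProbes consecutive regions starting at the thread's assigned
  // index. A false return is where the caller would take the heap lock,
  // retire full regions, refill the array, and retry.
  bool try_allocate(int start, std::size_t size,
                    int* region_out, std::size_t* offset_out) {
    for (int i = 0; i < kProbes; i++) {
      int idx = (start + i) % kRegionCount;
      std::size_t off = regions[idx].cas_allocate(size);
      if (off != kNoSpace) {
        *region_out = idx;
        *offset_out = off;
        return true;
      }
    }
    return false;
  }
};
```

The point of the design is that the common case touches only one atomic word per allocation, so threads assigned to different regions do not contend with each other, and the heap lock is needed only when all probed regions are exhausted.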
Xiaolong Peng has updated the pull request incrementally with four additional commits since the last revision: - Fix wrong asserts - Remove unnecessary test for directly allocable region - Merge branch 'cas-alloc-1' of https://github.com/pengxiaolong/jdk into cas-alloc-1 - tidy up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/69aec972..b1c3b90f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=09-10 Stats: 11 lines in 1 file changed: 1 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Fri Nov 7 21:44:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Nov 2025 21:44:01 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 20:42:25 GMT, Nityanand Rai wrote: > Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. Let's make sure we don't mark cards for null objects. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 200: > 198: // Exclude old-old > 199: T heap_oop = RawAccess<>::oop_load(field); > 200: if (!CompressedOops::is_null(heap_oop)) { We can return early if `CompressedOops::is_null` (we don't need to remember old pointers that are null). ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3436538439 PR Review Comment: https://git.openjdk.org/jdk/pull/28204#discussion_r2505630257 From duke at openjdk.org Fri Nov 7 21:51:22 2025 From: duke at openjdk.org (Rui Li) Date: Fri, 7 Nov 2025 21:51:22 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v4] In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that the jtreg default heap size on the reporter's machines is too small, and the randomness in the test sometimes allocated more than the heap the test had. > > Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Setting a 3g heap for safety to reduce the noise. > > Also used the test to compare Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random.
> > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } Rui Li has updated the pull request incrementally with one additional commit since the last revision: Bump maxMem to 4g ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28167/files - new: https://git.openjdk.org/jdk/pull/28167/files/3dbe9a10..142d00e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28167&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28167.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28167/head:pull/28167 PR: https://git.openjdk.org/jdk/pull/28167 From duke at openjdk.org Fri Nov 7 23:09:47 2025 From: duke at openjdk.org (Nityanand Rai) Date: Fri, 7 Nov 2025 23:09:47 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v2] In-Reply-To: References: Message-ID: > Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. 
Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: early return of oop is null ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28204/files - new: https://git.openjdk.org/jdk/pull/28204/files/09ad25ab..59f7a0d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28204&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28204&range=00-01 Stats: 8 lines in 1 file changed: 1 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28204.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28204/head:pull/28204 PR: https://git.openjdk.org/jdk/pull/28204 From duke at openjdk.org Fri Nov 7 23:09:48 2025 From: duke at openjdk.org (Nityanand Rai) Date: Fri, 7 Nov 2025 23:09:48 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 21:40:38 GMT, William Kemper wrote: >> Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: >> >> early return of oop is null > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 200: > >> 198: // Exclude old-old >> 199: T heap_oop = RawAccess<>::oop_load(field); >> 200: if (!CompressedOops::is_null(heap_oop)) { > > We can return early if `CompressedOops::is_null` (we don't need to remember old pointers that are null). Good point, fixed. 
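The filtering discussed in this review (skip stores to young fields, skip null stores, skip old-to-old stores, and with UseCondCardMark skip cards that are already dirty) can be sketched on a toy heap. This is an illustration under stated assumptions, not the code in shenandoahBarrierSet.inline.hpp: ToyHeap, ToyCardTable, mark_card_if_needed and the card constants are hypothetical, and the stored value is passed in directly rather than loaded from the field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants for this sketch only.
constexpr unsigned kCardShift = 9;  // 512-byte cards
constexpr std::uint8_t kDirty = 0x00;
constexpr std::uint8_t kClean = 0xff;

// Toy heap layout: addresses at or above young_start are young, the rest old.
struct ToyHeap {
  std::uintptr_t young_start;
  bool is_in_young(std::uintptr_t addr) const { return addr >= young_start; }
};

struct ToyCardTable {
  std::vector<std::uint8_t> cards;
  int writes = 0;  // counts actual card stores, to make the filtering visible
  explicit ToyCardTable(std::size_t heap_bytes)
      : cards((heap_bytes >> kCardShift) + 1, kClean) {}
};

// Returns true if a card was written for this reference store.
bool mark_card_if_needed(const ToyHeap& heap, ToyCardTable& ct,
                         std::uintptr_t field, std::uintptr_t value,
                         bool use_cond_card_mark) {
  if (heap.is_in_young(field)) return false;   // field lives in young
  if (value == 0) return false;                // null store, nothing to remember
  if (!heap.is_in_young(value)) return false;  // old field, old referent
  std::uint8_t& card = ct.cards[field >> kCardShift];
  if (use_cond_card_mark && card == kDirty) return false;  // already dirty
  card = kDirty;
  ct.writes++;
  return true;
}
```

Only old-to-young stores reach the card write, and the conditional check trades a load for a skipped store when the card is already dirty.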
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28204#discussion_r2505799473 From syan at openjdk.org Sat Nov 8 02:31:03 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 8 Nov 2025 02:31:03 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v4] In-Reply-To: References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Fri, 7 Nov 2025 21:51:22 GMT, Rui Li wrote: >> Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. >> >> Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. >> >> Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
>> >> >> >> public class TestLargeObjectAlignmentDeterministic { >> >> static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); >> static final int NODE_COUNT = Integer.getInteger("nodes", 10000); >> static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); >> >> static Object[] objects; >> >> public static void main(String[] args) throws Exception { >> objects = new Object[SLABS_COUNT]; >> >> for (int i = 0; i < SLABS_COUNT; i++) { >> objects[i] = createSome(); >> } >> } >> >> public static Object createSome() { >> List result = new ArrayList(); >> for (int c = 0; c < NODE_COUNT; c++) { >> result.add(new Integer(c)); >> } >> return result; >> } >> >> } > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Bump maxMem to 4g Marked as reviewed by syan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28167#pullrequestreview-3437094158 From kdnilsen at openjdk.org Sat Nov 8 19:54:44 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 8 Nov 2025 19:54:44 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v9] In-Reply-To: References: Message-ID: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Refinements and debugging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/0e2120b8..29f5d42c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=07-08 Stats: 46 lines in 4 files changed: 27 ins; 4 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From kdnilsen at openjdk.org Sun Nov 9 16:11:46 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 9 Nov 2025 16:11:46 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v10] In-Reply-To: References: Message-ID: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: fix multiple errors introduced by minor refactoring of API ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/29f5d42c..cee16f88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=08-09 Stats: 16 lines in 4 files changed: 13 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From kdnilsen at openjdk.org Sun Nov 9 16:16:25 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 9 Nov 2025 16:16:25 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v11] In-Reply-To: References: Message-ID: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove debug instrumentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/cee16f88..9f629a2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=09-10 Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From shade at openjdk.org Mon Nov 10 09:12:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Nov 2025 09:12:04 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v4] In-Reply-To: References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Fri, 7 Nov 2025 21:51:22 GMT, Rui Li wrote: >> Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that jtreg deafult heap size on the reporter's machines is too small, and the randomness in test just sometimes created a huge heap larger than what the test had. >> >> Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. >> >> Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random. 
>> >> >> >> public class TestLargeObjectAlignmentDeterministic { >> >> static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); >> static final int NODE_COUNT = Integer.getInteger("nodes", 10000); >> static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); >> >> static Object[] objects; >> >> public static void main(String[] args) throws Exception { >> objects = new Object[SLABS_COUNT]; >> >> for (int i = 0; i < SLABS_COUNT; i++) { >> objects[i] = createSome(); >> } >> } >> >> public static Object createSome() { >> List result = new ArrayList(); >> for (int c = 0; c < NODE_COUNT; c++) { >> result.add(new Integer(c)); >> } >> return result; >> } >> >> } > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Bump maxMem to 4g Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28167#pullrequestreview-3441786698 From shade at openjdk.org Mon Nov 10 11:43:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Nov 2025 11:43:04 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v2] In-Reply-To: References: Message-ID: On Fri, 7 Nov 2025 23:09:47 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. > > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > early return of oop is null Generally looks fine, just tighten up the comments. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 197: > 195: if (_heap->is_in_young(field)) { > 196: return; > 197: } Suggestion: if (_heap->is_in_young(field)) { // Young field stores do not require card mark. 
return; } src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 206: > 204: if (!_heap->is_in_young(obj)) { > 205: return; > 206: } Suggestion: if (!_heap->is_in_young(obj)) { // Young object -> old field stores do not require card mark. return; } src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 207: > 205: return; > 206: } > 207: // Honor UseCondCardMark: check if card is already dirty before writing Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3442710354 PR Review Comment: https://git.openjdk.org/jdk/pull/28204#discussion_r2510147860 PR Review Comment: https://git.openjdk.org/jdk/pull/28204#discussion_r2510152910 PR Review Comment: https://git.openjdk.org/jdk/pull/28204#discussion_r2510153202 From adinn at openjdk.org Mon Nov 10 13:19:08 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 10 Nov 2025 13:19:08 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages Looks good to me. @Harshit470250 You need another reviewer before you can push this. Perhaps @dean-long can help -- he reviewed the earlier commit which led to this one being created. 
------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27279#pullrequestreview-3443199481 PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3511586105 From tschatzl at openjdk.org Mon Nov 10 13:47:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 10 Nov 2025 13:47:20 GMT Subject: RFR: 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 10:10:02 GMT, Albert Mingkun Yang wrote: > Removing effectively dead code. > > Test: tier1, GHA Looks good. Thanks for cleaning this up. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28146#pullrequestreview-3443342641 From ayang at openjdk.org Mon Nov 10 14:29:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 10 Nov 2025 14:29:56 GMT Subject: RFR: 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 10:10:02 GMT, Albert Mingkun Yang wrote: > Removing effectively dead code. > > Test: tier1, GHA Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28146#issuecomment-3512011561 From kdnilsen at openjdk.org Mon Nov 10 14:31:08 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 14:31:08 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: On Fri, 3 Oct 2025 20:49:01 GMT, Y. 
Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 251: > >> 249: // if marking context is valid and we are below tams, we use the marking bit map to find the first marked object that >> 250: // intersects with this card, and if no such object exists, we return null >> 251: if ((ctx != nullptr) && (left < tams)) { > > It seems like the caller should check if `left >= tams` and short-circuit rather than have this method do that work. That comment is wrong, which is what caused you to request the alternative semantics for this function. Your comments and questions motivated me to rewrite the comments describing the behavior of this function. Rewriting the comments helped me realize the API was a bit ill-defined. I made some improvements to the behavior so that the definition could be more clearly defined. The new implementation now passes all tests again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2510782671 From ayang at openjdk.org Mon Nov 10 14:32:20 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 10 Nov 2025 14:32:20 GMT Subject: Integrated: 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue In-Reply-To: References: Message-ID: <3MvmZGPFXA7cqYhLs44MVY_P3BFyuUh4y5mlSWbGUxA=.d2d98f00-d991-43cd-8dd1-7d5d96c4031c@github.com> On Wed, 5 Nov 2025 10:10:02 GMT, Albert Mingkun Yang wrote: > Removing effectively dead code. > > Test: tier1, GHA This pull request has now been integrated. 
Changeset: 9d2fa8fe Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/9d2fa8fe22652cbf1c70b953247bd154b363b383 Stats: 38 lines in 16 files changed: 0 ins; 6 del; 32 mod 8371321: Remove unused last arg of BarrierSetAssembler::arraycopy_epilogue Reviewed-by: fandreuzzi, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/28146 From kdnilsen at openjdk.org Mon Nov 10 14:39:09 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 14:39:09 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v13] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: - Fix mistaken merge resolution - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates The resulting fastdebug build has 64 failures. I need to debug these.
Probably introduced by improper resolution of merge conflicts - fix error in merge conflict resolution - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates - rework CompressedClassSpaceSizeinJmapHeap.java - fix errors in CompressedClassSpaceSizeInJmapHeap.java - Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java - fix two indexing bugs - add an assert to detect suspected bug - Remove debug scaffolding - ... and 48 more: https://git.openjdk.org/jdk/compare/c272aca8...16cd6f8a ------------- Changes: https://git.openjdk.org/jdk/pull/24319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=12 Stats: 284 lines in 31 files changed: 115 ins; 30 del; 139 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From kdnilsen at openjdk.org Mon Nov 10 14:39:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 14:39:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v12] In-Reply-To: <2yyHGosCQCiu03jYG7lG82_2WrppAZqVQgDATo8bcfQ=.9c8bed3d-980f-499f-a467-a5cd1d90b6b8@github.com> References: <2yyHGosCQCiu03jYG7lG82_2WrppAZqVQgDATo8bcfQ=.9c8bed3d-980f-499f-a467-a5cd1d90b6b8@github.com> Message-ID: On Mon, 13 Oct 2025 22:18:38 GMT, Kelvin Nilsen wrote: >> The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated.
In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates > - remove _mixed_candidate_garbage_words from ShenandoahHeapRegion > - reviewer feedback to reduce code duplication > - Fix compilation errors after merge > - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates > - Fix uninitialized variable > - Remove deprecation conditional compiles > - Adjust candidate live memory for each mixed evac > - Refactor for better abstraction > - Fix set_live() after full gc > - ... and 35 more: https://git.openjdk.org/jdk/compare/92f2ab2e...24322e75 I'm going to place this in draft while I experiment with the code to diagnose apparent regressions with traditional shenandoah mode. I have placed instrumentation into the code to confirm that the live_data reported by ShenandoahHeapRegion is the same before and after this PR for traditional Shenandoah mode. So the regressions are either "signal noise", or perhaps inefficiencies introduced regarding how we compute the live_data. After refactoring the code to perform better in the "tight" rebuild free-set and build-collection set loops, the Shenandoah results show very slight improvement (rather than regression) on specjbb2015. 
Here is the performance regression that we saw before commit https://github.com/openjdk/jdk/pull/24319/commits/ecdec6363ee9e0a27f4207350cf0b51f8c99bab5 [image not preserved in the archive] Here are comparisons (in a slightly different environment) after that same commit: [image not preserved in the archive] ------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-3401760959 PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-3416424293 PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-3453055680 From kdnilsen at openjdk.org Mon Nov 10 15:37:59 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 15:37:59 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive Message-ID: Fix style of include directive for improved consistency and compatibility with GraalVM conventions. ------------- Commit messages: - Fix style of include directive Changes: https://git.openjdk.org/jdk/pull/28219/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28219&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371573 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28219.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28219/head:pull/28219 PR: https://git.openjdk.org/jdk/pull/28219 From wkemper at openjdk.org Mon Nov 10 15:45:55 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Nov 2025 15:45:55 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 15:32:03 GMT, Kelvin Nilsen wrote: > Fix style of include directive for improved consistency and compatibility with GraalVM conventions. This include looks vestigial. Can we just delete it? ------------- Changes requested by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28219#pullrequestreview-3443924743 From kdnilsen at openjdk.org Mon Nov 10 15:55:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 15:55:12 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v2] In-Reply-To: References: Message-ID: <_3OuNx5vGJMp_kP0hRZgLvkJgOUXuKVpJZ8ores0H-8=.3ca53897-5f22-4d1c-b3e6-a8e71422612a@github.com> On Sat, 31 May 2025 02:53:52 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: >> >> - respond to reviewer feedback >> - Keep gc cycle times with heuristics for the relevant generation > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 183: > >> 181: } >> 182: >> 183: heap->young_generation()->set_evacuation_reserve((size_t) (young_evac_bytes * ShenandoahEvacWaste)); > > So we are using the amount to be evacuated out of young (suitably marked up to account for waste) from the collection set of a specific cycle to predict the same for the next cycle? And similarly for the promotion bytes. > > This seems reasonable, but how does that compare with using the live data identified in the most recent marking cycle instead? I can imagine that the former is more accurate under steady state assumptions and the latter is an overestimate to the extent that not all live data will be evacuated because it's in mostly live, i.e. densely live regions. However, it would be interesting to see how they compare and which tracks reality better. Since this is in the nature of a prediction/estimate, one can consider a control algorithm that tries to move the estimate closer based on minimizing some historical deviation between marked vs evacuated. > > This need not be done here, but can be considered a future enhancement/experiment.
These reserves are applied to the current cycle, as we are about to begin evacuation and we need to know how much memory will be consumed in old and young during this cycle. The promo budget helps assure that overly aggressive promotions do not cause mixed evacuations to fail (by consuming memory that should have been set aside to hold the mixed evacuations). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2511095501 From kdnilsen at openjdk.org Mon Nov 10 16:02:53 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 16:02:53 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive [v2] In-Reply-To: References: Message-ID: > Fix style of include directive for improved consistency and compatibility with GraalVM conventions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove unneeded include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28219/files - new: https://git.openjdk.org/jdk/pull/28219/files/2d926fb9..5c95f64e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28219&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28219&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28219.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28219/head:pull/28219 PR: https://git.openjdk.org/jdk/pull/28219 From kdnilsen at openjdk.org Mon Nov 10 16:02:55 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 16:02:55 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive [v2] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 15:42:48 GMT, William Kemper wrote: > This include looks vestigial. Can we just delete it? Good catch. I'll make that change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28219#issuecomment-3512565592 From shade at openjdk.org Mon Nov 10 16:14:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Nov 2025 16:14:52 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive [v2] In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 16:02:53 GMT, Kelvin Nilsen wrote: >> Fix style of include directive for improved consistency and compatibility with GraalVM conventions. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded include This looks fine, but I adjusted the bug title, so PR title also needs adjustments to match it. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28219#pullrequestreview-3444057242 From shade at openjdk.org Mon Nov 10 16:17:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Nov 2025 16:17:59 GMT Subject: RFR: 8371573: Shenandoah: Fix style of include directive [v2] In-Reply-To: References: Message-ID: <-l0j28TqIU6ZzR3kr2cWYz4fudPVzcDof9FW5pG0QWg=.864b9f00-8bbc-4aee-a6f0-14a8a7fe74ec@github.com> On Mon, 10 Nov 2025 16:02:53 GMT, Kelvin Nilsen wrote: >> Fix style of include directive for improved consistency and compatibility with GraalVM conventions. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded include I also think it is trivial, so you can integrate as soon as testing is green. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28219#issuecomment-3512640187 From wkemper at openjdk.org Mon Nov 10 18:38:29 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Nov 2025 18:38:29 GMT Subject: RFR: 8371573: Shenandoah: Remove unnecessary include after JDK-8351091 [v2] In-Reply-To: References: Message-ID: <-LZ5OLqGd10veGNlWuFRxHOPcRWvmW1KtEwSolAs-pI=.b6870b16-2c58-4ee8-bdae-9be8d64ddb35@github.com> On Mon, 10 Nov 2025 16:02:53 GMT, Kelvin Nilsen wrote: >> Fix style of include directive for improved consistency and compatibility with GraalVM conventions. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded include Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28219#pullrequestreview-3444620341 From kdnilsen at openjdk.org Mon Nov 10 18:54:44 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 18:54:44 GMT Subject: Integrated: 8371573: Shenandoah: Remove unnecessary include after JDK-8351091 In-Reply-To: References: Message-ID: On Mon, 10 Nov 2025 15:32:03 GMT, Kelvin Nilsen wrote: > Fix style of include directive for improved consistency and compatibility with GraalVM conventions. This pull request has now been integrated. 
Changeset: 43afce54 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/43afce54a7ecbd124f68f1f32d718f08b24ca61a Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8371573: Shenandoah: Remove unnecessary include after JDK-8351091 Reviewed-by: wkemper, shade ------------- PR: https://git.openjdk.org/jdk/pull/28219 From duke at openjdk.org Mon Nov 10 19:10:46 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Nov 2025 19:10:46 GMT Subject: RFR: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space [v4] In-Reply-To: References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Fri, 7 Nov 2025 21:51:22 GMT, Rui Li wrote: >> Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that the jtreg default heap size on the reporter's machines is too small, and the randomness in the test just sometimes created a huge heap larger than what the test had. >> >> Did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. Initiating 3g heap for safety to reduce the noise. >> >> Also use the test to compare between Shenandoah vs GenShen: on my laptop (Mac M3), Shen failed at 2150m Xmx, GenShen could pass Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse, it's actually better. The reported GenShen failure observation probably came from the Random.
>> >> >> >> import java.util.ArrayList; >> import java.util.List; >> >> public class TestLargeObjectAlignmentDeterministic { >> >> static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); >> static final int NODE_COUNT = Integer.getInteger("nodes", 10000); >> static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); >> >> static Object[] objects; >> >> public static void main(String[] args) throws Exception { >> objects = new Object[SLABS_COUNT]; >> >> for (int i = 0; i < SLABS_COUNT; i++) { >> objects[i] = createSome(); >> } >> } >> >> public static Object createSome() { >> List result = new ArrayList(); >> for (int c = 0; c < NODE_COUNT; c++) { >> result.add(new Integer(c)); >> } >> return result; >> } >> >> } > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Bump maxMem to 4g @rgithubli Your change (at version 142d00e2d5a0f4ea61348551971272aa756b6727) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28167#issuecomment-3513472725 From wkemper at openjdk.org Mon Nov 10 21:57:11 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Nov 2025 21:57:11 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v13] In-Reply-To: References: Message-ID: <7ReicCCT86J3wPR8FzTdOfaKOvxYbj1dhXNkbDNvduU=.b178f2f0-a241-46a2-a201-cc5394706621@github.com> On Mon, 10 Nov 2025 14:39:09 GMT, Kelvin Nilsen wrote: >> The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint.
>> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: > > - Fix mistaken merge resolution > - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates > > The resulting fastdebug build has 64 failures. I need to debug these. > Probably introduced by improper resolution of merge conflicts > - fix error in merge conflict resolution > - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates > - rework CompressedClassSpaceSizeinJmapHeap.java > - fix errors in CompressedClassSpaceSizeInJmapHeap.java > - Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java > - fix two indexing bugs > - add an assert to detect suspected bug > - Remove debug scaffolding > - ... and 48 more: https://git.openjdk.org/jdk/compare/c272aca8...16cd6f8a Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 165: > 163: } > 164: > 165: inline size_t ShenandoahHeapRegion::get_live_data_words(ShenandoahMarkingContext* ctx, size_t index) const { Why do we want to change this signature? If `index` is always `this->_index` why go through the trouble to pass a member field to a member function on the same instance? I have a similar sentiment about passing `ShenandoahMarkingContext` through the function. Should we have a member `ShenandoahMarkingContext* _marking_context`? 
Changing this signature creates a lot of noise on the PR and it's not clear to me why we would do this. ------------- PR Review: https://git.openjdk.org/jdk/pull/24319#pullrequestreview-3445303571 PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2512084103 From kdnilsen at openjdk.org Mon Nov 10 21:58:02 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 21:58:02 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements In-Reply-To: References: Message-ID: On Thu, 6 Nov 2025 20:52:11 GMT, William Kemper wrote: > When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 413: > 411: } > 412: shenandoah_assert_forwarded_except(elem_ptr, obj, _heap->cancelled_gc()); > 413: ShenandoahHeap::atomic_update_oop(fwd, elem_ptr, o); Is the comment at the start of arraycopy_work() still relevant? The description of the PR suggests that we will no longer call arraycopy_work() twice, but I'm not sure I fully understand the context of that statement. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28183#discussion_r2512087213 From xpeng at openjdk.org Mon Nov 10 22:11:04 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 10 Nov 2025 22:11:04 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v12] In-Reply-To: References: Message-ID: <_eaC5_CJvs4AFnvs2M71p2oc6z-qUAXDZTcIFRR0o60=.5e6cb939-5422-46e2-89ff-f18ea51c1504@github.com> > Shenandoah always allocates memory under the heap lock, and we have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization for the mutator memory allocation path to reduce heap lock contention. At a very high level, here is how it works: > * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* which are used by mutator threads for regular object allocations; they are called shared regions/directly allocatable regions and are stored in a PaddedEnd data structure (padded array). > * Each mutator thread is assigned one of the directly allocatable regions. The thread tries to allocate in that region with an atomic CAS operation; if that fails, it tries 2 more consecutive directly allocatable regions in the array storing the directly allocatable regions. > * If the mutator thread fails after trying 3 directly allocatable regions, it will: > * Take the heap lock > * Try to retire the directly allocatable regions which are ready to retire. > * Iterate over the mutator partition, allocate directly allocatable regions, and store them in the padded array if any need to be retired. > * Satisfy the mutator allocation request if possible. > > > I'm not expecting significant performance impact in most cases, since in most cases the contention on the heap lock is not high enough to cause performance issues. I have done many tests; here are some of them: > > 1.
Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... 
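Editorial sketch of the allocation scheme summarized in the PR description above (CAS into a shared region, probe up to three slots, then fall back to the heap lock). This is a toy model under stated assumptions, not the code in the PR: `SharedRegion`, `SketchAllocator`, and the "reset `top` to zero" stand-in for retiring and replacing a region are hypothetical names and simplifications, and how a thread picks its starting slot is left to the caller.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr size_t kSharedRegions = 13;   // "N (default 13)" from the description
constexpr size_t kProbes = 3;           // assigned region plus 2 more
constexpr size_t kFailed = SIZE_MAX;    // sentinel: request did not fit

// Hypothetical stand-in for a directly allocatable region:
// a bump pointer that is advanced with compare-and-swap.
struct SharedRegion {
  std::atomic<size_t> top{0};  // next free offset
  size_t end = 0;              // capacity of the region

  // Returns the start offset of the new allocation, or kFailed.
  size_t try_allocate(size_t size) {
    size_t cur = top.load();
    while (cur + size <= end) {
      // On failure compare_exchange_weak reloads cur; the loop re-checks the fit.
      if (top.compare_exchange_weak(cur, cur + size)) {
        return cur;
      }
    }
    return kFailed;
  }
};

class SketchAllocator {
 public:
  explicit SketchAllocator(size_t region_size) {
    for (SharedRegion& r : regions_) {
      r.end = region_size;
    }
  }

  // start is the caller's assigned slot (an assumption of this sketch).
  bool allocate(size_t start, size_t size) {
    for (size_t i = 0; i < kProbes; i++) {
      SharedRegion& r = regions_[(start + i) % kSharedRegions];
      if (r.try_allocate(size) != kFailed) {
        return true;  // lock-free fast path
      }
    }
    return allocate_slow(start, size);
  }

  size_t slow_path_count() const { return slow_paths_.load(); }

 private:
  // Slow path: take the lock, replace exhausted regions, retry once.
  bool allocate_slow(size_t start, size_t size) {
    std::lock_guard<std::mutex> guard(heap_lock_);
    slow_paths_++;
    for (SharedRegion& r : regions_) {
      if (r.top.load() + size > r.end) {
        r.top.store(0);  // toy "retire and install a fresh region"
      }
    }
    return regions_[start % kSharedRegions].try_allocate(size) != kFailed;
  }

  std::array<SharedRegion, kSharedRegions> regions_;
  std::mutex heap_lock_;
  std::atomic<size_t> slow_paths_{0};
};
```

The point of the shape is that the common case touches only one atomic word, and the lock is taken only after three regions in a row are exhausted.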
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove ShenandoahDirectlyAllocatableRegionAffinity and simply use thread local instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/b1c3b90f..f3fa45c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=10-11 Stats: 75 lines in 2 files changed: 8 ins; 65 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From wkemper at openjdk.org Mon Nov 10 23:20:03 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Nov 2025 23:20:03 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements In-Reply-To: References: Message-ID: <3QMYD6Upsc5-r4-RGS320hbSFc250n5p-YJRAgOOktQ=.5e440d61-3da8-42ff-9e7c-e3d57bbcc426@github.com> On Mon, 10 Nov 2025 21:55:37 GMT, Kelvin Nilsen wrote: >> When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 413: > >> 411: } >> 412: shenandoah_assert_forwarded_except(elem_ptr, obj, _heap->cancelled_gc()); >> 413: ShenandoahHeap::atomic_update_oop(fwd, elem_ptr, o); > > Is the comment at the start of arraycopy_work() still relevant? 
The description of the PR suggests that we will no longer call arraycopy_work() twice, but I'm not sure I fully understand the context of that statement. We used to call this twice (as `arraycopy_marking`) _for each object_ if young and old marking were both in progress. The comment at the start is trying to explain how the `arraycopy_work` could be called with different template parameters depending on the `gc_state`. It is confusingly worded, but I believe the comment is correct and the code is wrong here. The way the code is in the PR, it would effectively turn off the old SATB during young evacuation and young update-refs, which is not what we want. Good catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28183#discussion_r2512294938 From kdnilsen at openjdk.org Mon Nov 10 23:20:06 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 23:20:06 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: <6Z8jobEfFtS_xAGN4C_C0dtjfHvwFbmgbp2pDKyGewY=.c5964096-3c8b-42cb-b121-9cbe54fa30da@github.com> On Fri, 3 Oct 2025 20:15:16 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 180: > >> 178: const HeapWord* limit) const; >> 179: >> 180: // Return the last marked address in the range [limit, addr], or addr+1 if none found. > > Symmetry would have preferred `(limit, addr]` as the range with `limit` if none found. > However, maybe usage of this method prefers the present shape? Yeah. The reason for the asymmetry is that the forward-looking limit may not be a legitimate address (may be end of heap), whereas the backward-looking limit is a legitimate address.
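Editorial sketch of the asymmetry discussed in the reply above, using a plain `std::vector<bool>` in place of the mark bitmap. The function names `next_marked` and `prev_marked` are hypothetical and the signatures are assumptions; only the range conventions are taken from the thread: the backward search covers `[limit, from]` and returns `from + 1` when nothing is found, while the forward search is assumed to cover `[from, limit)` and return `limit`.

```cpp
#include <cstddef>
#include <vector>

// Forward search over [from, limit). The exclusive limit may be one past
// the last valid index (the "end of heap" case), so it is never dereferenced.
// Returns the first set index, or limit if none is set.
size_t next_marked(const std::vector<bool>& bits, size_t from, size_t limit) {
  for (size_t i = from; i < limit; i++) {
    if (bits[i]) {
      return i;
    }
  }
  return limit;
}

// Backward search over [limit, from]. Both ends are valid indices, so the
// limit is inclusive. Returns the last set index, or from + 1 if none is set.
size_t prev_marked(const std::vector<bool>& bits, size_t limit, size_t from) {
  for (size_t i = from + 1; i > limit; i--) {
    if (bits[i - 1]) {
      return i - 1;
    }
  }
  return from + 1;
}
```

In both directions the "not found" value lies just outside the searched range on the far side from the result a caller would normally expect, which keeps the check at the call site to a single comparison.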
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512294911 From kdnilsen at openjdk.org Mon Nov 10 23:33:07 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 23:33:07 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: On Fri, 3 Oct 2025 20:16:50 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 129: > >> 127: inline idx_t get_prev_bit_impl(idx_t l_index, idx_t r_index) const; >> 128: >> 129: inline idx_t get_next_one_offset(idx_t l_index, idx_t r_index) const; > > Please document analogous to line 131. Sorry. I overlooked this request in prior response. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512315966 From duke at openjdk.org Mon Nov 10 23:33:21 2025 From: duke at openjdk.org (Nityanand Rai) Date: Mon, 10 Nov 2025 23:33:21 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v3] In-Reply-To: References: Message-ID: <-0iMsHeZnk_Ld_6D9zCBNFVcXi9rIq9S0NmmYEgqb0I=.ffb1591a-83a7-47be-86ff-a5646b51e3e1@github.com> > Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. 
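Editorial sketch of the filtering named in the one-line description above. It assumes that only a store of a young reference into a field located in old memory needs a dirty card, and that the conditional check simply skips the write when the card is already dirty. `ToyCardMarker`, the address layout (everything at or above `young_start` is young), the card size, and the card byte values are all hypothetical choices for illustration, not the barrier in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kDirty = 0;      // toy encoding for a dirty card
constexpr uint8_t kClean = 0xFF;   // toy encoding for a clean card
constexpr size_t kCardShift = 9;   // 512-byte cards (assumption)

class ToyCardMarker {
 public:
  ToyCardMarker(size_t young_start, size_t heap_bytes, bool cond_card_mark)
      : young_start_(young_start),
        cards_((heap_bytes >> kCardShift) + 1, kClean),
        cond_card_mark_(cond_card_mark) {}

  bool is_in_young(size_t addr) const { return addr >= young_start_; }

  // Called after storing a reference to obj_addr into the field at field_addr.
  void post_store(size_t field_addr, size_t obj_addr) {
    if (is_in_young(field_addr)) {
      return;  // field lives in young: no remembered-set entry needed
    }
    if (!is_in_young(obj_addr)) {
      return;  // old -> old store: nothing to remember
    }
    uint8_t& card = cards_[field_addr >> kCardShift];
    if (cond_card_mark_ && card == kDirty) {
      return;  // already dirty: avoid a redundant write to the card
    }
    card = kDirty;
    card_writes_++;
  }

  bool is_dirty(size_t field_addr) const {
    return cards_[field_addr >> kCardShift] == kDirty;
  }

  size_t card_writes() const { return card_writes_; }

 private:
  size_t young_start_;
  std::vector<uint8_t> cards_;
  bool cond_card_mark_;
  size_t card_writes_ = 0;
};
```

With the conditional check enabled, repeated old-to-young stores that hit the same card cost one card write rather than one per store.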
Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28204/files - new: https://git.openjdk.org/jdk/pull/28204/files/59f7a0d0..1c85da72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28204&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28204&range=01-02 Stats: 5 lines in 1 file changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28204.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28204/head:pull/28204 PR: https://git.openjdk.org/jdk/pull/28204 From kdnilsen at openjdk.org Mon Nov 10 23:45:39 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 23:45:39 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v12] In-Reply-To: References: Message-ID: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable.
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add two comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27353/files - new: https://git.openjdk.org/jdk/pull/27353/files/9f629a2a..2dc7e98d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27353&range=10-11 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27353/head:pull/27353 PR: https://git.openjdk.org/jdk/pull/27353 From kdnilsen at openjdk.org Mon Nov 10 23:45:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Nov 2025 23:45:41 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: <6EUEYLYHG_8Ha50jFIm1KfbEE-hSCfyMRyC8BhlfLmM=.b1407843-9d56-4dad-9693-8937d814fa51@github.com> On Fri, 3 Oct 2025 20:18:46 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 131: > >> 129: inline idx_t get_next_one_offset(idx_t l_index, idx_t r_index) const; >> 130: >> 131: // Search for last one in the range [l_index, r_index). Return r_index if not found. > > Symmetry arguments wrt spec for `get_next_one_offset` may have preferred range `(l_index, r_index]`, returning `l_index` if none found. May be its (transitive) usage prefers this shape? (See similar comment at line 180.) See comment above regarding asymmetry. It is by design, due to shape of the data. 
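To make the range convention discussed here concrete, a minimal sketch follows. It assumes both searches work over the half-open range `[l_index, r_index)` and both return `r_index` when no bit is set; the quoted comment states this only for the backward search, so the forward half is an assumption. The function names and the `std::vector<bool>` representation are illustrative, not the `ShenandoahMarkBitMap` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using idx_t = std::size_t;

// Forward search: first set bit in [l_index, r_index), or r_index if none.
inline idx_t next_one_offset(const std::vector<bool>& bits, idx_t l_index, idx_t r_index) {
  assert(l_index <= r_index && r_index <= bits.size());
  for (idx_t i = l_index; i < r_index; ++i) {
    if (bits[i]) {
      return i;
    }
  }
  return r_index;
}

// Backward search: last set bit in [l_index, r_index), or r_index if none.
// r_index is outside the half-open range, so it is an unambiguous
// "not found" value for both directions.
inline idx_t prev_one_offset(const std::vector<bool>& bits, idx_t l_index, idx_t r_index) {
  assert(l_index <= r_index && r_index <= bits.size());
  for (idx_t i = r_index; i > l_index; --i) {
    if (bits[i - 1]) {
      return i - 1;
    }
  }
  return r_index;
}
```

With this shape a caller can test `result == r_index` after either search, at the cost of the asymmetry the reviewer points out.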
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512329320 From wkemper at openjdk.org Mon Nov 10 23:59:33 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Nov 2025 23:59:33 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements [v2] In-Reply-To: References: Message-ID: > When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - SATB barrier for old must be independent of young collection gc state - We can also filter out old when striclty marking young ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28183/files - new: https://git.openjdk.org/jdk/pull/28183/files/3086fd30..ddd3d6d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28183&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28183&range=00-01 Stats: 67 lines in 5 files changed: 43 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/28183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28183/head:pull/28183 PR: https://git.openjdk.org/jdk/pull/28183 From wkemper at openjdk.org Tue Nov 11 00:33:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Nov 2025 00:33:36 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements [v3] In-Reply-To: References: Message-ID: 
<4pRORBaXYXwyCJyUp3BKA4I8bHlTfkfNldK9EnDJvZw=.b0a53f9b-a9a0-4c75-a823-7cf82f69a40b@github.com> > When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Revert "We can also filter out old when striclty marking young" This reverts commit c53c4f23f4401785e1049494b6c4e4b92f9a5701. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28183/files - new: https://git.openjdk.org/jdk/pull/28183/files/ddd3d6d9..ba71462d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28183&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28183&range=01-02 Stats: 49 lines in 3 files changed: 0 ins; 34 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28183/head:pull/28183 PR: https://git.openjdk.org/jdk/pull/28183 From xpeng at openjdk.org Tue Nov 11 00:58:47 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Nov 2025 00:58:47 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v13] In-Reply-To: References: Message-ID: > Shenandoah always allocates memory with heap lock, we have observed heavy heap lock contention on memory allocation path in performance analysis of some service in which we tried to adopt Shenandoah. 
This change proposes an optimization for the mutator memory allocation path to reduce heap lock contention. At a very high level, here is how it works: > * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* that mutator threads use for regular object allocations. They are called shared regions/directly allocatable regions and are stored in a PaddedEnd data structure (padded array). > * Each mutator thread is assigned one of the directly allocatable regions. The thread tries to allocate in that region with an atomic CAS operation; if that fails, it tries the next 2 consecutive directly allocatable regions in the array. > * If the mutator thread fails after trying 3 directly allocatable regions, it will: > * Take the heap lock. > * Try to retire the directly allocatable regions that are ready to retire. > * Iterate over the mutator partition, allocate directly allocatable regions, and store them in the padded array if any need to be retired. > * Satisfy the mutator allocation request if possible. > > > I'm not expecting a significant performance impact in most cases, since the contention on the heap lock is usually not high enough to cause a performance issue. I have done many tests; here are some of them: > > 1. 
Dacapo lusearch test on EC2 host with 96 CPU cores: > Openjdk TIP: > > [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing" > ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events ===== > ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====... 
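The lock-free fast path outlined in the bullet list of this description can be sketched roughly as below. This is a simplified model, not the code in the PR: `ToyRegion`, `allocate_lock_free`, the wrap-around when probing, and the use of offsets instead of addresses are all assumptions made for illustration. Only the counts (13 shared regions, 3 probes) come from the text.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);
constexpr std::size_t kSharedRegions = 13;  // default count quoted in the description
constexpr std::size_t kProbes = 3;          // assigned region plus 2 more

// Hypothetical stand-in for a directly allocatable region.
struct ToyRegion {
  std::size_t capacity = 0;
  std::atomic<std::size_t> top{0};

  // Bump-pointer allocation with CAS; returns the start offset or kNoSpace.
  std::size_t try_allocate(std::size_t size) {
    std::size_t old_top = top.load(std::memory_order_relaxed);
    while (old_top <= capacity && size <= capacity - old_top) {
      if (top.compare_exchange_weak(old_top, old_top + size, std::memory_order_relaxed)) {
        return old_top;
      }
      // On failure old_top holds the current value; re-check and retry.
    }
    return kNoSpace;
  }
};

struct ToyAllocResult {
  std::size_t region;
  std::size_t offset;
};

// Try the thread's assigned region, then the next two (wrapping is an
// assumption). Returning false means the caller would fall back to the
// slow path under the heap lock.
inline bool allocate_lock_free(ToyRegion (&regions)[kSharedRegions], std::size_t start,
                               std::size_t size, ToyAllocResult& out) {
  for (std::size_t probe = 0; probe < kProbes; ++probe) {
    const std::size_t idx = (start + probe) % kSharedRegions;
    const std::size_t offset = regions[idx].try_allocate(size);
    if (offset != kNoSpace) {
      out.region = idx;
      out.offset = offset;
      return true;
    }
  }
  return false;
}
```

The slow path (retire full regions, refill the array, satisfy the request) is deliberately left out, since its details are not given here.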
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Renaming directly allocatable region to alloc region ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26171/files - new: https://git.openjdk.org/jdk/pull/26171/files/f3fa45c6..67414bba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=11-12 Stats: 74 lines in 12 files changed: 0 ins; 0 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/26171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171 PR: https://git.openjdk.org/jdk/pull/26171 From duke at openjdk.org Tue Nov 11 03:38:11 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 11 Nov 2025 03:38:11 GMT Subject: Integrated: 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space In-Reply-To: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> References: <4hAMzlEVTLb91k4l8Hd2ysUFx7FEe2erCAB_ReeHU2E=.9cae38e5-261a-499f-aee9-770775c02708@github.com> Message-ID: On Thu, 6 Nov 2025 00:21:55 GMT, Rui Li wrote: > Sporadic failures were observed for TestLargeObjectAlignment.java#generational. The current theory is that the jtreg default heap size on the reporter's machines is too small, and the randomness in the test sometimes required more heap than the test had available. > > I did a calculation for the worst case (see the code snippet at the end - it removes the Random in the original test and always allocates the array to full) and the test needs at least 2g. I am setting a 3g heap for safety to reduce the noise. > > I also used the test to compare Shenandoah with GenShen: on my laptop (Mac M3), Shen failed at Xmx2150m, while GenShen passed at Xmx2150m and failed at Xmx2050m (step: 50m), so GenShen isn't worse; it's actually better. 
The reported GenShen failure observation probably came from the Random. > > > > public class TestLargeObjectAlignmentDeterministic { > > static final int SLABS_COUNT = Integer.getInteger("slabs", 10000); > static final int NODE_COUNT = Integer.getInteger("nodes", 10000); > static final long TIME_NS = 1000L * 1000L * Integer.getInteger("timeMs", 5000); > > static Object[] objects; > > public static void main(String[] args) throws Exception { > objects = new Object[SLABS_COUNT]; > > for (int i = 0; i < SLABS_COUNT; i++) { > objects[i] = createSome(); > } > } > > public static Object createSome() { > List result = new ArrayList(); > for (int c = 0; c < NODE_COUNT; c++) { > result.add(new Integer(c)); > } > return result; > } > > } This pull request has now been integrated. Changeset: e1c95260 Author: Rui Li Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/e1c952608d61c6c74c3fa4d00789390f3a789de4 Stats: 11 lines in 1 file changed: 2 ins; 1 del; 8 mod 8361339: Test gc/shenandoah/TestLargeObjectAlignment.java#generational fails on macOS aarch64 with OOM: Java heap space Reviewed-by: shade, syan ------------- PR: https://git.openjdk.org/jdk/pull/28167 From duke at openjdk.org Tue Nov 11 06:12:28 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 11 Nov 2025 06:12:28 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v11] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > I have done more work which remove slice paramater from StoreNode::make. > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - add more assert - rid of access.addr().type() - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - ... and 3 more: https://git.openjdk.org/jdk/compare/76a1109d...42b17827 ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=10 Stats: 230 lines in 18 files changed: 33 ins; 55 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From duke at openjdk.org Tue Nov 11 09:39:09 2025 From: duke at openjdk.org (Harshit470250) Date: Tue, 11 Nov 2025 09:39:09 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages @iwanowww Can you also take a look, as you have reviewed the previous related change. 
#21782 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3515880656 From shade at openjdk.org Tue Nov 11 09:41:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Nov 2025 09:41:03 GMT Subject: RFR: 8371284: GenShen: Avoid unnecessary card marking [v3] In-Reply-To: <-0iMsHeZnk_Ld_6D9zCBNFVcXi9rIq9S0NmmYEgqb0I=.ffb1591a-83a7-47be-86ff-a5646b51e3e1@github.com> References: <-0iMsHeZnk_Ld_6D9zCBNFVcXi9rIq9S0NmmYEgqb0I=.ffb1591a-83a7-47be-86ff-a5646b51e3e1@github.com> Message-ID: On Mon, 10 Nov 2025 23:33:21 GMT, Nityanand Rai wrote: >> Exclude young-young, old-old and honor UseCondCardMark in dirty card marking. > > Nityanand Rai has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Aleksey Shipilëv Looks fine. How's performance? I suppose this is a frequent path, and `is_in_young` checks are not necessarily free. ------------- PR Review: https://git.openjdk.org/jdk/pull/28204#pullrequestreview-3447143143 From kdnilsen at openjdk.org Tue Nov 11 17:26:49 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 17:26:49 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v5] In-Reply-To: References: Message-ID: > GenShen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. 
This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 40 commits: - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves - fix whitespace - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix assert_bounds() assertions when old_trash_not_in_bounds - respond to reviewer feedback - Keep gc cycle times with heuristics for the relevant generation - Fix whitespace - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves - make old gc more aggresive - ... and 30 more: https://git.openjdk.org/jdk/compare/9bc23608...f25d5442 ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=04 Stats: 1259 lines in 24 files changed: 726 ins; 293 del; 240 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Tue Nov 11 17:30:26 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 17:30:26 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v6] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. 
This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove debug instrumentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/f25d5442..37b98e5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=04-05 Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From duke at openjdk.org Tue Nov 11 18:31:36 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 11 Nov 2025 18:31:36 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO Message-ID: Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. Testing: jtreg gc. GHA pending. 
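The behavioral difference described above (a default-style set skips the range check, an ergonomic set records the origin and checks the range) can be modeled with a small toy. This is not the HotSpot flag machinery: `ToyRangedFlag`, `ToyOrigin`, `set_default`, and `set_ergo` are hypothetical, the initial value 5 is arbitrary, and returning `false` on a range violation is an assumption about how a failure might be reported.

```cpp
#include <cassert>

// Hypothetical model of a ranged VM flag; not the real JVM flag classes.
enum class ToyOrigin { Default, Ergonomic };

struct ToyRangedFlag {
  int value;
  int min;
  int max;
  ToyOrigin origin;

  ToyRangedFlag(int initial, int lo, int hi)
      : value(initial), min(lo), max(hi), origin(ToyOrigin::Default) {}

  // Models a default-style set: no range check, origin left untouched.
  void set_default(int v) { value = v; }

  // Models an ergonomic set: range is checked and the origin is recorded.
  bool set_ergo(int v) {
    if (v < min || v > max) {
      return false;  // out of range, value unchanged
    }
    value = v;
    origin = ToyOrigin::Ergonomic;
    return true;
  }
};
```

In this model, setting 0 on a flag declared with range (1,100) silently succeeds through `set_default` and is rejected by `set_ergo`, which is why the range has to be widened to (0,100) once the ergonomic path is used.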
------------- Commit messages: - 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO Changes: https://git.openjdk.org/jdk/pull/28242/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28242&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371381 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28242/head:pull/28242 PR: https://git.openjdk.org/jdk/pull/28242 From kdnilsen at openjdk.org Tue Nov 11 18:43:07 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 18:43:07 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: On Fri, 3 Oct 2025 20:19:44 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 134: > >> 132: inline idx_t get_prev_one_offset(idx_t l_index, idx_t r_index) const; >> 133: >> 134: void clear_large_range(idx_t beg, idx_t end); > > documentation comment. I've added a comment here as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2515271426 From kdnilsen at openjdk.org Tue Nov 11 19:03:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 19:03:10 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v13] In-Reply-To: <7ReicCCT86J3wPR8FzTdOfaKOvxYbj1dhXNkbDNvduU=.b178f2f0-a241-46a2-a201-cc5394706621@github.com> References: <7ReicCCT86J3wPR8FzTdOfaKOvxYbj1dhXNkbDNvduU=.b178f2f0-a241-46a2-a201-cc5394706621@github.com> Message-ID: On Mon, 10 Nov 2025 21:54:14 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: >> >> - Fix mistaken merge resolution >> - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates >> >> The resulting fastdebug build has 64 failures. I need to debug these. >> Probably introduced by improper resolution of merge conflicts >> - fix error in merge conflict resolution >> - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates >> - rework CompressedClassSpaceSizeinJmapHeap.java >> - fix errors in CompressedClassSpaceSizeInJmapHeap.java >> - Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java >> - fix two indexing bugs >> - add an assert to detect suspected bug >> - Remove debug scaffolding >> - ... and 48 more: https://git.openjdk.org/jdk/compare/c272aca8...16cd6f8a > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 165: > >> 163: } >> 164: >> 165: inline size_t ShenandoahHeapRegion::get_live_data_words(ShenandoahMarkingContext* ctx, size_t index) const { > > Why do we want to change this signature? If `index` is always `this->_index` why go through the trouble to pass a member field to a member function on the same instance? I have a similar sentiment about passing `ShenandoahMarkingContext` through the function. 
Should we have a member `ShenandoahMarkingContext* _marking_context`? Changing this signature creates a lot of noise on the PR and it's not clear to me why we would do this. Good call out. Am willing to back this change out. Motivation for this change is that we were seeing some performance regression in this PR. At first, I thought this was due to miscomputation of get_live_data_words(), but I confirmed through further testing that the results from get_live_data_words() were the same before and after this PR. So I concluded that the "explanation" for performance regression is that it now takes longer for us to compute get_live_data_words(). The original implementation was: return AtomicAccess::load(&_live_data) The new implementation added: Find the marking context by fetching this from ShenandoahHeap::heap() Find tams by consulting the marking context with region, which has to indirection through region to find index I found that passing this information into the function rather than having the function recompute it brought us back to par with performance of master. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2515338461 From wkemper at openjdk.org Tue Nov 11 19:15:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Nov 2025 19:15:04 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v12] In-Reply-To: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> References: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> Message-ID: On Mon, 10 Nov 2025 23:45:39 GMT, Kelvin Nilsen wrote: >> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. 
>> >> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add two comments Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/27353#pullrequestreview-3449555306 From xpeng at openjdk.org Tue Nov 11 20:02:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Nov 2025 20:02:13 GMT Subject: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v5] In-Reply-To: <5YSA3F88CmDDv09M2KOm_EFNDh_09LPO2WMrgETfupI=.cc658dc9-829e-41a5-ad76-393d3eb0f75a@github.com> References: <5YSA3F88CmDDv09M2KOm_EFNDh_09LPO2WMrgETfupI=.cc658dc9-829e-41a5-ad76-393d3eb0f75a@github.com> Message-ID: On Wed, 5 Nov 2025 22:17:54 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 2280: >> >>> 2278: } >>> 2279: >>> 2280: class DirectAllocatableRegionRefillClosure final : public ShenandoahHeapRegionIterationClosure { >> >> I don't think we want to subclass ShenandoahHeapRegionIterationClosure here. That iterates over all 2000 regions. We only want to iterate over the 13 Directly allocatable regions. Maybe we don't even need/want a closure iterator here. We could just write a loop. > > I think we should be borrowing from this code when replenishing the regions that are ready to be retired: > > if (_partitions.alloc_from_left_bias(ShenandoahFreeSetPartitionId::Mutator)) { > // Allocate from low to high memory. This keeps the range of fully empty regions more tightly packed. > // Note that the most recently allocated regions tend not to be evacuated in a given GC cycle. So this > // tends to accumulate "fragmented" uncollected regions in high memory. 
> ShenandoahLeftRightIterator iterator(&_partitions, ShenandoahFreeSetPartitionId::Mutator); > return allocate_from_regions(iterator, req, in_new_region); > } > > // Allocate from high to low memory. This preserves low memory for humongous allocations. > ShenandoahRightLeftIterator iterator(&_partitions, ShenandoahFreeSetPartitionId::Mutator); > return allocate_from_regions(iterator, req, in_new_region); Thanks, the closure I am using here returns a bool indicating when it needs to break out the iteration. We can get rid of the ShenandoahHeapRegionIterationClosure here, I'll replace it with simple loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r2515562893 From kdnilsen at openjdk.org Tue Nov 11 21:10:18 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 21:10:18 GMT Subject: Integrated: 8358735: GenShen: block_start() may be incorrect after class unloading In-Reply-To: References: Message-ID: <79B6TCe0QZO2iP37g1U6jYNe7uMc3vKkNhgVVKOtWhM=.304e2627-fd3c-4077-8af5-2aaf30851d1b@github.com> On Wed, 17 Sep 2025 20:12:49 GMT, Kelvin Nilsen wrote: > When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. > > The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. This pull request has now been integrated. Changeset: 8531fa14 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/8531fa146be1da5e96c0f23091882a27c67d7893 Stats: 893 lines in 12 files changed: 851 ins; 10 del; 32 mod 8358735: GenShen: block_start() may be incorrect after class unloading Co-authored-by: Y. 
Srinivas Ramakrishna Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/27353 From kdnilsen at openjdk.org Tue Nov 11 22:07:05 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 22:07:05 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Fri, 31 Oct 2025 01:08:25 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 240: >> >>> 238: // Return available_in assuming caller does not hold the heap lock. In production builds, available is >>> 239: // returned without acquiring the lock. In debug builds, the global heap lock is acquired in order to >>> 240: // enforce a consistency assert. >> >> Can the comment be simplified to: >> >> >> // Return bytes `available` in the given `partition` >> // while holding the `rebuild_lock`. >> >> >> Don't say anything about the heap lock in the API comment. Rather, in the part that is `ifdef ASSERT` where you take the heap lock (line ~244), say: >> >> // Acquire the heap lock to get a consistent >> // snapshot to check assert. > > As I write this, I realize that in the most general case where two threads may call these APIs independently in a fastdebug build, you could theoretically get into a deadlock because they attempted to acquire locks in different orders (this possibility exists -- statically -- only in the fastdebug builds). > > The general MutexLocker machinery has ranked mutexes to avoid such situations through static ranking and checks while acquiring locks (in debug builds as a way of potentially catching such situations and flagging them). > > With such ranking though this code would assert because the locks are acquired in different order between here and elsewhere. > > In product builds you are fine because the rebuild lock acts as a "leaf lock" (in hotspot parlance). 
But there seems to be a definite possibility of deadlock in debug builds if/when the rebuild is attempted by one thread while another checks available and attempts to acquire the heap lock to check the assertion. You could solve it by acquiring the heap lock before calling the work method where the assertion check is done. > > However, I'd be much more comfortable if we used some form of lock rank framework, unless it was utterly impossible to do so for some reason. (Here it was easy to spot the lock order inversion because it was in the code. Of course, if a debug build deadlocked you would also figure out the same, but having lock ordering gives you a quick and easy way to verify if there's potential for trouble.) > > Not sure of the history of ShenandoahLock or why the parallel infra to MutexLocker was introduced (perhaps for allowing some performance/tunability), but might be worthwhile to see if we want to build lock rank checks in for robustness/maintainability. I'm coming back to this PR after working on others. Thanks for your comments. This is a good catch. I know better than to do that! Sorry. My intention was to rank-order the locks. Whenever multiple locks are held, they should be acquired in this order: first the global heap lock, then, in nested context, the rebuild_lock. Any thread that only acquires the global heap lock or only acquires the rebuild_lock will not deadlock. Multiple threads that acquire both locks will not deadlock because they acquire in the same order. The code you identified was definitely a problem because we were acquiring the two locks in the wrong order. I'm going to remove that assert and the lock associated with it.
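The ordering rule described above (global heap lock first, rebuild_lock nested inside it) can be illustrated with a small standalone sketch. This is not Shenandoah code: `RankedLock`, `RankedLocker`, the rank constants, and the two helper functions are hypothetical names built on `std::mutex`, and the rank check only mimics the kind of debug-build verification discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical ranks: a lock may only be acquired while every lock already
// held by the current thread has a strictly lower rank.
constexpr int NO_LOCK_HELD      = -1;
constexpr int HEAP_LOCK_RANK    = 0;
constexpr int REBUILD_LOCK_RANK = 1;

// Highest rank currently held by this thread (illustrative bookkeeping only).
thread_local int t_highest_rank_held = NO_LOCK_HELD;

// True if acquiring a lock of rank `wanted` respects the rank order.
inline bool acquisition_order_ok(int highest_held, int wanted) {
  return wanted > highest_held;
}

struct RankedLock {
  std::mutex mutex;
  const int rank;
  explicit RankedLock(int r) : rank(r) {}
};

// RAII locker that checks the rank order before blocking on the mutex.
class RankedLocker {
  RankedLock& _lock;
  const int _previous_rank;
public:
  explicit RankedLocker(RankedLock& lock)
    : _lock(lock), _previous_rank(t_highest_rank_held) {
    assert(acquisition_order_ok(_previous_rank, lock.rank) && "lock rank inversion");
    _lock.mutex.lock();
    t_highest_rank_held = lock.rank;
  }
  ~RankedLocker() {
    t_highest_rank_held = _previous_rank;
    _lock.mutex.unlock();
  }
  RankedLocker(const RankedLocker&) = delete;
  RankedLocker& operator=(const RankedLocker&) = delete;
};

// Stand-ins for the two locks and the value they protect.
RankedLock heap_lock(HEAP_LOCK_RANK);
RankedLock rebuild_lock(REBUILD_LOCK_RANK);
std::size_t mutator_available = 0;

// Rebuild path: takes both locks, always heap lock first.
void rebuild_free_set(std::size_t new_available) {
  RankedLocker heap_locker(heap_lock);
  RankedLocker rebuild_locker(rebuild_lock);
  mutator_available = new_available;
}

// Query path: takes only the rebuild lock, so it never waits on the heap lock.
std::size_t available() {
  RankedLocker rebuild_locker(rebuild_lock);
  return mutator_available;
}
```

In this model, a thread that held `rebuild_lock` and then tried to take `heap_lock` would trip the rank assertion immediately in a debug build instead of deadlocking intermittently.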
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2515961240 From kdnilsen at openjdk.org Tue Nov 11 22:12:05 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 22:12:05 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Fri, 31 Oct 2025 00:36:57 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1158: >> >>> 1156: size_t young_cset_regions, old_cset_regions, first_old, last_old, num_old; >>> 1157: ShenandoahFreeSet* free_set = heap->free_set(); >>> 1158: ShenandoahRebuildLocker rebuild_locker(free_set->lock()); >> >> Should you not create a scope around lines 1158 to line 1167, since you don't want to hold the rebuild lock as soon as the rebuild is done (i.e. immediately following `finish_rebuild()`)? > > May be it doesn't matter, since no one else is running during a full gc who needs to query `available()`? I'll tighten up the context for the rebuild lock. I was thinking that set_mark_incomplete() and clear_cancelled_gc() would be "fast enough" that it wouldn't matter to hold the rebuild_lock this much longer, but I agree it is better to release the lock as soon as possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2515969801 From kdnilsen at openjdk.org Tue Nov 11 22:18:22 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 22:18:22 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Fri, 31 Oct 2025 00:23:53 GMT, Y. 
Srinivas Ramakrishna wrote: >> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. >> >> This addresses a problem that results if available memory is probed while we are rebuilding the freeset. >> >> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 426: > >> 424: >> 425: >> 426: ShenandoahRebuildLock* lock() { > > `rebuild_lock()` instead? Good suggestion. Making this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2515981364 From ysr at openjdk.org Tue Nov 11 22:24:00 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 11 Nov 2025 22:24:00 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 18:24:25 GMT, Rui Li wrote: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. 
> > Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. > > Testing: jtreg gc. GHA pending. Reviewed the `ergo` flag setting macro changes. They look good to me. For the passive mode that requires allowing evac reserve of 0, I'll request @kdnilsen or @earthling-amzn to review. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3450362860 From cslucas at openjdk.org Tue Nov 11 22:33:03 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 11 Nov 2025 22:33:03 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 18:24:25 GMT, Rui Li wrote: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. > > Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. > > Testing: jtreg gc. GHA pending. Marked as reviewed by cslucas (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3450386135 From xpeng at openjdk.org Tue Nov 11 22:42:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Nov 2025 22:42:10 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 18:24:25 GMT, Rui Li wrote: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. > > Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. > > Testing: jtreg gc. GHA pending. Marked as reviewed by xpeng (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3450413840 From kdnilsen at openjdk.org Tue Nov 11 22:44:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Nov 2025 22:44:04 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Fri, 31 Oct 2025 00:23:18 GMT, Y. Srinivas Ramakrishna wrote: >> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. >> >> This addresses a problem that results if available memory is probed while we are rebuilding the freeset. 
>> >> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 323: > >> 321: ShenandoahRegionPartitions _partitions; >> 322: >> 323: // This locks the rebuild process (in combination with the global heap lock) > > Explain the role of this & the global heap lock vis-a-vis the rebuild process. > > Also may be call it `_rebuild_lock`, rather than just `_lock`. Am changing the name. I will add discussion of the rank ordering of locks here as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2516042979 From wkemper at openjdk.org Tue Nov 11 23:09:02 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Nov 2025 23:09:02 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 11 Nov 2025 18:24:25 GMT, Rui Li wrote: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. > > Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. > > Testing: jtreg gc. GHA pending. 
Changes requested by wkemper (Reviewer). Hmm, setting `ShenandoahEvacReserve` to zero will cause a divide by zero error in other modes (it will even cause a divide by zero error in the passive mode if `ShenandoahDegeneratedGC` is enabled). I'd rather contain the accommodation for this configuration to the passive mode, than accept values that will cause crashes. Could we "inline" the macro on line 41 and allow zero only for this special configuration (passive mode and only full gcs) by changing the "inlined" macro to use `SET_DEFAULT` still? src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 356: > 354: "regions) is also bounded by this parameter. In percents of " \ > 355: "total (young-generation) heap size.") \ > 356: range(0,100) \ If we allow this, somebody _will_ file a bug because `-XX:ShenandoahEvacReserve=0` will cause other modes to crash with divide by zero error. ------------- PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3450500792 PR Comment: https://git.openjdk.org/jdk/pull/28242#issuecomment-3519062971 PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2516100105 From duke at openjdk.org Tue Nov 11 23:44:02 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 11 Nov 2025 23:44:02 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> On Tue, 11 Nov 2025 23:06:25 GMT, William Kemper wrote: >> Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. 
>> >> Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. >> >> Testing: jtreg gc. GHA pending. > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 356: > >> 354: "regions) is also bounded by this parameter. In percents of " \ >> 355: "total (young-generation) heap size.") \ >> 356: range(0,100) \ > > If we allow this, somebody _will_ file a bug because `-XX:ShenandoahEvacReserve=0` will cause other modes to crash with divide by zero error. OK, I'll keep the range as it is `(1,100)`, but instead of setting the default with ergo at here: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 I'll just use `FLAG_SET_DEFAULT` to bypass the check? We wouldn't have the data source origin in passive mode which is only meant to be diagnostic, but that's better than causing problems in the regular mode. 
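The trade-off being discussed (a range-checked setter that records origin versus an unchecked setter that does neither) can be shown with a toy model. This is only a sketch: `RangedFlag`, `Origin`, `set_default_unchecked`, and `set_ergo_checked` are hypothetical names, not the HotSpot macros, and the model assumes only the two behaviors stated in this thread, namely that the ergonomic path checks the range and records origin while the default path does not.

```cpp
// Hypothetical origin tag for a flag value.
enum class Origin { Default, Ergonomic };

// Hypothetical flag with a declared inclusive range, e.g. range(1,100).
struct RangedFlag {
  unsigned value;
  unsigned min;
  unsigned max;
  Origin origin;
};

// Unchecked path: stores any value and leaves the origin as Default,
// so the declared range is bypassed and no origin information is gained.
inline void set_default_unchecked(RangedFlag& flag, unsigned v) {
  flag.value = v;
  flag.origin = Origin::Default;
}

// Checked path: rejects out-of-range values, records an ergonomic origin
// on success, and leaves the flag untouched on failure.
inline bool set_ergo_checked(RangedFlag& flag, unsigned v) {
  if (v < flag.min || v > flag.max) {
    return false;
  }
  flag.value = v;
  flag.origin = Origin::Ergonomic;
  return true;
}
```

Under this model, keeping the declared range at `(1,100)` means a user-supplied 0 is still rejected, while the one special configuration that needs 0 has to go through the unchecked path and gives up origin tracking for that flag.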
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2516180561 From duke at openjdk.org Tue Nov 11 23:55:02 2025 From: duke at openjdk.org (Rui Li) Date: Tue, 11 Nov 2025 23:55:02 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> References: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> Message-ID: On Tue, 11 Nov 2025 23:40:01 GMT, Rui Li wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 356: >> >>> 354: "regions) is also bounded by this parameter. In percents of " \ >>> 355: "total (young-generation) heap size.") \ >>> 356: range(0,100) \ >> >> If we allow this, somebody _will_ file a bug because `-XX:ShenandoahEvacReserve=0` will cause other modes to crash with divide by zero error. > > OK, I'll keep the range as it is `(1,100)`, but instead of setting the default with ergo at here: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 I'll just use `FLAG_SET_DEFAULT` to bypass the check? We wouldn't have the data source origin in passive mode which is only meant to be diagnostic, but that's better than causing problems in the regular mode. Ah, missed your general comment above. I think we're talking about the same solution. I'll update the passive mode to use `FLAG_SET_DEFAULT`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2516213869 From duke at openjdk.org Wed Nov 12 00:22:39 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 12 Nov 2025 00:22:39 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v2] In-Reply-To: References: Message-ID: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. 
> > Had to expand `ShenandoahEvacReserve` range from `range(1,100)` to `range(0,100)` because when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L41 The issue is surfaced now because `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does. > > Testing: jtreg gc. GHA pending. Rui Li has updated the pull request incrementally with one additional commit since the last revision: Use FLAG_SET_DEFAULT for ShenandoahEvacReserve in passive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28242/files - new: https://git.openjdk.org/jdk/pull/28242/files/1a5a09bf..65fc9878 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28242&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28242&range=00-01 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28242/head:pull/28242 PR: https://git.openjdk.org/jdk/pull/28242 From kdnilsen at openjdk.org Wed Nov 12 00:58:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 12 Nov 2025 00:58:45 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v2] In-Reply-To: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: > This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. > > This addresses a problem that results if available memory is probed while we are rebuilding the freeset. 
> > Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. Kelvin Nilsen has updated the pull request incrementally with six additional commits since the last revision: - update comment - Add documentation for _rebuild_lock - Hide rebuild_lock inside prepare_to_rebuild and finish_rebuild - Rename rebuild_lock() - Tighten up context for holding rebuild_lock - Remove ShenandoahFreeSet::FreeSetUnderConstruction sentinel value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27612/files - new: https://git.openjdk.org/jdk/pull/27612/files/a6898392..091e23bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=00-01 Stats: 63 lines in 7 files changed: 18 ins; 26 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/27612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612 PR: https://git.openjdk.org/jdk/pull/27612 From kdnilsen at openjdk.org Wed Nov 12 00:58:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 12 Nov 2025 00:58:45 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v2] In-Reply-To: References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: On Tue, 11 Nov 2025 22:04:04 GMT, Kelvin Nilsen wrote: >> As I write this, I realize that in the most general case where two threads may call these API's independently in a fastdebug build, you could theoretically get into a deadlock because they attempted to acquire locks in different 
orders (this possibility exists -- statically -- only in the fastdebug builds). >> >> The general MuteLocker machinery has ranked mutexes to avoid such situations through static ranking and checks while acquiring locks (in debug builds as a way of potentially catching such situations and flagging them). >> >> With such ranking though this code would assert because the locks are acquired in different order between here and elsewhere. >> >> In product builds you are fine because the rebuild lock acts as a "leaf lock" (in hotspot parlance). But there seems to be a definite possibility of deadlock in debug builds if/when the rebuild is attempted by one thread while another checks available and attempts to acquire the heap lock to check the assertion. You could solve it by acquiring the heap lock before calling the work method where the assertion check is done. >> >> However, I'd be much more comfortable if we used some form of lock rank framework, unless it was utterly impossible to do so for some reason. (Here it was easy to spot the lock order inversion because it was in the code. Of course, if a debug build deadlocked you would also figure out the same, but having lock ordering gives you a quick and easy way to verify if there's potential for trouble.) >> >> Not sure of the history of ShenandoahLock or why the parallel infra to MutexLocker was introduced (perhaps for allowing some performance/tunability), but might be worthwhile to see if we want to build lock rank checks in for robustness/maintainability. > > I'm coming back to this PR after working on others. Thanks for your comments. > > This is a good catch. I know better than to do that! Sorry. > > My intention was to rank-order the locks. Whenever multiple locks are held, it should be in this order: > first acquire the global heap lock > In nested context, acquire the rebuild_lock > > Any thread that only acquires the global heap lock or only acquires the rebuild_lock will not deadlock. 
> > Multiple threads that acquire both locks will not deadlock because they acquire in the same order. > > The code you identified was definitely a problem because we were acquiring the two lock in the wrong order. I'm going to remove that assert and the lock associated with it. I've updated this comment to clarify the refined intent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2516294135 From wkemper at openjdk.org Wed Nov 12 01:01:02 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Nov 2025 01:01:02 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v2] In-Reply-To: References: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> Message-ID: On Tue, 11 Nov 2025 23:52:24 GMT, Rui Li wrote: >> OK, I'll keep the range as it is `(1,100)`, but instead of setting the default with ergo at here: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 I'll just use `FLAG_SET_DEFAULT` to bypass the check? We wouldn't have the data source origin in passive mode which is only meant to be diagnostic, but that's better than causing problems in the regular mode. > > Ah, missed your general comment above. I think we're talking about the same solution. I'll update the passive mode to use `FLAG_SET_DEFAULT`. Yeah, that's what I was suggesting. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2516301969 From ysr at openjdk.org Wed Nov 12 01:25:26 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 12 Nov 2025 01:25:26 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v12] In-Reply-To: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> References: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> Message-ID: On Mon, 10 Nov 2025 23:45:39 GMT, Kelvin Nilsen wrote: >> When scanning a range of dirty cards within the GenShen remembered set, we need to find the object that spans the beginning of the left-most dirty card. The existing code is not reliable following class unloading. >> >> The new code uses the marking context when it is available to determine the location of live objects that reside below TAMS within each region. Above TAMS, all objects are presumed live and parsable. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add two comments I am still going through this, but it seems as if there's a bunch of potential clean-ups to do here. I realize this has already been integrated; I can maybe create a separate task to clean up some of the questions raised here. Since there are quite a few comments now, I am going to flush these for now as a record of some of my thoughts and create a separate task in which I can see if these concerns are real and if the code can be somewhat simplified in a few places. Nothing specific to do here at this time in response to these stream-of-consciousness comments. src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.cpp line 70: > 68: assert(limit >= r->bottom(), "limit must be more than bottom"); > 69: assert(addr <= tams, "addr must be less than TAMS"); > 70: #endif Wouldn't it make more sense for these checks to move to the caller? It appears to me to be a leakage of abstraction to test these conditions here.
We should be able to return the address for the marked bit found without interpreting the semantics of the bits themselves? src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 251: > 249: #endif > 250: > 251: HeapWord* right = MIN2(region->top(), end_range_of_interest); This is a safe thing to do, but doesn't the caller already establish the invariant that `region->top() >= end_range_of_interest` ? Can we just assert that instead of doing the clip/clamp? (And rename the formal parameter name from `end_range_of_interest` to `right`?) If so, we might also want to change the name of the formal parameter from `card_index` to `left`, and change it to be a (card-aligned) _heap_ _address_ for symmetry in the API. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 252: > 250: > 251: HeapWord* right = MIN2(region->top(), end_range_of_interest); > 252: HeapWord* end_of_search_next = MIN2(right, tams); Does the caller ensure that `tams` is always valid (e.g. when `ctx == nullptr`)? src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 261: > 259: last_relevant_card_index--; > 260: } > 261: } I am not sure this is necessary. I'd just adjust the caller so this is ensured, avoiding this computation here. I think the caller has the last dirty card address and can just use that? I realize there's a bit of an issue with the address for `tams` not necessarily being card-aligned, but I think we should be able to deal with that in the caller as well once we remember that all of this always happens within a single region. (We can add such an assertion so that future adjustments do not render this assumption invalid if the code is changed/adjusted later.) src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 296: > 294: // for scanning the portion of the obj-array overlapping the dirty cards in > 295: // its cluster. > 296: // 3. 
Non-array objects are precisely dirtied by the interpreter and the compilers This should say "imprecisely" at line 296, I think? src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 410: > 408: ShenandoahCardCluster(ShenandoahDirectCardMarkRememberedSet* rs) { > 409: _rs = rs; > 410: _end_of_heap = ShenandoahHeap::heap()->end(); It is interesting why we should need end of heap but not start of heap. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 657: > 655: > 656: > 657: // Given a card_index, return the starting address of the first live object in the heap The interface/API comment should describe a Dijkstra-like pre- and post-condition. i.e. if these conditions are satisfied, we'll give you this result. A description of what the method does (i.e. how it implements the functionality) belongs in the method implementation. Here, the two are conflated making the interface description unnecessarily long and convoluted. Sometimes this might indicate that the interface isn't as frugal as it should be. I might state this more succinctly as follows: // Given: // `card_index`: a valid index of a card in the old generation // `ctx` : a valid marking context for the old generation // `tams` : a valid top-at-mark-start address for the old generation // region in which the card_index is located // `end_range_of_interest` : an address in that region beyond which we need // not locate an object // // Returns: // the address of the object, if any, at the least address that overlaps with // the address range between the start of card_index and end_range_of_interest, // or nullptr if no such object exists in the given range. Once you look at the spec in this manner, you realize that the first and last arguments go together and define a suitable address range, and the second and third arguments go together and provide a context. This allows you to divide the assertion checking and call interface most optimally between caller and callee. 
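To make the "Given / Returns" style of specification above concrete, here is a toy sketch of a function documented purely by its pre- and post-conditions. It is not the Shenandoah implementation: `ToyObject` and `first_object_overlapping` are hypothetical, addresses are plain word offsets, and the marking context and TAMS are deliberately left out so that only the address-range half of the contract is modeled.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical object occupying the half-open word range [start, start + size).
struct ToyObject {
  std::size_t start;
  std::size_t size;
};

// Given:
//   `objects`: objects sorted by start address, non-overlapping
//   `left`   : start of the range of interest
//   `right`  : end of the range of interest (exclusive), with left <= right
//
// Returns:
//   the start of the object at the least address that overlaps [left, right),
//   or std::nullopt if no object overlaps that range.
std::optional<std::size_t> first_object_overlapping(
    const std::vector<ToyObject>& objects, std::size_t left, std::size_t right) {
  for (const ToyObject& obj : objects) {
    if (obj.start >= right) {
      break;  // sorted input: nothing further can overlap the range
    }
    if (obj.start + obj.size > left) {
      return obj.start;  // may begin before `left`, i.e. span the range start
    }
  }
  return std::nullopt;
}
```

With the contract stated this way, range validation such as `left <= right` naturally belongs to the caller, and the callee's comment does not need to describe how the search is carried out.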
src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp line 168: > 166: // The range of addresses to be scanned is empty > 167: continue; > 168: } When would this happen? We start off with dirty_l to the left of dirty_r, and with dirty_r having started at a card that would correspond to end_addr. I am not convinced this check is needed. I'd rather assert here that: ``` assert(left <= end_addr, "left should remain left of end_addr established above"); ``` src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp line 178: > 176: // if its head card is dirty. If not, (i.e. its head card is clean) > 177: // we'll call it each time we process a new dirty range on the object. > 178: // This is always the case for large object arrays, which are typically more Instead of `This is always the case ...` maybe we can say `The latter is always the case ...`? (Mea culpa for the old comment.) src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp line 181: > 179: // common. > 180: assert(ctx != nullptr || heap->old_generation()->is_parsable(), "Error"); > 181: HeapWord* p = _scc->first_object_start(dirty_l, ctx, tams, right); TODO: Wondering if we need to pass both tams and right, or just the max of the two. Will look at `first_object_start()`.
------------- PR Review: https://git.openjdk.org/jdk/pull/27353#pullrequestreview-3445628617 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516048266 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512594318 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516272583 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512602336 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512514800 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512532407 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516276311 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512428956 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512489229 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512496763 From ysr at openjdk.org Wed Nov 12 01:25:27 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 12 Nov 2025 01:25:27 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v12] In-Reply-To: References: <3LPIbWVGQhvFzoPWfZEVXdj8N-6bm_x3rqat4nZfKxY=.fd6e8854-3da1-4e6e-a7b8-2fdc4e8b1bb6@github.com> Message-ID: On Tue, 11 Nov 2025 22:42:46 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Add two comments > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.cpp line 70: > >> 68: assert(limit >= r->bottom(), "limit must be more than bottom"); >> 69: assert(addr <= tams, "addr must be less than TAMS"); >> 70: #endif > > Wouldn't it make more sense for these checks to move to the caller? It appears to me to be a leakage of abstraction to test these conditions here. We should be able to return the address for the marked bit found without interpreting the semantics of the bits themselves? 
I notice that this is the case for the `get_next_...` version below as well; if my comment makes some sense, this can be addressed separately. Perhaps frugality in testing the conditions required us to site these assertions here, which I kind of understand, although the right thing in that case is to have the wrapper class, viz. `ShenandoahMarkingContext` make those checks before calling here. > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 252: > >> 250: >> 251: HeapWord* right = MIN2(region->top(), end_range_of_interest); >> 252: HeapWord* end_of_search_next = MIN2(right, tams); > > Does the caller ensure that `tams` is always valid (e.g. when `ctx == nullptr`)? The caller seems to allow for `tams==nullptr` and `ctx==nullptr`. In that case wouldn't we get `end_of_search_next==nullptr`? > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.inline.hpp line 181: > >> 179: // common. >> 180: assert(ctx != nullptr || heap->old_generation()->is_parsable(), "Error"); >> 181: HeapWord* p = _scc->first_object_start(dirty_l, ctx, tams, right); > > TODO: Wondering if we need to pass both tams and right, or just the max of the two. Will look at `first_object_start()`. Ah, looks like at this point we might potentially have `ctx == nullptr` and `tams == nullptr`. I wonder if we can do better here in terms of passing a sensible single `right` and dispense with passing `tams` entirely? Let me go back and look at the implementation of the method again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516079967 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516338478 PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2516323774 From ysr at openjdk.org Wed Nov 12 01:25:29 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 12 Nov 2025 01:25:29 GMT Subject: RFR: 8358735: GenShen: block_start() may be incorrect after class unloading [v2] In-Reply-To: References: <7zV-fLvjb-4gBVTppg4XTXPNxEheqLfxB0v_WONuinI=.22775b58-42cf-499e-9007-fad07118217d@github.com> Message-ID: On Fri, 3 Oct 2025 20:19:44 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> fix idiosyncratic formatting > > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 134: > >> 132: inline idx_t get_prev_one_offset(idx_t l_index, idx_t r_index) const; >> 133: >> 134: void clear_large_range(idx_t beg, idx_t end); > > documentation comment. Nit: l_index <-> beg r_index <-> end in either comment or formal args to make them mutually consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27353#discussion_r2512337397 From kdnilsen at openjdk.org Wed Nov 12 02:01:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 12 Nov 2025 02:01:45 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v7] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. 
> > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Initialize young evac reserve based on soft-max-capacity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/37b98e5f..35d3911d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=05-06 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From xpeng at openjdk.org Wed Nov 12 02:06:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 02:06:10 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code Message-ID: Current alloc request type enum: enum Type { _alloc_shared, // Allocate common, outside of TLAB _alloc_shared_gc, // Allocate common, outside of GCLAB/PLAB _alloc_cds, // Allocate for CDS _alloc_tlab, // Allocate TLAB _alloc_gclab, // Allocate GCLAB _alloc_plab, // Allocate PLAB _ALLOC_LIMIT }; With current design, we have to use switch statement resulting in unnecessary branches, for instance the function is_mutator_alloc: inline bool is_mutator_alloc() const { switch (_alloc_type) { case _alloc_tlab: case _alloc_shared: case _alloc_cds: return true; case _alloc_gclab: case _alloc_plab: case _alloc_shared_gc: return false; default: ShouldNotReachHere(); return false; } } In PR, I have re-designed the enum to make the function like is_mutator_alloc much simpler by making the values of the enum follow two simple rules: 1. Smaller value for mutator alloc, large value for gc alloc 2. 
Odd for lab, even number for non-lab Three functions have been simplified to one-line impl w/o branches in machine code: inline bool is_mutator_alloc() const { return _alloc_type <= _alloc_shared; } inline bool is_gc_alloc() const { return _alloc_type >= _alloc_shared_gc; } inline bool is_lab_alloc() const { return (_alloc_type & 1) == 1; } Test: - [x] TEST=hotspot_gc_shenandoah - [ ] Tier 1 ------------- Commit messages: - touch-up - Merge branch 'openjdk:master' into JDK-8371667 - re-shuffle the enum values and add comments - 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code Changes: https://git.openjdk.org/jdk/pull/28247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371667 Stats: 56 lines in 1 file changed: 6 ins; 41 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From epeter at openjdk.org Wed Nov 12 08:33:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 09:08:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. 
>>
>> There are 2 transformations that cause this to happen:
>>
>> - After loop opts are over, the types of the `CastII` nodes are widened
>>   so nodes that have the same inputs but a slightly different type can
>>   common.
>>
>> - When pushing a `CastII` through an `Add`, if the types of both inputs
>>   of the `Add` are non-constant, then we end up widening the type
>>   (the resulting `Add` has a type that's wider than that of the
>>   initial `CastII`).
>>
>> There are already 3 types of `Cast` nodes depending on the
>> optimizations that are allowed. Either the `Cast` is floating
>> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast`
>> can be removed if it no longer narrows the type of its input or
>> not. We already have variants of the `CastII`:
>>
>> - if the Cast can float and can be removed when it doesn't narrow the type
>>   of its input.
>>
>> - if the Cast is pinned and can be removed when it doesn't narrow the type
>>   of its input.
>>
>> - if the Cast is pinned and can't be removed when it doesn't narrow
>>   the type of its input.
>>
>> What we need here, I think, is the 4th combination:
>>
>> - if the Cast can float and can't be removed when it doesn't narrow
>>   the type of its input.
>>
>> Anyway, things are becoming confusing with all these different
>> variants named in ways that don't always help figure out what
>> constraints each of them operates under. So I refactored this and that's
>> the biggest part of this change. The fix consists of marking `Cast`
>> nodes when their type is widened in a way that prevents them from being
>> optimized out.
>>
>> Tobias ran performance testing with a slightly different version of
>> this change and there was no regression.
>
> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision:
>
>  - review
>  - infinite loop in gvn fix
>  - renaming

@rwestrel Sorry I dropped the review on this one for a long time :/

I left quite a few comments.
But on the whole I'm really happy with the direction you are taking. It's getting much clearer. I would still like to see some clearer explanations/comments. That way, we can make our previously implicit assumptions even more explicit :)

src/hotspot/share/opto/castnode.cpp line 47:

> 45: Node* ConstraintCastNode::Identity(PhaseGVN* phase) {
> 46:   if (!_dependency.narrows_type()) {
> 47:     return this;

Can you please add a code comment? I don't understand it right away :/

src/hotspot/share/opto/castnode.cpp line 153:

> 151:   if (!_dependency.narrows_type()) {
> 152:     return nullptr;
> 153:   }

Interesting, we already check that at least at some of the use sites. If it turns out we already do it at all use sites, why not just assert? (maybe not possible or desirable, just an idea)

A comment here would also be great.

src/hotspot/share/opto/castnode.cpp line 277:

> 275:
> 276: CastIINode* CastIINode::pin_array_access_node() const {
> 277:   assert(depends_only_on_test(), "already pinned");

Would this not be more readable?

Suggestion:

  assert(is_dependency_floating(), "already pinned");

src/hotspot/share/opto/castnode.cpp line 588:

> 586:
> 587:     // If both inputs are not constant then, with the Cast pushed through the Add/Sub, the cast gets less precised types,
> 588:     // and the resulting Add/Sub's type is wider than that of the Cast before pushing.

I find this long sentence a bit complicated to read. Can you reformulate and maybe break it into smaller sentences?
It would also be good to explicitly say why that may require changing the dependency constraint.

src/hotspot/share/opto/castnode.cpp line 615:

> 613:     // Widening the type of the Cast (to allow some commoning) causes the Cast to change how it can be optimized (if
> 614:     // type of its input is narrower than the Cast's type, we can't remove it to not loose the dependency).
> 615: return make_with(in(1), wide_t, _dependency.widen_type_dependency()); Suggestion: return make_with(in(1), wide_t, _dependency.with_non_narrowing()); This may be clearer here, since non-narrowing prevents folding the cast away if the input is narrower. I like the code comment you already have though :) src/hotspot/share/opto/castnode.cpp line 625: > 623: if (!phase->C->post_loop_opts_phase()) { > 624: return this_type; > 625: } Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? src/hotspot/share/opto/castnode.hpp line 46: > 44: // 1- and 2- are not always applied depending on what constraint are applied to the Cast: there are cases where 1- > 45: // and 2- apply, where neither 1- nor 2- apply and where one or the other apply. This class abstract away these > 46: // details. Can you spell it out a little more? Right now it feels a little bit like an "exercise for the reader". For each optimization, what is required of the constraints? I think that would help the reader. Equally: you could name why those constraints are required in the first place. Or is there some other place we could link to that already has those explanations? src/hotspot/share/opto/castnode.hpp line 53: > 51: _narrows_type(narrows_type), > 52: _desc(desc) { > 53: } Could you make the constructor private, and only expose the 4 static fields? That way, nobody comes to the strange idea to construct one of these themselves ;) src/hotspot/share/opto/castnode.hpp line 62: > 60: bool narrows_type() const { > 61: return _narrows_type; > 62: } Nits about naming: I would prefer `is_` for boolean queries. Otherwise, if I look at the names `floating` and `pinned_dependency`, I don't immediately know which one converts to a floating/non-floating, and which one is a boolean query. Maybe `pinned_dependency` should be renamed to `with_pinned_dependency`. 
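To make the two suggestions above concrete (a private constructor, plus `is_` and `with_` naming), here is a minimal standalone sketch. It is not the HotSpot class: the member names `is_floating`, `is_narrowing`, `with_pinned_dependency`, and `with_non_narrowing`, and the nested constant names, are assumptions chosen only for illustration.

```cpp
// Standalone illustration only; not the HotSpot DependencyType.
class DependencyType {
private:
  const bool _floating;       // may the Cast float above its control input?
  const bool _narrows_type;   // may the Cast be removed once it no longer narrows?
  const char* const _desc;

  // Private: callers cannot construct their own combinations.
  constexpr DependencyType(bool floating, bool narrows_type, const char* desc)
    : _floating(floating), _narrows_type(narrows_type), _desc(desc) {}

public:
  DependencyType(const DependencyType&) = delete;
  DependencyType& operator=(const DependencyType&) = delete;

  // The only four instances that can ever exist.
  static const DependencyType FloatingNarrowing;
  static const DependencyType FloatingNonNarrowing;
  static const DependencyType NonFloatingNarrowing;
  static const DependencyType NonFloatingNonNarrowing;

  // Boolean queries carry an is_ prefix.
  bool is_floating() const  { return _floating; }
  bool is_narrowing() const { return _narrows_type; }
  const char* desc() const  { return _desc; }

  // Conversions carry a with_ prefix and return one of the four constants.
  const DependencyType& with_pinned_dependency() const {
    return _narrows_type ? NonFloatingNarrowing : NonFloatingNonNarrowing;
  }
  const DependencyType& with_non_narrowing() const {
    return _floating ? FloatingNonNarrowing : NonFloatingNonNarrowing;
  }
};

// The static member definitions are in class scope, so they may use the
// private constructor.
const DependencyType DependencyType::FloatingNarrowing(true, true, "floating narrowing");
const DependencyType DependencyType::FloatingNonNarrowing(true, false, "floating non-narrowing");
const DependencyType DependencyType::NonFloatingNarrowing(false, true, "non-floating narrowing");
const DependencyType DependencyType::NonFloatingNonNarrowing(false, false, "non-floating non-narrowing");
```

With this shape, a use site such as `dep.with_non_narrowing()` reads as a conversion, while `dep.is_floating()` reads as a query, and comparing two dependencies reduces to comparing addresses of the four constants.
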
src/hotspot/share/opto/castnode.hpp line 65: > 63: void dump_on(outputStream *st) const { > 64: st->print("%s", _desc); > 65: } Suggestion: bool narrows_type() const { return _narrows_type; } void dump_on(outputStream *st) const { st->print("%s", _desc); } Newline for consistency with surrounding code. src/hotspot/share/opto/castnode.hpp line 92: > 90: const bool _floating; // Does this Cast depends on its control input or is it pinned? > 91: const bool _narrows_type; // Does this Cast narrows the type i.e. if input type is narrower can it be removed? > 92: const char* _desc; I thought the hotspot convention was to usually put the fields first, at the top of the class? src/hotspot/share/opto/castnode.hpp line 104: > 102: // NonFloatingNarrowingDependency is used when an array access is no longer dependent on a single range check (range > 103: // check smearing for instance) > 104: // FloatingNonNarrowingDependency is used after loop opts when Cast nodes' types are widen so Casts that only differ Suggestion: // FloatingNonNarrowingDependency is used after loop opts when Cast nodes' types are widened so Casts that only differ src/hotspot/share/opto/castnode.hpp line 110: > 108: static const DependencyType FloatingNonNarrowingDependency; > 109: static const DependencyType NonFloatingNarrowingDependency; > 110: static const DependencyType NonFloatingNonNarrowingDependency; Why not put the example at each definition? Would prevent repeating the names :) It would be good if we could have this section earlier up, so the code comments of the `DependencyType` class and this form a unit. At least link them. `NonFloatingNonNarrowingDependency` example: can you spell out the why? What could go wrong otherwise? Would the node float back into the loop maybe? What's wrong with that? `NonFloatingNarrowingDependency` more detail would be helpful. I would like to know why non floating, and why narrowing? Because that's what these examples are for, right? 
`FloatingNonNarrowingDependency`: ah, maybe that answers one of my questions further up somewhere. If we don't have narrowing, then we should not fold away the cast because of the type, right? I think if we spell out which optimizations require which constraints, that could help a lot here.

src/hotspot/share/opto/castnode.hpp line 122:

> 120:     ShouldNotReachHere();
> 121:     return nullptr;
> 122:   }

This always smells like a messed up class hierarchy, when I see default methods with "not implemented". But maybe we can't do much better, and I've done similar things recently. A short code comment could be helpful though.

Suggestion:

  virtual ConstraintCastNode* make_with(Node* parent, const TypeInteger* type, const DependencyType& dependency) const {
    ShouldNotReachHere(); // Only implemented for CastII and CastLL
    return nullptr;
  }

src/hotspot/share/opto/castnode.hpp line 146:

> 144:   virtual uint ideal_reg() const = 0;
> 145:   bool carry_dependency() const { return !_dependency.cmp(FloatingNarrowingDependency); }
> 146:   virtual bool depends_only_on_test() const { return _dependency.floating(); }

Why not rename it to `is_dependency_floating`? That may be more helpful at the use site.

test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java line 95:

> 93:         j += Objects.checkIndex(i - 1, length);
> 94:         return j;
> 95:     }

Why not add an additional IR rule that checks that there are more casts before they get commoned? Just for completeness ;)

-------------

Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3451986831 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517197209 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517271796 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517301300 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517315011 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517336133 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517344615 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517236142 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517203781 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517366170 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517205971 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517200829 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517251068 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517260839 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517355725 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517299467 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517370224 From epeter at openjdk.org Wed Nov 12 08:33:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> On Wed, 12 Nov 2025 07:24:01 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - review >> - infinite loop in gvn fix >> - renaming > > 
src/hotspot/share/opto/castnode.cpp line 47: > >> 45: Node* ConstraintCastNode::Identity(PhaseGVN* phase) { >> 46: if (!_dependency.narrows_type()) { >> 47: return this; > > Can you please add a code comment? I don't understand it right away :/ Maybe I'm slowly starting to understand... but a code comment would still help a lot here. We are trying to find a dominating cast that has the same or narrower type, and replace with that one. We are only allowed to do that if we have a narrowing cast, because ... > src/hotspot/share/opto/castnode.cpp line 277: > >> 275: >> 276: CastIINode* CastIINode::pin_array_access_node() const { >> 277: assert(depends_only_on_test(), "already pinned"); > > Would this not be more readable? > > Suggestion: > > assert(is_dependency_floating(), "already pinned"); Because it seems we are talking about floating vs pinned here. Adding yet another concept of "depending only on test" would require further explanation / definition. > src/hotspot/share/opto/castnode.cpp line 588: > >> 586: >> 587: // If both inputs are not constant then, with the Cast pushed through the Add/Sub, the cast gets less precised types, >> 588: // and the resulting Add/Sub's type is wider than that of the Cast before pushing. > > I find this long sentence a bit complicated to read. Can you reformulate and maybe break it into smaller sentences? > It would also be good to explicitly say why that may require changing the dependency constraint. I wonder if you renamed `widen_type_dependency` to `with_non_narrowing`, and explained that this now prevents folding away the cast if input types are narrower, etc... that would maybe be more straight forward? I suppose your approach was to just "notify" the dependency that we have widened the type, and then the dependency manages what the implications are. But I find that approach a bit less straight forward, because we are not talking about widening the exact same cast, but a cast that has been pushed through an add/sub. 
Maybe you can manage to make a coherent argument though, up to you. > src/hotspot/share/opto/castnode.cpp line 625: > >> 623: if (!phase->C->post_loop_opts_phase()) { >> 624: return this_type; >> 625: } > > Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? But maybe that is a refactoring for a separate RFE, and then not really worth it. > src/hotspot/share/opto/castnode.hpp line 53: > >> 51: _narrows_type(narrows_type), >> 52: _desc(desc) { >> 53: } > > Could you make the constructor private, and only expose the 4 static fields? That way, nobody comes to the strange idea to construct one of these themselves ;) That would probably require moving the 4 static fields into this class here. Example: `ConstraintCastNode::DependencyType::FloatingNarrowing` Just an idea. Maybe you have a different solution. But a private constructor would be great for sure. > src/hotspot/share/opto/castnode.hpp line 146: > >> 144: virtual uint ideal_reg() const = 0; >> 145: bool carry_dependency() const { return !_dependency.cmp(FloatingNarrowingDependency); } >> 146: virtual bool depends_only_on_test() const { return _dependency.floating(); } > > Why not rename it to `is_dependency_floating`? That may be more helpful at the use site. Otherwise you have to give an explanation/code comment about the concept "depending on test", and define it in terms of floating / non-floating. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517268181 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517304372 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517331973 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517345703 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517217941 PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517358981 From epeter at openjdk.org Wed Nov 12 08:33:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Nov 2025 08:33:31 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> References: <2RJF9zYoCEnq2riltw2AoWpBYa7T2F7eXEQRTIQJT_w=.f9001c12-2fe9-4432-9aba-d4f0eb59e5dd@github.com> Message-ID: On Wed, 12 Nov 2025 08:19:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.cpp line 625: >> >>> 623: if (!phase->C->post_loop_opts_phase()) { >>> 624: return this_type; >>> 625: } >> >> Honestly, I would prefer to see this "delay to post loop opts" to be done outside of `widen_type`. It would just make more sense there. What do you think? > > But maybe that is a refactoring for a separate RFE, and then not really worth it. But conceptually, we want to say: if we are in post loop opts, then widen the types. Now it looks like we want to widen always ... but then we check for post loop opts inside the method and bail out anyway. Not very transparent. Another idea: rename the method to `widen_type_in_post_loop_opts`. Totally up to you though. 
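A minimal sketch of the first option (hoisting the phase check to the caller), using hypothetical stand-ins rather than the C2 types. `IntRange`, `type_for_commoning`, and the widening buckets below are assumptions made only for illustration; the point is just where the post-loop-opts check lives.

```cpp
#include <climits>

// Hypothetical stand-in for an integer type range; not HotSpot's TypeInt.
struct IntRange {
  int lo;
  int hi;
};

// Unconditional widening. The buckets are an assumption for illustration:
// a non-negative range becomes [0, INT_MAX], a negative range becomes
// [INT_MIN, -1], and anything else becomes the full int range.
inline IntRange widen_type(IntRange t) {
  if (t.lo >= 0) return {0, INT_MAX};
  if (t.hi < 0)  return {INT_MIN, -1};
  return {INT_MIN, INT_MAX};
}

// The caller decides when widening applies, so the phase check is visible
// at the call site instead of being hidden inside widen_type().
inline IntRange type_for_commoning(IntRange t, bool post_loop_opts_phase) {
  return post_loop_opts_phase ? widen_type(t) : t;
}
```

The alternative mentioned above, renaming the method to `widen_type_in_post_loop_opts`, would keep the check inside the method but make it visible in the name.
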
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2517350982 From shade at openjdk.org Wed Nov 12 09:57:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Nov 2025 09:57:07 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 01:29:53 GMT, Xiaolong Peng wrote: > Current alloc request type enum: > > enum Type { > _alloc_shared, // Allocate common, outside of TLAB > _alloc_shared_gc, // Allocate common, outside of GCLAB/PLAB > _alloc_cds, // Allocate for CDS > _alloc_tlab, // Allocate TLAB > _alloc_gclab, // Allocate GCLAB > _alloc_plab, // Allocate PLAB > _ALLOC_LIMIT > }; > > With current design, we have to use switch statement in multiple places resulting in unnecessary branches, for instance the function is_mutator_alloc: > > > inline bool is_mutator_alloc() const { > switch (_alloc_type) { > case _alloc_tlab: > case _alloc_shared: > case _alloc_cds: > return true; > case _alloc_gclab: > case _alloc_plab: > case _alloc_shared_gc: > return false; > default: > ShouldNotReachHere(); > return false; > } > } > > > > In PR, I have re-designed the enum to make the function like is_mutator_alloc much simpler by making the values of the enum follow two simple rules: > 1. Smaller value for mutator alloc, larger value for gc alloc; GC alloc types are always greater than any of mutator alloc types. > 2. 
Odd for lab, even number for non-lab
>
> Three functions have been simplified to one-line impl w/o branches in machine code:
>
>
>     inline bool is_mutator_alloc() const {
>       return _alloc_type <= _alloc_shared;
>     }
>
>     inline bool is_gc_alloc() const {
>       return _alloc_type >= _alloc_shared_gc;
>     }
>
>     inline bool is_lab_alloc() const {
>       return (_alloc_type & 1) == 1;
>     }
>
>
> I didn't check the compiled assembly code of HotSpot; instead, I wrote similar/equivalent code and compiled it with gcc for comparison using godbolt.org:
>
> bool is_lab_alloc(int alloc_type) {
>     return (alloc_type & 1) == 1;
> }
>
> bool is_lab_alloc_switch(int alloc_type) {
>     switch (alloc_type) {
>       case 0:
>       case 2:
>       case 4:
>         return false;
>       case 1:
>       case 3:
>       case 5:
>         return true;
>       default:
>         throw "Should not reach here";
>
>     }
> }
>
> x86_64 assembly code (https://godbolt.org/z/h7xfz8PaT):
>
> is_lab_alloc(int):
>         push    rbp
>         mov     rbp, rsp
>         mov     DWORD PTR [rbp-4], edi
>         mov     eax, DWORD PTR [rbp-4]
>         and     eax, 1
>         and     eax, 1
>         pop     rbp
>         ret
> .LC0:
>         .string "Should not reach here"
> is_lab_alloc_switch(int):
>         push    rbp
>         mov     rbp, rsp
>         sub     rsp, 16
>         mov     DWORD PTR [rbp-4], edi
>         cmp     DWORD PTR [rbp-4], 5
>         je      .L...

Honestly, I don't believe switching from explicit switch cases to bit manipulation is that much more readable here. If you want to pursue this, then maybe do the proper bitmask manipulation, something like:

  // bit 0: mutator (0) or GC (1) alloc
  // bit 1: LAB (0) or shared (1) alloc
  // bit 2: if LAB, then GCLAB (0) or PLAB (1)
  // bit 3: if mutator, then normal (0) or CDS (1)
  typedef int AllocType;

  constexpr int bit_gc_alloc = 1 << 1;
  constexpr int bit_lab_alloc = 1 << 2;
  constexpr int bit_plab_alloc = 1 << 3;
  constexpr int bit_cds_alloc = 1 << 4;
  ...
  const bool is_lab_alloc(AllocType type) {
    return (type & bit_lab_alloc) != 0;
  }

  constexpr AllocType _alloc_tlab = bit_lab_alloc;
  constexpr AllocType _alloc_plab = bit_gc_alloc | bit_lab_alloc | bit_plab_alloc;
  ...
Remains to be seen what is the most sensible encoding scheme here.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28247#pullrequestreview-3452564118

From xpeng at openjdk.org  Wed Nov 12 16:07:03 2025
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Wed, 12 Nov 2025 16:07:03 GMT
Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code
In-Reply-To: 
References: 
Message-ID: 

On Wed, 12 Nov 2025 09:54:27 GMT, Aleksey Shipilev wrote:

> Honestly, I don't believe switching from explicit switch cases to bit manipulation is that much more readable here. If you want to pursue this, then maybe do the proper bitmask manipulation, something like:
>
> ```
> // bit 0: mutator (0) or GC (1) alloc
> // bit 1: LAB (0) or shared (1) alloc
> // bit 2: if LAB, then GCLAB (0) or PLAB (1)
> // bit 3: if mutator, then normal (0) or CDS (1)
> typedef int AllocType;
>
> constexpr int bit_gc_alloc = 1 << 1;
> constexpr int bit_lab_alloc = 1 << 2;
> constexpr int bit_plab_alloc = 1 << 3;
> constexpr int bit_cds_alloc = 1 << 4;
> ...
> const bool is_lab_alloc(AllocType type) {
>   return (type & bit_lab_alloc) != 0;
> }
>
> constexpr AllocType _alloc_tlab = bit_lab_alloc;
> constexpr AllocType _alloc_plab = bit_gc_alloc | bit_lab_alloc | bit_plab_alloc;
> ...
> ```
>
> Remains to be seen what is the most sensible encoding scheme here.

Thank you for the suggestion. I didn't think about changing the enum to constexpr int for the alloc request type; I just wanted to reshuffle the values of the current enum, so I only used bit manipulation when testing whether it is a LAB allocation. I'll update the PR and use proper bitmask manipulation as you suggested.

In terms of readability, explicit switch cases are definitely more readable, but the machine code will also be larger, with more branches, so in theory it is less efficient.
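A minimal, self-contained sketch of the bitmask idea discussed above. The bit assignments and the names `bit_gc`, `bit_lab`, `bit_plab`, and `bit_cds` are assumptions chosen for illustration; this is not necessarily the encoding the PR ends up with.

```cpp
// Hypothetical encoding, for illustration only:
//   bit 0: set for GC allocations, clear for mutator allocations
//   bit 1: set for LAB allocations (TLAB/GCLAB/PLAB), clear for shared ones
//   bit 2: set for PLAB (only used together with bit 0 and bit 1)
//   bit 3: set for CDS (only used for mutator, non-LAB allocations)
using AllocType = int;

constexpr AllocType bit_gc   = 1 << 0;
constexpr AllocType bit_lab  = 1 << 1;
constexpr AllocType bit_plab = 1 << 2;
constexpr AllocType bit_cds  = 1 << 3;

constexpr AllocType alloc_shared    = 0;
constexpr AllocType alloc_cds       = bit_cds;
constexpr AllocType alloc_tlab      = bit_lab;
constexpr AllocType alloc_shared_gc = bit_gc;
constexpr AllocType alloc_gclab     = bit_gc | bit_lab;
constexpr AllocType alloc_plab      = bit_gc | bit_lab | bit_plab;

// Each predicate is a single mask-and-test, with no switch statement.
constexpr bool is_mutator_alloc(AllocType t) { return (t & bit_gc) == 0; }
constexpr bool is_gc_alloc(AllocType t)      { return (t & bit_gc) != 0; }
constexpr bool is_lab_alloc(AllocType t)     { return (t & bit_lab) != 0; }

// Compile-time checks that the six types classify as intended.
static_assert(is_mutator_alloc(alloc_shared) && !is_lab_alloc(alloc_shared), "shared");
static_assert(is_mutator_alloc(alloc_cds)    && !is_lab_alloc(alloc_cds),    "cds");
static_assert(is_mutator_alloc(alloc_tlab)   &&  is_lab_alloc(alloc_tlab),   "tlab");
static_assert(is_gc_alloc(alloc_shared_gc)   && !is_lab_alloc(alloc_shared_gc), "shared_gc");
static_assert(is_gc_alloc(alloc_gclab)       &&  is_lab_alloc(alloc_gclab),  "gclab");
static_assert(is_gc_alloc(alloc_plab)        &&  is_lab_alloc(alloc_plab),   "plab");
```

Compared with the odd/even scheme, each property here has its own named bit, so the predicates do not depend on the relative ordering of the values.
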
------------- PR Comment: https://git.openjdk.org/jdk/pull/28247#issuecomment-3522696592 From xpeng at openjdk.org Wed Nov 12 16:11:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 16:11:00 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v2] In-Reply-To: References: Message-ID: > Current alloc request type enum: > > enum Type { > _alloc_shared, // Allocate common, outside of TLAB > _alloc_shared_gc, // Allocate common, outside of GCLAB/PLAB > _alloc_cds, // Allocate for CDS > _alloc_tlab, // Allocate TLAB > _alloc_gclab, // Allocate GCLAB > _alloc_plab, // Allocate PLAB > _ALLOC_LIMIT > }; > > With the current design, we have to use switch statements in multiple places, resulting in unnecessary branches; for instance, the function is_mutator_alloc: > > > inline bool is_mutator_alloc() const { > switch (_alloc_type) { > case _alloc_tlab: > case _alloc_shared: > case _alloc_cds: > return true; > case _alloc_gclab: > case _alloc_plab: > case _alloc_shared_gc: > return false; > default: > ShouldNotReachHere(); > return false; > } > } > > > > In this PR, I have redesigned the enum so that functions like is_mutator_alloc become much simpler, by making the enum values follow two simple rules: > 1. Smaller values for mutator allocs, larger values for GC allocs; GC alloc types are always greater than any mutator alloc type. > 2.
Odd values for LAB allocs, even values for non-LAB allocs > > Three functions have been simplified to one-line implementations without branches in machine code: > > > inline bool is_mutator_alloc() const { > return _alloc_type <= _alloc_shared; > } > > inline bool is_gc_alloc() const { > return _alloc_type >= _alloc_shared_gc; > } > > inline bool is_lab_alloc() const { > return (_alloc_type & 1) == 1; > } > > > I didn't check the compiled assembly code of HotSpot; instead, I wrote similar/equivalent code and compiled it with gcc for comparison using godbolt.org: > > bool is_lab_alloc(int alloc_type) { > return (alloc_type & 1) == 1; > } > > bool is_lab_alloc_switch(int alloc_type) { > switch (alloc_type) { > case 0: > case 2: > case 4: > return false; > case 1: > case 3: > case 5: > return true; > default: > throw "Should not reach here"; > > } > } > > x86_64 assembly code (https://godbolt.org/z/h7xfz8PaT): > > is_lab_alloc(int): > push rbp > mov rbp, rsp > mov DWORD PTR [rbp-4], edi > mov eax, DWORD PTR [rbp-4] > and eax, 1 > and eax, 1 > pop rbp > ret > .LC0: > .string "Should not reach here" > is_lab_alloc_switch(int): > push rbp > mov rbp, rsp > sub rsp, 16 > mov DWORD PTR [rbp-4], edi > cmp DWORD PTR [rbp-4], 5 > je .L...
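The two value-ordering rules in the description above (mutator types sort below GC types, LAB types are odd) can be sketched as follows. The numeric values and the namespace `ordered_sketch` are assumptions chosen only to satisfy the two rules, since the description does not list the actual values; the `static_assert`s check that the one-line predicates agree with the classification in the original switch:

```cpp
namespace ordered_sketch {

// Hypothetical values: every mutator type is <= _alloc_shared,
// every GC type is >= _alloc_shared_gc, and LAB types are odd.
enum Type {
  _alloc_cds       = 0,  // mutator, non-LAB
  _alloc_tlab      = 1,  // mutator, LAB
  _alloc_shared    = 2,  // mutator, non-LAB (largest mutator value)
  _alloc_shared_gc = 4,  // GC, non-LAB (smallest GC value)
  _alloc_gclab     = 5,  // GC, LAB
  _alloc_plab      = 7   // GC, LAB
};

constexpr bool is_mutator_alloc(Type t) { return t <= _alloc_shared; }
constexpr bool is_gc_alloc(Type t)      { return t >= _alloc_shared_gc; }
constexpr bool is_lab_alloc(Type t)     { return (t & 1) == 1; }

// Same answers as the switch-based is_mutator_alloc in the description.
static_assert(is_mutator_alloc(_alloc_tlab), "TLAB is a mutator alloc");
static_assert(is_mutator_alloc(_alloc_shared), "shared is a mutator alloc");
static_assert(is_mutator_alloc(_alloc_cds), "CDS is a mutator alloc");
static_assert(is_gc_alloc(_alloc_gclab), "GCLAB is a GC alloc");
static_assert(is_gc_alloc(_alloc_plab), "PLAB is a GC alloc");
static_assert(is_gc_alloc(_alloc_shared_gc), "shared GC is a GC alloc");

// Odd values are exactly the LAB types.
static_assert(is_lab_alloc(_alloc_tlab) && is_lab_alloc(_alloc_gclab) && is_lab_alloc(_alloc_plab),
              "LAB types are odd");
static_assert(!is_lab_alloc(_alloc_cds) && !is_lab_alloc(_alloc_shared) && !is_lab_alloc(_alloc_shared_gc),
              "non-LAB types are even");

}  // namespace ordered_sketch
```

The scheme depends on the numeric order of the enumerators, so adding a new type means finding a free value that keeps both rules true.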
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Refactor alloc type with bit masks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/eaf5997c..e44a3013 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=00-01 Stats: 20 lines in 1 file changed: 7 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From shade at openjdk.org Wed Nov 12 16:58:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Nov 2025 16:58:51 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 16:11:00 GMT, Xiaolong Peng wrote: > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Refactor alloc type with bit masks Now that we are doing this... I do wonder if we want to fold `_is_promotion` and `_affiliation` into the same bitset? src/hotspot/share/gc/shenandoah/shenandoahAllocRequest.hpp line 178: > 176: > 177: inline bool is_mutator_alloc() const { > 178: return !is_gc_alloc(); Inline this, to check the `bit_gc_alloc` directly. src/hotspot/share/gc/shenandoah/shenandoahAllocRequest.hpp line 182: > 180: > 181: inline bool is_gc_alloc() const { > 182: return (_alloc_type & bit_gc_alloc) == bit_gc_alloc; Check for `!= 0` in these. It is microscopically more efficient.
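One way to read the two review comments above, as a sketch rather than the code that landed in the PR: `is_mutator_alloc()` tests `bit_gc_alloc` itself instead of negating `is_gc_alloc()`, and the set-bit test compares against zero. The class name `AllocRequestSketch` and the position of `bit_gc_alloc` are assumptions made for illustration.

```cpp
// Hypothetical stand-in for the request class; only the type bits are modelled.
class AllocRequestSketch {
  int _alloc_type;

public:
  static constexpr int bit_gc_alloc = 1 << 0;  // assumed bit position

  constexpr explicit AllocRequestSketch(int alloc_type) : _alloc_type(alloc_type) {}

  // Tests the bit directly rather than calling is_gc_alloc() and negating.
  constexpr bool is_mutator_alloc() const {
    return (_alloc_type & bit_gc_alloc) == 0;
  }

  // Compares against zero rather than against the mask.
  constexpr bool is_gc_alloc() const {
    return (_alloc_type & bit_gc_alloc) != 0;
  }
};

static_assert(AllocRequestSketch(0).is_mutator_alloc(), "clear bit means mutator alloc");
static_assert(AllocRequestSketch(1).is_gc_alloc(), "set bit means GC alloc");
```

For a single-bit mask `m`, `(x & m) != 0` and `(x & m) == m` always give the same answer; the two forms differ only for multi-bit masks, where `== m` requires every bit of the mask to be set.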
------------- PR Review: https://git.openjdk.org/jdk/pull/28247#pullrequestreview-3454438588 PR Review Comment: https://git.openjdk.org/jdk/pull/28247#discussion_r2519075032 PR Review Comment: https://git.openjdk.org/jdk/pull/28247#discussion_r2519052973 From wkemper at openjdk.org Wed Nov 12 17:08:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Nov 2025 17:08:31 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v2] In-Reply-To: References: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> Message-ID: <2xpmXZKcmu3NPRYcj_0PZxLxN2dIF_LbhWerqJL7Ra0=.8552e539-0808-461e-be7b-3964c2a94c8c@github.com> On Wed, 12 Nov 2025 00:58:31 GMT, William Kemper wrote: >> Ah, missed your general comment above. I think we're talking about the same solution. I'll update the passive mode to use `FLAG_SET_DEFAULT`. > > Yeah, that's what I was suggesting. Thanks! Can we put the lower bound back to 1 now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2519122056 From xpeng at openjdk.org Wed Nov 12 17:29:24 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 17:29:24 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v3] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address pr comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/e44a3013..5348c886 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From xpeng at openjdk.org Wed Nov 12 17:29:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 17:29:25 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 16:55:48 GMT, Aleksey Shipilev wrote: > Now that we are doing this... I do wonder if we want to fold `_is_promotion` and `_affiliation` into the same bitset? I'll take a look today and see if folding `_is_promotion` and `_affiliation` into the same bitset provides benefits we want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28247#issuecomment-3523066748 From kdnilsen at openjdk.org Wed Nov 12 18:08:00 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 12 Nov 2025 18:08:00 GMT Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements [v3] In-Reply-To: <4pRORBaXYXwyCJyUp3BKA4I8bHlTfkfNldK9EnDJvZw=.b0a53f9b-a9a0-4c75-a823-7cf82f69a40b@github.com> References: <4pRORBaXYXwyCJyUp3BKA4I8bHlTfkfNldK9EnDJvZw=.b0a53f9b-a9a0-4c75-a823-7cf82f69a40b@github.com> Message-ID: On Tue, 11 Nov 2025 00:33:36 GMT, William Kemper wrote: >> When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. 
This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Revert "We can also filter out old when striclty marking young" > > This reverts commit c53c4f23f4401785e1049494b6c4e4b92f9a5701. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28183#pullrequestreview-3454779191 From duke at openjdk.org Wed Nov 12 19:33:13 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 12 Nov 2025 19:33:13 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: <2xpmXZKcmu3NPRYcj_0PZxLxN2dIF_LbhWerqJL7Ra0=.8552e539-0808-461e-be7b-3964c2a94c8c@github.com> References: <43plTkqggX_oA_fGW7bKG0Gk_K5nzo4XeCvxswOYl6A=.a1770f88-cfc2-40d4-aa59-ff8e9413faa6@github.com> <2xpmXZKcmu3NPRYcj_0PZxLxN2dIF_LbhWerqJL7Ra0=.8552e539-0808-461e-be7b-3964c2a94c8c@github.com> Message-ID: On Wed, 12 Nov 2025 17:05:17 GMT, William Kemper wrote: >> Yeah, that's what I was suggesting. Thanks! > > Can we put the lower bound back to 1 now? Sorry, forgot. Updating. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28242#discussion_r2519528832 From duke at openjdk.org Wed Nov 12 19:33:12 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 12 Nov 2025 19:33:12 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: References: Message-ID: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. 
> > > For the `ShenandoahEvacReserve` change: when we use Shenandoah passive mode with degenerated GC also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 > > `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does, so some of the jtreg tests would fail because of this change. I had to switch this ergo setting to a regular `FLAG_SET_DEFAULT`. > > Testing: jtreg gc. GHA pending. Rui Li has updated the pull request incrementally with one additional commit since the last revision: Move ShenandoahEvacReserve back to (1,100) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28242/files - new: https://git.openjdk.org/jdk/pull/28242/files/65fc9878..6200456b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28242&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28242&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28242/head:pull/28242 PR: https://git.openjdk.org/jdk/pull/28242 From kdnilsen at openjdk.org Wed Nov 12 21:10:50 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 12 Nov 2025 21:10:50 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v8] In-Reply-To: References: Message-ID: > GenShen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small, and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young).
> > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. 
> > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix merge error - compute_old_generation_balance() during freeset rebuild ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/35d3911d..18a49d8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=06-07 Stats: 45 lines in 4 files changed: 42 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From xpeng at openjdk.org Wed Nov 12 23:05:39 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 23:05:39 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v4] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fold _affiliation and _is_promotion into the bitset of request type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/5348c886..2cf6600e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=02-03 Stats: 72 lines in 2 files changed: 14 ins; 28 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From duke at openjdk.org Wed Nov 12 23:08:08 2025 From: duke at openjdk.org (Rui Li) Date: Wed, 12 Nov 2025 23:08:08 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 19:33:12 GMT, Rui Li wrote: >> Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. >> >> >> For `ShenandoahEvacReserve` change: when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 >> >> `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does so some of the jtreg would fail because of this change. Had to move this ergo setting to regular `FLAG_SET_DEFAULT`. >> >> Testing: jtreg gc. GHA pending. 
> > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Move ShenandoahEvacReserve back to (1,100) GHA test `test/jdk/java/util/Arrays/SortingNearlySortedPrimitive.java` failed, but it uses g1, should be unrelated: # JRE version: OpenJDK Runtime Environment (26.0) (build 26-internal-rgithubli-6200456b7130997d227dd5a128e4cefbd05059b5) # Java VM: OpenJDK 64-Bit Server VM (26-internal-rgithubli-6200456b7130997d227dd5a128e4cefbd05059b5, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x6765ff] frame::sender(RegisterMap*) const+0x29f It passed on my local. Rerun on GHA and check hs_err log. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28242#issuecomment-3524259349 From xpeng at openjdk.org Wed Nov 12 23:17:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 23:17:34 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v5] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Remove dead code - Add asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/2cf6600e..e05b45bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=03-04 Stats: 9 lines in 1 file changed: 8 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From xpeng at openjdk.org Wed Nov 12 23:22:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Nov 2025 23:22:29 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v6] In-Reply-To: References: Message-ID:
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add missing cases in ShenandoahHeapRegion::adjust_alloc_metadata ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/e05b45bd..4b9f1308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=04-05 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From xpeng at openjdk.org Thu Nov 13 00:26:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 13 Nov 2025 00:26:02 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v6] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 23:22:29 GMT, Xiaolong Peng wrote: > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add missing cases in ShenandoahHeapRegion::adjust_alloc_metadata src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 3272: > 3270: switch (req.type()) { > 3271: case ShenandoahAllocRequest::_alloc_shared: > 3272: case ShenandoahAllocRequest::_alloc_shared_gc: humongous objects never move, not possible to be _alloc_shared_gc here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28247#discussion_r2520400311 From xpeng at openjdk.org Thu Nov 13 00:29:05 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 13 Nov 2025 00:29:05 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v2] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 17:25:52 GMT, Xiaolong Peng wrote: > > Now that we are doing this... I do wonder if we want to fold `_is_promotion` and `_affiliation` into the same bitset? > > I'll take a look today and see if folding `_is_promotion` and `_affiliation` into the same bitset provides benefits we want. I have made the change to fold `_is_promotion` and `_affiliation` into the same bitset of `_alloc`, but I am not 100% sure about whether we should do it or not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28247#issuecomment-3524505491 From xpeng at openjdk.org Thu Nov 13 01:03:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 13 Nov 2025 01:03:28 GMT Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v7] In-Reply-To: References: Message-ID: > Current alloc request type enum: > > enum Type { > _alloc_shared, // Allocate common, outside of TLAB > _alloc_shared_gc, // Allocate common, outside of GCLAB/PLAB > _alloc_cds, // Allocate for CDS > _alloc_tlab, // Allocate TLAB > _alloc_gclab, // Allocate GCLAB > _alloc_plab, // Allocate PLAB > _ALLOC_LIMIT > }; > > With current design, we have to use switch statement in multiple places resulting in unnecessary branches, for instance the function is_mutator_alloc: > > > inline bool is_mutator_alloc() const { > switch (_alloc_type) { > case _alloc_tlab: > case _alloc_shared: > case _alloc_cds: > return true; > case _alloc_gclab: > case _alloc_plab: > case _alloc_shared_gc: > return false; > default: > ShouldNotReachHere(); > return false; > } > } > > > > In PR, I have 
re-designed the enum so that functions like is_mutator_alloc become much simpler, by making the enum values follow two simple rules:
> 1. Smaller values for mutator allocs, larger values for GC allocs; every GC alloc type is greater than any mutator alloc type.
> 2. Odd values for LAB allocs, even values for non-LAB allocs.
>
> Three functions have been simplified to one-line implementations without branches in machine code:
>
>
>     inline bool is_mutator_alloc() const {
>       return _alloc_type <= _alloc_shared;
>     }
>
>     inline bool is_gc_alloc() const {
>       return _alloc_type >= _alloc_shared_gc;
>     }
>
>     inline bool is_lab_alloc() const {
>       return (_alloc_type & 1) == 1;
>     }
>
>
> I didn't check the compiled assembly code of HotSpot; instead, I wrote similar/equivalent code and compiled it with gcc for comparison using godbolt.org:
>
>     bool is_lab_alloc(int alloc_type) {
>       return (alloc_type & 1) == 1;
>     }
>
>     bool is_lab_alloc_switch(int alloc_type) {
>       switch (alloc_type) {
>         case 0:
>         case 2:
>         case 4:
>           return false;
>         case 1:
>         case 3:
>         case 5:
>           return true;
>         default:
>           throw "Should not reach here";
>
>       }
>     }
>
> x86_64 assembly code (https://godbolt.org/z/h7xfz8PaT):
>
>     is_lab_alloc(int):
>             push rbp
>             mov rbp, rsp
>             mov DWORD PTR [rbp-4], edi
>             mov eax, DWORD PTR [rbp-4]
>             and eax, 1
>             and eax, 1
>             pop rbp
>             ret
>     .LC0:
>             .string "Should not reach here"
>     is_lab_alloc_switch(int):
>             push rbp
>             mov rbp, rsp
>             sub rsp, 16
>             mov DWORD PTR [rbp-4], edi
>             cmp DWORD PTR [rbp-4], 5
>             je .L...
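
The two ordering rules in the description above can be illustrated with a small standalone sketch. The numeric values below are assumptions picked only so that both rules hold; the actual values in the pull request are not shown in this message and may differ. The names drop the leading underscore because this sketch lives at namespace scope rather than inside `ShenandoahAllocRequest`.

```cpp
// Hypothetical values: mutator types sort below GC types, LAB types are odd.
enum AllocType : int {
  alloc_cds       = 0,  // mutator, non-LAB (even)
  alloc_tlab      = 1,  // mutator, LAB (odd)
  alloc_shared    = 2,  // mutator, non-LAB (even); largest mutator value
  alloc_shared_gc = 4,  // GC, non-LAB (even); smallest GC value
  alloc_gclab     = 5,  // GC, LAB (odd)
  alloc_plab      = 7   // GC, LAB (odd)
};

// Each predicate is a single comparison or mask, mirroring the one-liners quoted above.
constexpr bool is_mutator_alloc(AllocType t) { return t <= alloc_shared; }
constexpr bool is_gc_alloc(AllocType t)      { return t >= alloc_shared_gc; }
constexpr bool is_lab_alloc(AllocType t)     { return (t & 1) == 1; }

// Compile-time checks that the assumed values satisfy both rules.
static_assert(is_mutator_alloc(alloc_cds) && is_mutator_alloc(alloc_tlab) &&
              is_mutator_alloc(alloc_shared), "mutator types must sort low");
static_assert(is_gc_alloc(alloc_shared_gc) && is_gc_alloc(alloc_gclab) &&
              is_gc_alloc(alloc_plab), "GC types must sort high");
static_assert(is_lab_alloc(alloc_tlab) && is_lab_alloc(alloc_gclab) &&
              is_lab_alloc(alloc_plab), "LAB types must be odd");
static_assert(!is_lab_alloc(alloc_cds) && !is_lab_alloc(alloc_shared) &&
              !is_lab_alloc(alloc_shared_gc), "non-LAB types must be even");
```

With an encoding like this, any new enum value has to be slotted in so that both rules still hold, which is the kind of invariant the `static_assert`s are meant to guard.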
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix wrong bit masks(offset by 1) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28247/files - new: https://git.openjdk.org/jdk/pull/28247/files/4b9f1308..131f6a6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28247&range=05-06 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28247/head:pull/28247 PR: https://git.openjdk.org/jdk/pull/28247 From wkemper at openjdk.org Thu Nov 13 01:20:11 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Nov 2025 01:20:11 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 19:33:12 GMT, Rui Li wrote: >> Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. >> >> >> For `ShenandoahEvacReserve` change: when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 >> >> `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does so some of the jtreg would fail because of this change. Had to move this ergo setting to regular `FLAG_SET_DEFAULT`. >> >> Testing: jtreg gc. GHA pending. > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Move ShenandoahEvacReserve back to (1,100) Thank you! ------------- Marked as reviewed by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3456510194 From dlong at openjdk.org Thu Nov 13 01:22:13 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Nov 2025 01:22:13 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages It looks good but I need to run testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/27279#pullrequestreview-3456522254 From dlong at openjdk.org Thu Nov 13 10:05:29 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Nov 2025 10:05:29 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages Testing passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27279#pullrequestreview-3458860722 From wkemper at openjdk.org Thu Nov 13 14:29:38 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Nov 2025 14:29:38 GMT Subject: RFR: Merge openjdk/jdk21u:master Message-ID: Merges tag jdk-21.0.10+2 ------------- Commit messages: - 8361599: [PPC64] enable missing tests via jtreg requires - 8342576: [macos] AppContentTest still fails after JDK-8341443 for same reason on older macOS versions - 8341443: [macos] AppContentTest and SigningOptionsTest failed due to "codesign" does not fails with "--app-content" on macOS 15 - 8311906: Improve robustness of String constructors with mutable array inputs - 8315990: Amend problemlisted tests to proper position - 8316422: TestIntegerUnsignedDivMod.java triggers "invalid layout" assert in FrameValues::validate - 8328299: Convert DnDFileGroupDescriptor.html applet test to main - 8352686: Opensource JInternalFrame tests - series3 - 8352905: Open some JComboBox bugs 1 - 8352687: Opensource few JInternalFrame and JTextField tests - ... 
and 281 more: https://git.openjdk.org/shenandoah-jdk21u/compare/7f146482...b2eccbc3 The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=227&range=00.conflicts Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/227/files Stats: 65316 lines in 3049 files changed: 37242 ins; 10619 del; 17455 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/227.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/227/head:pull/227 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/227 From kdnilsen at openjdk.org Thu Nov 13 16:45:24 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 13 Nov 2025 16:45:24 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v9] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. 
This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix race so freeset rebuild can happen concurrently ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/18a49d8d..8f3751ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=07-08 Stats: 13 lines in 1 file changed: 9 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From ysr at openjdk.org Thu Nov 13 17:09:12 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 13 Nov 2025 17:09:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v13] In-Reply-To: References: Message-ID: <74ZcsLnzeVJr4F54L7-nPE5IkWPHh77SLeHiXxlZzJ0=.c5061494-27d8-40ee-b1e5-18946853e2e9@github.com> On Mon, 10 Nov 2025 14:39:09 GMT, Kelvin Nilsen wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. 
>>
>> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions.
>
> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits:
>
>  - Fix mistaken merge resolution
>  - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates
>
>    The resulting fastdebug build has 64 failures. I need to debug these.
>    Probably introduced by improper resolution of merge conflicts
>  - fix error in merge conflict resolution
>  - Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixed-evac-candidates
>  - rework CompressedClassSpaceSizeinJmapHeap.java
>  - fix errors in CompressedClassSpaceSizeInJmapHeap.java
>  - Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java
>  - fix two indexing bugs
>  - add an assert to detect suspected bug
>  - Remove debug scaffolding
>  - ... and 48 more: https://git.openjdk.org/jdk/compare/c272aca8...16cd6f8a

What if one used "garbage" as the sorting metric for efficiency (under the assumption I stated earlier of considering only retired, fully allocated regions; the alternative makes the metric a bit more nuanced), and computed garbage as `regionSize - markedLive` (or `used - markedLive`, assuming all of the region is allocated)? This makes the metric invariant after final marking for any region considered in the target evacuation set, and you don't have to determine the amount allocated above TAMS, which keeps the calculations simple and the selection and sorting criteria clean and easy to reason about.

I also noticed that choosing the selection set, etc., takes the heap lock. Why?

I'll leave more specific comments in the code later today.
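
A minimal sketch of the metric suggested above, assuming only retired, fully allocated regions are considered. `RegionInfo` and both helper functions are hypothetical names used for illustration and are not part of the Shenandoah sources.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a retired, fully allocated old region.
struct RegionInfo {
  std::size_t index;              // region number
  std::size_t region_size_bytes;  // fixed region size
  std::size_t marked_live_bytes;  // live bytes found by the most recent old mark
};

// Garbage is fixed once final marking completes: it depends only on the
// region size and the marked-live total, not on anything allocated later.
// Assumes marked_live_bytes <= region_size_bytes.
inline std::size_t garbage_bytes(const RegionInfo& r) {
  return r.region_size_bytes - r.marked_live_bytes;
}

// Order candidates so the regions with the most reclaimable garbage come first.
inline void sort_by_garbage_descending(std::vector<RegionInfo>& candidates) {
  std::sort(candidates.begin(), candidates.end(),
            [](const RegionInfo& a, const RegionInfo& b) {
              return garbage_bytes(a) > garbage_bytes(b);
            });
}
```

Because neither input changes after final marking, the resulting order stays stable across the mixed evacuations that consume the candidate list, which is the invariance the comment is asking for.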
------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-3528816403 From xpeng at openjdk.org Thu Nov 13 17:39:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 13 Nov 2025 17:39:28 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 19:33:12 GMT, Rui Li wrote: >> Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. >> >> >> For `ShenandoahEvacReserve` change: when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 >> >> `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does so some of the jtreg would fail because of this change. Had to move this ergo setting to regular `FLAG_SET_DEFAULT`. >> >> Testing: jtreg gc. GHA pending. > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Move ShenandoahEvacReserve back to (1,100) Marked as reviewed by xpeng (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28242#pullrequestreview-3460930727 From duke at openjdk.org Thu Nov 13 17:48:17 2025 From: duke at openjdk.org (duke) Date: Thu, 13 Nov 2025 17:48:17 GMT Subject: RFR: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO [v3] In-Reply-To: References: Message-ID: On Wed, 12 Nov 2025 19:33:12 GMT, Rui Li wrote: >> Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. 
>> >> >> For `ShenandoahEvacReserve` change: when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 >> >> `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does so some of the jtreg would fail because of this change. Had to move this ergo setting to regular `FLAG_SET_DEFAULT`. >> >> Testing: jtreg gc. GHA pending. > > Rui Li has updated the pull request incrementally with one additional commit since the last revision: > > Move ShenandoahEvacReserve back to (1,100) @rgithubli Your change (at version 6200456b7130997d227dd5a128e4cefbd05059b5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28242#issuecomment-3528982588 From duke at openjdk.org Thu Nov 13 18:05:09 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 13 Nov 2025 18:05:09 GMT Subject: Integrated: 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO In-Reply-To: References: Message-ID: <5VdDwrmM_qxAOSIr0KDwmQasQ6UPK-y_a-JMa5F0o-E=.e9c67c95-30ab-4037-90fb-5c2cdb22bfa3@github.com> On Tue, 11 Nov 2025 18:24:25 GMT, Rui Li wrote: > Setting ergo flags using `FLAG_SET_ERGO`, instead of `FLAG_SET_DEFAULT`, so we can have the right origin info. > > > For `ShenandoahEvacReserve` change: when we use shenandoah passive mode and degen is also turned off (`-XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC`), we set the ShenandoahEvacReserve to 0: https://github.com/openjdk/jdk/blob/c6a8027b94bbcbde5f7dcabd0bff48b93bbb5a7f/src/hotspot/share/gc/shenandoah/mode/shenandoahPassiveMode.cpp#L40-L42 > > `FLAG_SET_DEFAULT` doesn't check the range but `FLAG_SET_ERGO` does so some of the jtreg would fail because of this change. Had to move this ergo setting to regular `FLAG_SET_DEFAULT`. 
> > Testing: jtreg gc. GHA pending. This pull request has now been integrated. Changeset: 2199b5fe Author: Rui Li Committer: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/2199b5fef4540ae8da77c5c4feafc8822a3d9d3d Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod 8371381: [Shenandoah] Setting ergo flags should use FLAG_SET_ERGO Reviewed-by: xpeng, wkemper, ysr, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/28242 From kdnilsen at openjdk.org Thu Nov 13 18:12:38 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 13 Nov 2025 18:12:38 GMT Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v3] In-Reply-To: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com> Message-ID: > This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition. > > This addresses a problem that results if available memory is probed while we are rebuilding the freeset. > > Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: - Merge remote-tracking branch 'jdk/master' into synchronize-available-with-rebuild - update comment - Add documentation for _rebuild_lock - Hide rebuild_lock inside prepare_to_rebuild and finish_rebuild - Rename rebuild_lock() - Tighten up context for holding rebuild_lock - Remove ShenandoahFreeSet::FreeSetUnderConstruction sentinel value - Revert "revert introduction of RebuildLock" This reverts commit bec73da1dc169d391e9919203e5a406ea02a699c. - Revert "available() returns previous value if called during freeset rebuild" This reverts commit 1a5e483a4abb04b6045175e8bd4b0c11fa68cb73. - Revert "remove obsolete assertion" This reverts commit 717e7da17f03f1e52008d154fafcbbfc5f2bb20e. - ... and 7 more: https://git.openjdk.org/jdk/compare/bfc048ab...8462a290 ------------- Changes: https://git.openjdk.org/jdk/pull/27612/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27612&range=02 Stats: 90 lines in 7 files changed: 44 ins; 29 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/27612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612 PR: https://git.openjdk.org/jdk/pull/27612 From kdnilsen at openjdk.org Thu Nov 13 19:28:34 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 13 Nov 2025 19:28:34 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v10] In-Reply-To: References: Message-ID: <4xOcHBZN-D6qbe1vq5gMtvdXHbRVL8_jzhn_78THy98=.abb2c5ee-3188-4825-9066-6959ee49a9cd@github.com> > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). 
> > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. 
> > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix release build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/8f3751ff..68814cd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=08-09 Stats: 6 lines in 2 files changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From kdnilsen at openjdk.org Thu Nov 13 21:06:09 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 13 Nov 2025 21:06:09 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v11] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. 
In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 46 commits: - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix release build - Fix race so freeset rebuild can happen concurrently - Fix merge error - compute_old_generation_balance() during freeset rebuild - Initialize young evac reserve based on soft-max-capacity - Remove debug instrumentation - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves - fix whitespace - Merge remote-tracking branch 'jdk/master' into share-collector-reserves - Fix assert_bounds() assertions when old_trash_not_in_bounds - ... and 36 more: https://git.openjdk.org/jdk/compare/6322aaba...f0d99ae4 ------------- Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=10 Stats: 1223 lines in 24 files changed: 725 ins; 259 del; 239 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From vlivanov at openjdk.org Fri Nov 14 04:14:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Nov 2025 04:14:11 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: <30fb4VWWtF8PPVn1ZTwIMZpmwt7ZB9jR2pHzSaj-e7s=.ed610e8b-0bb3-48a6-baf7-bcce09d5f274@github.com> On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits:
>
>  - Merge master
>  - update make_barrier_type
>  - Merge branch 'openjdk:master' into new_pr
>  - Merge branch 'openjdk:master' into new_pr
>  - My chages

It would be clearer if ShenandoahGC-specific names explicitly refer to Shenandoah GC (`OptoRuntime::_shenandoah_load_reference_barrier_Type`, `make_shenandoah_load_reference_barrier_Type()`, `shenandoah_load_reference_barrier_Type()`).

Otherwise, looks good.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27279#pullrequestreview-3462688619

From kbarrett at openjdk.org Fri Nov 14 05:46:08 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 14 Nov 2025 05:46:08 GMT
Subject: RFR: 8347396: Efficient TypeFunc creations [v2]
In-Reply-To: References: Message-ID:

On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote:

>> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code.
>
> Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - Merge master
>  - update make_barrier_type
>  - Merge branch 'openjdk:master' into new_pr
>  - Merge branch 'openjdk:master' into new_pr
>  - My chages

> I have put guard on the shenandoah gc specific part of the code.

It seems weird to me that a big pile of shenandoah-specific code is being moved into this otherwise GC-agnostic place.
------------- PR Comment: https://git.openjdk.org/jdk/pull/27279#issuecomment-3530924386 From dlong at openjdk.org Fri Nov 14 09:35:09 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Nov 2025 09:35:09 GMT Subject: RFR: 8347396: Efficient TypeFunc creations [v2] In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote: >> This PR do similar changes done by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851) on the GC TypeFunc creation as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in [https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686,](https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686) I have put guard on the shenandoah gc specific part of the code. > > Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge master > - update make_barrier_type > - Merge branch 'openjdk:master' into new_pr > - Merge branch 'openjdk:master' into new_pr > - My chages I agree with Kim. It seems cleaner to leave Shenandoah code in shenandoahBarrierSetC2.cpp. ------------- Changes requested by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/27279#pullrequestreview-3463931863

From shade at openjdk.org Fri Nov 14 09:45:20 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 14 Nov 2025 09:45:20 GMT
Subject: RFR: 8371667: Shenandoah: Re-design alloc request type enum for better efficiency and cleaner code [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Nov 2025 01:03:28 GMT, Xiaolong Peng wrote:

>> Current alloc request type enum:
>>
>>   enum Type {
>>     _alloc_shared,    // Allocate common, outside of TLAB
>>     _alloc_shared_gc, // Allocate common, outside of GCLAB/PLAB
>>     _alloc_cds,       // Allocate for CDS
>>     _alloc_tlab,      // Allocate TLAB
>>     _alloc_gclab,     // Allocate GCLAB
>>     _alloc_plab,      // Allocate PLAB
>>     _ALLOC_LIMIT
>>   };
>>
>> With the current design, we have to use switch statements in multiple places, resulting in unnecessary branches; for instance, the function is_mutator_alloc:
>>
>>   inline bool is_mutator_alloc() const {
>>     switch (_alloc_type) {
>>       case _alloc_tlab:
>>       case _alloc_shared:
>>       case _alloc_cds:
>>         return true;
>>       case _alloc_gclab:
>>       case _alloc_plab:
>>       case _alloc_shared_gc:
>>         return false;
>>       default:
>>         ShouldNotReachHere();
>>         return false;
>>     }
>>   }
>>
>> In this PR, I have redesigned the enum to make functions like is_mutator_alloc much simpler by making the enum values follow two simple rules:
>> 1. Smaller values for mutator allocs, larger values for GC allocs; every GC alloc type is greater than any mutator alloc type.
>> 2. Odd values for LAB, even values for non-LAB
>>
>> Three functions have been simplified to one-line implementations without branches in machine code:
>>
>>   inline bool is_mutator_alloc() const {
>>     return _alloc_type <= _alloc_shared;
>>   }
>>
>>   inline bool is_gc_alloc() const {
>>     return _alloc_type >= _alloc_shared_gc;
>>   }
>>
>>   inline bool is_lab_alloc() const {
>>     return (_alloc_type & 1) == 1;
>>   }
>>
>> I didn't check the compiled assembly code of HotSpot; instead, I wrote equivalent code and compiled it with gcc on godbolt.org for comparison:
>>
>>   bool is_lab_alloc(int alloc_type) {
>>     return (alloc_type & 1) == 1;
>>   }
>>
>>   bool is_lab_alloc_switch(int alloc_type) {
>>     switch (alloc_type) {
>>       case 0:
>>       case 2:
>>       case 4:
>>         return false;
>>       case 1:
>>       case 3:
>>       case 5:
>>         return true;
>>       default:
>>         throw "Should not reach here";
>>
>>     }
>>   }
>>
>> x86_64 assembly code (https://godbolt.org/z/h7xfz8PaT):
>>
>>   is_lab_alloc(int):
>>     push rbp
>>     mov rbp, rsp
>>     mov DWORD PTR [rbp-4], edi
>>     mov eax, DWORD PTR [rbp-4]
>>     and eax, 1
>>     and eax, 1
>>     pop rbp
>>     ret
>>   .LC0:
>>     .string "Should not reach here"
>>   is_lab_allo...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix wrong bit masks(offset by 1)

A bit confusing still.
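For orientation, the value-ordered encoding from the quoted description can be sketched as follows. This is only an illustration under stated assumptions: the quoted text gives the two ordering rules but not the concrete numbers, so the values below are hypothetical, and the `sketch` namespace, the `AllocType` name, and the free-function predicates stand in for the members of the real request class.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical values, chosen only to satisfy the two quoted rules:
//   1. every mutator type is smaller than every GC type;
//   2. LAB types are odd, non-LAB types are even.
enum AllocType : int {
  _alloc_cds       = 0,  // mutator, non-LAB
  _alloc_tlab      = 1,  // mutator, LAB
  _alloc_shared    = 2,  // mutator, non-LAB (largest mutator value)
  _alloc_shared_gc = 4,  // GC, non-LAB (smallest GC value)
  _alloc_gclab     = 5,  // GC, LAB
  _alloc_plab      = 7   // GC, LAB
};

// Rule 1 turns the requester checks into single comparisons.
constexpr bool is_mutator_alloc(AllocType t) { return t <= _alloc_shared; }
constexpr bool is_gc_alloc(AllocType t)      { return t >= _alloc_shared_gc; }

// Rule 2 turns the LAB check into a test of the lowest bit.
constexpr bool is_lab_alloc(AllocType t)     { return (t & 1) == 1; }

// Compile-time checks that the assumed values obey both rules.
static_assert(is_mutator_alloc(_alloc_cds) && is_mutator_alloc(_alloc_shared),
              "mutator types must sort at or below _alloc_shared");
static_assert(is_gc_alloc(_alloc_shared_gc) && is_gc_alloc(_alloc_plab),
              "GC types must sort at or above _alloc_shared_gc");
static_assert(is_lab_alloc(_alloc_tlab) && !is_lab_alloc(_alloc_shared_gc),
              "LAB types must be odd, non-LAB types even");

}  // namespace sketch
```

The cost of this style is that correctness depends on the numeric assignment, which is why the sketch pins the rules down with `static_assert`.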
How about this:

  [x|xx|xxx|xx]
            ^---- Requester:
                    00 -- mutator
                    10 -- mutator (CDS)
                    01 -- GC
         ^------- Purpose:
                    00 -- shared
                    01 -- TLAB/GCLAB
                    11 -- PLAB
      ^---------- Affiliation:
                    00 -- YOUNG
                    01 -- OLD
                    11 -- OLD, promotion

Then:

  static constexpr int bit_gc_alloc = 1 << 0;
  static constexpr int bit_cds_alloc = 1 << 1;
  static constexpr int bit_lab_alloc = 1 << 2;
  static constexpr int bit_plab_alloc = 1 << 3;
  static constexpr int bit_old_alloc = 1 << 4;
  static constexpr int bit_promotion_alloc = 1 << 5;

  static constexpr Type _alloc_shared = 0;
  static constexpr Type _alloc_tlab = bit_lab_alloc;
  static constexpr Type _alloc_cds = bit_cds_alloc;
  static constexpr Type _alloc_shared_gc = bit_gc_alloc;
  static constexpr Type _alloc_shared_gc_old = bit_gc_alloc | bit_old_alloc;
  static constexpr Type _alloc_shared_gc_promotion = bit_gc_alloc | bit_old_alloc | bit_promotion_alloc;
  static constexpr Type _alloc_gclab = bit_gc_alloc | bit_lab_alloc;
  static constexpr Type _alloc_plab = bit_gc_alloc | bit_plab_alloc | bit_old_alloc;

-------------

PR Review: https://git.openjdk.org/jdk/pull/28247#pullrequestreview-3463990104

From dlong at openjdk.org Fri Nov 14 09:45:24 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 14 Nov 2025 09:45:24 GMT
Subject: RFR: 8347396: Efficient TypeFunc creations [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Oct 2025 05:11:47 GMT, Harshit470250 wrote:

>> This PR makes changes to the GC TypeFunc creation similar to those made by [JDK-8330851](https://bugs.openjdk.org/browse/JDK-8330851), as suggested by [JDK-8347396](https://bugs.openjdk.org/browse/JDK-8347396). As discussed in https://github.com/openjdk/jdk/pull/21782#discussion_r1906535686, I have put a guard on the Shenandoah GC specific part of the code.
>
> Harshit470250 has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains five commits:
>
>  - Merge master
>  - update make_barrier_type
>  - Merge branch 'openjdk:master' into new_pr
>  - Merge branch 'openjdk:master' into new_pr
>  - My chages

src/hotspot/share/opto/runtime.cpp line 2413:

> 2411:   _dtrace_object_alloc_Type = make_dtrace_object_alloc_Type();
> 2412:   _clone_type_Type = make_clone_type_Type();
> 2413: #if INCLUDE_SHENANDOAHGC

A lot of the initializations in this function could be skipped based on runtime flags. Should we check `UseShenandoahGC` here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27279#discussion_r2526743350

From duke at openjdk.org Fri Nov 14 20:44:24 2025
From: duke at openjdk.org (Nityanand Rai)
Date: Fri, 14 Nov 2025 20:44:24 GMT
Subject: RFR: 8371852: Shenandoah: Remove unused ShenandoahFreeSet::_allocated_sinc…
Message-ID:

Removed the unused _allocated_since_gc_start[UIntNumPartitions] field from ShenandoahRegionPartitions class. This field became obsolete after JDK-8365880 unified memory usage accounting in ShenandoahFreeSet but was not cleaned up.
-------------

Commit messages:
 - 8371852: Shenandoah: Remove unused ShenandoahFreeSet::_allocated_since_gc_start field

Changes: https://git.openjdk.org/jdk/pull/28332/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28332&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371852
  Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28332.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28332/head:pull/28332

PR: https://git.openjdk.org/jdk/pull/28332

From duke at openjdk.org Fri Nov 14 21:05:46 2025
From: duke at openjdk.org (Nityanand Rai)
Date: Fri, 14 Nov 2025 21:05:46 GMT
Subject: RFR: 8371852: Shenandoah: Remove unused ShenandoahFreeSet::_allocated_sinc… [v2]
In-Reply-To:
References:
Message-ID:

> Removed the unused _allocated_since_gc_start[UIntNumPartitions] field from ShenandoahRegionPartitions class. This field became obsolete after JDK-8365880 unified memory usage accounting in ShenandoahFreeSet but was not cleaned up.

Nityanand Rai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into 8371852-nityanar
 - 8371852: Shenandoah: Remove unused ShenandoahFreeSet::_allocated_since_gc_start field

   Removed the unused _allocated_since_gc_start[UIntNumPartitions] field from ShenandoahRegionPartitions class. This field became obsolete after JDK-8365880 unified memory usage accounting in ShenandoahFreeSet but was not cleaned up.
   Reviewed-by: TBD

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28332/files
  - new: https://git.openjdk.org/jdk/pull/28332/files/e7dbd54e..887d25e2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28332&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28332&range=00-01

  Stats: 164817 lines in 903 files changed: 113165 ins; 23348 del; 28304 mod
  Patch: https://git.openjdk.org/jdk/pull/28332.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28332/head:pull/28332

PR: https://git.openjdk.org/jdk/pull/28332

From duke at openjdk.org Fri Nov 14 21:49:46 2025
From: duke at openjdk.org (Nityanand Rai)
Date: Fri, 14 Nov 2025 21:49:46 GMT
Subject: RFR: 8371854: Shenandoah - Simplify WALK_FORWARD_IN_BLOCK_START use
Message-ID:

Replace define/undefine pattern with #ifdef ASSERT block

-------------

Commit messages:
 - Update JDK-8371854 fix: Simplify WALK_FORWARD_IN_BLOCK_START macro usage

Changes: https://git.openjdk.org/jdk/pull/28333/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28333&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371854
  Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28333.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28333/head:pull/28333

PR: https://git.openjdk.org/jdk/pull/28333

From ysr at openjdk.org Sat Nov 15 00:22:16 2025
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Sat, 15 Nov 2025 00:22:16 GMT
Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v3]
In-Reply-To:
References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com>
Message-ID:

On Thu, 13 Nov 2025 18:12:38 GMT, Kelvin Nilsen wrote:

>> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition.
>>
>> This addresses a problem that results if available memory is probed while we are rebuilding the freeset.
>>
>> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock.
>
> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
>  - Merge remote-tracking branch 'jdk/master' into synchronize-available-with-rebuild
>  - update comment
>  - Add documentation for _rebuild_lock
>  - Hide rebuild_lock inside prepare_to_rebuild and finish_rebuild
>  - Rename rebuild_lock()
>  - Tighten up context for holding rebuild_lock
>  - Remove ShenandoahFreeSet::FreeSetUnderConstruction sentinel value
>  - Revert "revert introduction of RebuildLock"
>
>    This reverts commit bec73da1dc169d391e9919203e5a406ea02a699c.
>  - Revert "available() returns previous value if called during freeset rebuild"
>
>    This reverts commit 1a5e483a4abb04b6045175e8bd4b0c11fa68cb73.
>  - Revert "remove obsolete assertion"
>
>    This reverts commit 717e7da17f03f1e52008d154fafcbbfc5f2bb20e.
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/bfc048ab...8462a290

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 790:

> 788:   inline size_t available() {
> 789:     shenandoah_assert_not_heaplocked();
> 790:     ShenandoahRebuildLocker locker(rebuild_lock());

Maybe motivate in a brief comment why we need the rebuild lock in this API, but not around the other APIs such as capacity() and used()?
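The locking arrangement described in the quoted PR text can be sketched with standard C++ primitives. This is a minimal illustration, not HotSpot code: `FreeSetSketch`, its fields, the `sketch` namespace, and the use of `std::mutex` are assumptions standing in for `ShenandoahFreeSet`, `ShenandoahRebuildLocker`, and the real lock type. Only the names `available()`, `prepare_to_rebuild()`, and `finish_rebuild()` come from the thread, and their signatures here are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

namespace sketch {

class FreeSetSketch {
  // Narrowly scoped lock: it guards only the rebuild window and the
  // availability query, and is independent of any heap-wide lock.
  std::mutex _rebuild_lock;
  std::size_t _available = 0;  // bytes available for mutator allocation

public:
  // Start of a rebuild: take the lock so that readers cannot observe
  // the transient state while partitions are being recomputed.
  void prepare_to_rebuild() {
    _rebuild_lock.lock();
    _available = 0;  // transient value, never visible to available()
  }

  // End of a rebuild: publish the recomputed value, then release the lock.
  // Must run on the same thread that called prepare_to_rebuild().
  void finish_rebuild(std::size_t new_available) {
    _available = new_available;
    _rebuild_lock.unlock();
  }

  // Query: blocks only while a rebuild is in progress.
  std::size_t available() {
    std::lock_guard<std::mutex> locker(_rebuild_lock);
    return _available;
  }
};

}  // namespace sketch
```

In this sketch a thread that calls `available()` during a rebuild waits until `finish_rebuild()` releases the lock, so it sees either the old value or the new one, and it never competes for a heap-wide lock held by allocating threads.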
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27612#discussion_r2529307229

From ysr at openjdk.org Sat Nov 15 00:36:45 2025
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Sat, 15 Nov 2025 00:36:45 GMT
Subject: RFR: 8369048: GenShen: Defer ShenFreeSet::available() during rebuild [v3]
In-Reply-To:
References: <_PEoOc0oWb8Vzq16-Or_hykkL4NkIrwEFgLCgCRac5U=.2c23c497-acbe-48f9-a1dc-4eb4e8f25a8d@github.com>
Message-ID:

On Thu, 13 Nov 2025 18:12:38 GMT, Kelvin Nilsen wrote:

>> This code introduces a new rebuild-freeset lock for purposes of coordinating the freeset rebuild activities and queries as to memory available for allocation in the mutator partition.
>>
>> This addresses a problem that results if available memory is probed while we are rebuilding the freeset.
>>
>> Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock.
>
> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
>  - Merge remote-tracking branch 'jdk/master' into synchronize-available-with-rebuild
>  - update comment
>  - Add documentation for _rebuild_lock
>  - Hide rebuild_lock inside prepare_to_rebuild and finish_rebuild
>  - Rename rebuild_lock()
>  - Tighten up context for holding rebuild_lock
>  - Remove ShenandoahFreeSet::FreeSetUnderConstruction sentinel value
>  - Revert "revert introduction of RebuildLock"
>
>    This reverts commit bec73da1dc169d391e9919203e5a406ea02a699c.
>  - Revert "available() returns previous value if called during freeset rebuild"
>
>    This reverts commit 1a5e483a4abb04b6045175e8bd4b0c11fa68cb73.
>  - Revert "remove obsolete assertion"
>
>    This reverts commit 717e7da17f03f1e52008d154fafcbbfc5f2bb20e.
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/bfc048ab...8462a290

This looks good to me. Curious if any performance delta was noted in fresh measurements following this final shape of fix.

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27612#pullrequestreview-3467303404

From kdnilsen at openjdk.org Sat Nov 15 00:55:18 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Sat, 15 Nov 2025 00:55:18 GMT
Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v12]
In-Reply-To:
References:
Message-ID:

> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young).
>
> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity.
>
> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young.
> This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly.
>
> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a)
>
> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100.
>
> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f)
>
> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100.
>
> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c)
>
> The command line for these comparisons follows:
>
>
>   ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \
>     -XX:+Unlock...
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Do not reset generation reserves at end of GC

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25357/files
  - new: https://git.openjdk.org/jdk/pull/25357/files/f0d99ae4..6b62448d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=10-11

  Stats: 29 lines in 3 files changed: 0 ins; 29 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25357.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357

PR: https://git.openjdk.org/jdk/pull/25357

From ysr at openjdk.org Sat Nov 15 01:56:05 2025
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Sat, 15 Nov 2025 01:56:05 GMT
Subject: RFR: 8370039: GenShen: array copy SATB barrier improvements [v3]
In-Reply-To: <4pRORBaXYXwyCJyUp3BKA4I8bHlTfkfNldK9EnDJvZw=.b0a53f9b-a9a0-4c75-a823-7cf82f69a40b@github.com>
References: <4pRORBaXYXwyCJyUp3BKA4I8bHlTfkfNldK9EnDJvZw=.b0a53f9b-a9a0-4c75-a823-7cf82f69a40b@github.com>
Message-ID:

On Tue, 11 Nov 2025 00:33:36 GMT, William Kemper wrote:

>> When an array copy happens concurrently with old and young marking, Shenandoah's generational mode walks over the array twice. This is unnecessary and increases the workload for marking threads. It also has been unconditionally enqueuing old references during a young mark. This is also unnecessary and also increases marking workload. Finally, the barrier went through a somewhat complicated decision process based on affiliation of the region where the array resides. However, the barrier must consider the affiliation of objects that are pointed at by array elements.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert "We can also filter out old when striclty marking young"
>
>   This reverts commit c53c4f23f4401785e1049494b6c4e4b92f9a5701.
Thank you for the cleanup and corrections. I am curious how you found this issue -- nice catch! Any performance data to share, maybe even from a microbenchmark or any other benchmark that exercises array copying -- maybe something in DaCapo/Renaissance?

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28183#pullrequestreview-3467459541