From stefank at openjdk.org Tue Apr 1 07:04:54 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 07:04:54 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 Message-ID: We have seen a bunch of timeouts that all point towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround: first check whether an error-reporting event is actually in progress, by checking VMError::is_error_reported(). The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. Thanks to @plummercj for digging into this and proposing the same workaround. Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline ------------- Commit messages: - 8352994: ZGC: Fix regression introduced in JDK-8350572 Changes: https://git.openjdk.org/jdk/pull/24349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352994 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From cjplummer at openjdk.org Tue Apr 1 07:34:10 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 1 Apr 2025 07:34:10 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: <1S8NSOeUGbiCGZVwqiX0WGoHBguDWHvwwsxziFaFdtk=.3f5d4b1a-8d4e-47c3-a72c-9b8fc00e529d@github.com> On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all point towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround: first check whether an error-reporting event is actually in progress, by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline I think you should also remove com/sun/jdi/JdbStopInNotificationThreadTest.java from the ZGC problem list. ------------- PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2731743846 From manc at openjdk.org Tue Apr 1 08:33:52 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:33:52 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v4] In-Reply-To: References: Message-ID: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Add two tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/6f201fac..fc22cbfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=02-03 Stats: 162 lines in 2 files changed: 162 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From tschatzl at openjdk.org Tue Apr 1 08:43:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 08:43:01 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure Message-ID: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Hi all, please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). This has been made possible with the refactoring of object array task queues. At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). Testing: tier1-5, some perf testing with no differences Thanks, Thomas ------------- Commit messages: - 8271870 Changes: https://git.openjdk.org/jdk/pull/24222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8271870 Stats: 101 lines in 3 files changed: 46 ins; 32 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24222/head:pull/24222 PR: https://git.openjdk.org/jdk/pull/24222 From manc at openjdk.org Tue Apr 1 08:44:55 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:44:55 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v4] In-Reply-To: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> References: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> Message-ID: <0rUbRHQuIv6bhZEiaalc5Qcfq5E7FJb51TtEf9qeYTk=.b084a316-7352-4c1b-8bea-5485740704e9@github.com> On Tue, 1 Apr 2025 08:33:52 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add two tests This PR is ready for review. Included tests cover important functionality of `SoftMaxHeapSize`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2768618593 From manc at openjdk.org Tue Apr 1 08:44:55 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:44:55 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Revise test summary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/fc22cbfe..68f03cad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=03-04 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From tschatzl at openjdk.org Tue Apr 1 08:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 08:57:58 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:44:55 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. 
> > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Revise test summary Initial comments. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2066: > 2064: size_t G1CollectedHeap::soft_max_capacity() const { > 2065: return clamp(align_up(SoftMaxHeapSize, HeapAlignment), MinHeapSize, > 2066: max_capacity()); Maybe this clamping of `SoftMaxHeapSize` should be part of argument processing. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 1203: > 1201: size_t max_capacity() const override; > 1202: > 1203: // Print the soft maximum heap capacity. Suggestion: // Returns the soft maximum heap capacity. src/hotspot/share/gc/g1/g1IHOPControl.cpp line 119: > 117: return (size_t)MIN2( > 118: G1CollectedHeap::heap()->soft_max_capacity() * (100.0 - safe_total_heap_percentage) / 100.0, > 119: _target_occupancy * (100.0 - _heap_waste_percent) / 100.0 This looks wrong. G1ReservePercent is in some way similar to soft max heap size, intended to keep the target below the real maximum capacity. I.e. it is not intended that G1 keeps another reserve of G1ReservePercent size below soft max capacity (which is below maximum capacity). There has been some internal discussion about whether the functionality of G1ReservePercent and SoftMaxHeapSize is too similar to warrant the former, but removing it is another issue. Imo, SoftMaxHeapSize should be an separate, actual target for this calculation. (`default_conc_mark_start_threshold()` also does not subtract `G1ReservePercent` from `SoftMaxHeapSize`). test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 29: > 27: * @test > 28: * @bug 8236073 > 29: * @requires vm.gc.G1 & vm.opt.ExplicitGCInvokesConcurrent != true It's nicer to put and-ed conditions in separate lines. test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 46: > 44: private static final long ALLOCATED_BYTES = 20_000_000; // About 20M > 45: private static final long MAX_HEAP_SIZE = > 46: 200 * 1024 * 1024; // 200MiB, must match -Xmx on command line. Is it possible to get that value from the `MemoryMXBean` instead of relying on manual update? I.e. `getMax()`? ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24211#pullrequestreview-2731934928 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022415626 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022415016 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022430412 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022434814 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022438436 From tschatzl at openjdk.org Tue Apr 1 09:24:12 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 09:24:12 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v29] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
> > The main reason for the current barrier is how g1 implements concurrent refinement:
> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads,
> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option rather than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - ... 
and 27 more: https://git.openjdk.org/jdk/compare/aff5aa72...51fb6e63 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=28 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From iwalulya at openjdk.org Tue Apr 1 10:55:27 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 1 Apr 2025 10:55:27 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:44:55 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristics to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Revise test summary With the changes to `young_collection_expansion_amount()`, once we reach the `SoftMaxHeapSize`, we cannot expand the heap except during GC where expansion can happen without regard for `SoftMaxHeapSize`. Thus, after exceeding `SoftMaxHeapSize` we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the `SoftMaxHeapSize` as implemented by this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2768966455 From stefan.johansson at oracle.com Tue Apr 1 12:49:10 2025 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 1 Apr 2025 14:49:10 +0200 Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) In-Reply-To: References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: <792ad340-5160-413b-b766-c49b4ff6d4c5@oracle.com> Thanks for sharing these results Monica, As Thomas mentioned we have done some testing comparing Serial to G1 in small environments as well. Our conclusions are similar to yours, G1 nowadays handles small environments pretty well. I used SPECjbb2005, and my focus was to compare throughput given a fixed memory usage. The reason for this is that the low native memory overhead of Serial (no marking bitmap etc) is often used as an argument to use it in small environments. On the other hand, the region-based heap layout of G1 can in many cases offer a better out of the box heap utilization compared to Serial. To test this and to make a fair comparison I configured Serial to have a slightly larger heap to get an overall equal memory consumption (using the peak PSS usage in Linux as the measure). SpecJBB2005 by default runs 1 to 8 warehouses, where warehouses correspond to worker threads. I did run this in a cgroup environment with 1CPU and 1G memory. 
By default this will give G1 a 256m max heap, which I fixed using Xmx and Xms. To let Serial use as much memory in total as G1 I configured it with a 288MB heap. With this setup Serial and G1 get a very similar score with a recent JDK 25 build. The calculated score only takes warehouse 1 and 2 into account and looking at the result/score for 8 warehouses G1 is ~10% better. So it looks like G1 is able to handle high pressure better compared to Serial. These results are without the new improved barriers for G1; when using a build with the new barrier, the G1 results are improved by roughly 3%. This is a use-case not at all caring about latency, and the fact that G1 is still performing this well also points towards it being a suitable default even for small environments. I've also played around a bit with restricting the amount of concurrent work done with G1, to see how a G1 STW-only mode would perform, and on a single CPU system this looks beneficial when we start to run with more worker threads. But I don't suspect it's that common to run small cloud services at 100% load, so having a default that can do concurrent work seems reasonable. Thanks, Stefan On 2025-03-18 00:59, Monica Beckwith wrote: > Hi Thomas, Erik, and all, > > This is an important and timely discussion, and I appreciate the > insights on how the gap between SerialGC and G1GC has diminished over > time. Based on recent comparative tests of out-of-the-box GC > configurations (-Xmx only), I wanted to share some data-backed > observations that might help validate this shift. > > I tested G1GC and SerialGC under 1-core/2GB and 2-core/2GB > containerized environments (512MB < -Xmx <1.5GB), running SPECJBB2015 > with and without stress tests. The key findings: > > *Throughput (max_jOPS & critical_jOPS):* > > * > G1GC consistently outperforms SerialGC. > * > 1 core: G1GC shows a 1.78x increase in max_jOPS. > * > 2 cores: G1GC shows a 2.84x improvement over SerialGC. > > > *Latency and Stop-the-World (STW) Impact:* > > * > SerialGC struggles under stress, with frequent full GCs leading to > long pauses. > * > G1GC's incremental collections keep pause times lower, especially > under stress load. > * > critical_jOPS, a key SLA metric, is 4.5x higher for G1GC on 2 cores. > > > *Memory Behavior & Stability:* > > * > In 512MB heap configurations, SerialGC encountered OOM failures > due to heap exhaustion. > > > Given these results, it seems reasonable to reconsider why SerialGC > remains the default in small environments when G1GC offers clear > performance and stability advantages. > > Looking forward to thoughts on this. > > Best, > Monica > > P.S.: I haven't tested for <512MB heaps yet, as that requires a > different test config I'm still working on. I'd also love to hear from > anyone running single-threaded, CPU-bound workloads if they have > observations to share. > > > ------------------------------------------------------------------------ > *From:* hotspot-gc-dev on behalf of > Thomas Schatzl > *Sent:* Monday, February 24, 2025 2:33 AM > *To:* Erik Osterlund > *Cc:* hotspot-gc-dev at openjdk.org > *Subject:* [EXTERNAL] Re: RFC: G1 as default collector (for real this > time) > Hi, > > On 21.02.25 15:02, Erik Osterlund wrote: > > Hi Thomas, > > > [...]> There is however a flip side for that argument on the other side > of the scaling spectrum, where ZGC is probably a better fit on the even > larger scale. 
So while it's true that the effect of a Serial -> G1 > default change is a static default GC, I just think we should mind the > fact that there is more uncertainty on the larger end of the scale. I'm > not proposing any changes, just saying that maybe we should be careful > about stressing the importance of having a static default GC, if we > don't know if that is the better strategy on the larger end of the scale > or not, going forward. > > +1 > > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tschatzl at openjdk.org Tue Apr 1 16:09:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 16:09:20 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:40:09 GMT, Thomas Schatzl wrote: >> Man Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise test summary > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2066: > >> 2064: size_t G1CollectedHeap::soft_max_capacity() const { >> 2065: return clamp(align_up(SoftMaxHeapSize, HeapAlignment), MinHeapSize, >> 2066: max_capacity()); > > Maybe this clamping of `SoftMaxHeapSize` should be part of argument processing. Ignore this - `SoftMaxHeapSize` is manageable after all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023162750 From wkemper at openjdk.org Tue Apr 1 18:21:12 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 18:21:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:17:51 GMT, Kelvin Nilsen wrote: > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during the final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > 76: _live_data(0), > 77: _critical_pins(0), > 78: _mixed_candidate_garbage_words(0), Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `TAMS + top` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returned marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`? This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? 
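A minimal sketch of the combined accessor suggested in the comment above: live data computed as the words marked live by the most recent mark plus everything allocated in the region since the mark started (top - TAMS). The helper names (`marking_context()`, `top_at_mark_start()`) and the free-standing form are assumptions for illustration, not the exact Shenandoah API or the patch under review.

// Sketch only, assuming the marking context exposes TAMS for a region.
static size_t combined_live_data_words(const ShenandoahHeapRegion* r,
                                       const ShenandoahMarkingContext* ctx) {
  const HeapWord* tams = ctx->top_at_mark_start(r);
  // Words accumulated by increase_live_data_gc_words() while marking.
  const size_t marked_words = r->get_live_data_words();
  // Objects allocated above TAMS were never marked but are implicitly live,
  // so counting them keeps the value accurate after final_mark.
  const size_t allocated_since_mark = pointer_delta(r->top(), tams);
  return marked_words + allocated_since_mark;
}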
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 159: > 157: > 158: inline size_t ShenandoahHeapRegion::get_mixed_candidate_live_data_bytes() const { > 159: assert(SafepointSynchronize::is_at_safepoint(), "Should be at Shenandoah safepoint"); Could we use `shenandoah_assert_safepoint` here (and other places) instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/24319#pullrequestreview-2733584314 PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2023461623 PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2023396124 From manc at openjdk.org Tue Apr 1 20:54:36 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 20:54:36 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v6] In-Reply-To: References: Message-ID: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Address comments and try fixing test failure on macos-aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/68f03cad..0bc55654 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=04-05 Stats: 12 lines in 3 files changed: 2 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From manc at openjdk.org Tue Apr 1 20:54:37 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 20:54:37 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:48:53 GMT, Thomas Schatzl wrote: >> Man Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise test summary > > src/hotspot/share/gc/g1/g1IHOPControl.cpp line 119: > >> 117: return (size_t)MIN2( >> 118: G1CollectedHeap::heap()->soft_max_capacity() * (100.0 - safe_total_heap_percentage) / 100.0, >> 119: _target_occupancy * (100.0 - _heap_waste_percent) / 100.0 > > This looks wrong. G1ReservePercent is in some way similar to soft max heap size, intended to keep the target below the real maximum capacity. > I.e. it is not intended that G1 keeps another reserve of G1ReservePercent size below soft max capacity (which is below maximum capacity). > > There has been some internal discussion about whether the functionality of G1ReservePercent and SoftMaxHeapSize is too similar to warrant the former, but removing it is another issue. 
> > Imo, SoftMaxHeapSize should be an separate, actual target for this calculation. (`default_conc_mark_start_threshold()` also does not subtract `G1ReservePercent` from `SoftMaxHeapSize`). Thanks. Yes, that makes sense. Now it uses `MIN3` to take `soft_max_capacity()` as a separate constraint. > test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 46: > >> 44: private static final long ALLOCATED_BYTES = 20_000_000; // About 20M >> 45: private static final long MAX_HEAP_SIZE = >> 46: 200 * 1024 * 1024; // 200MiB, must match -Xmx on command line. > > Is it possible to get that value from the `MemoryMXBean` instead of relying on manual update? I.e. `getMax()`? Yes, it is a good idea. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023659889 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023660401 From wkemper at openjdk.org Tue Apr 1 22:27:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:27:07 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: References: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> Message-ID: On Sat, 29 Mar 2025 00:08:06 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't let old have the entire heap > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.cpp line 120: > >> 118: if (old_capacity > old_usage) { >> 119: size_t excess_old_regions = (old_capacity - old_usage) / ShenandoahHeapRegion::region_size_bytes(); >> 120: gen_heap->transfer_to_young(excess_old_regions); > > should we assert result is successful? Or replace with force_transfer? (just seems bad practice to ignore a status result) Yes, will try an assert here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2023754542 From wkemper at openjdk.org Tue Apr 1 22:44:35 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:44:35 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v3] In-Reply-To: References: Message-ID: > * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. > * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Simplify confusing (and confused) comment - Assert that region transfers succeed when expected - Merge remote-tracking branch 'jdk/master' into stop-enforcing-gen-size-limits - Don't let old have the entire heap - Stop enforcing young/old generation sizes. Move what's left of generation sizing logic into shGenerationalHeap. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24268/files - new: https://git.openjdk.org/jdk/pull/24268/files/bc171089..33a2f19d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=01-02 Stats: 18299 lines in 378 files changed: 10486 ins; 6499 del; 1314 mod Patch: https://git.openjdk.org/jdk/pull/24268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24268/head:pull/24268 PR: https://git.openjdk.org/jdk/pull/24268 From wkemper at openjdk.org Tue Apr 1 22:44:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:44:36 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: References: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> Message-ID: On Sat, 29 Mar 2025 00:10:28 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't let old have the entire heap > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 134: > >> 132: ShenandoahHeap::initialize_heuristics(); >> 133: >> 134: // Max capacity is the maximum _allowed_ capacity. This means the sum of the maximum > > I don't understand the relevance of this comment. Is there still a maximum allowed for old and a maximum allowed for young? This comment stemmed from my own confusion over fields and variables called _max_ `capacity`. I would like to rename the `_max_capacity` field to just `_capacity`. In my mind, the _max_ should be immutable, but that isn't how Shenandoah uses this field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2023766431 From jsikstro at openjdk.org Wed Apr 2 06:57:22 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Wed, 2 Apr 2025 06:57:22 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration Message-ID: The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. ------------- Commit messages: - 8353471: ZGC: Redundant generation id in ZGeneration Changes: https://git.openjdk.org/jdk/pull/24374/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24374&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353471 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24374.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24374/head:pull/24374 PR: https://git.openjdk.org/jdk/pull/24374 From stefank at openjdk.org Wed Apr 2 07:06:13 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 07:06:13 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikström wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Marked as reviewed by stefank (Reviewer). 
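For readers skimming the digest, the shape of the 8353471 cleanup described above is roughly the following. The signatures and the selector line are illustrative assumptions only, not taken from the actual patch.

// Before (sketch): callers passed the generation id even though the
// ZGeneration object already knows which generation it represents.
void ZGeneration::select_relocation_set(ZGenerationId id);

// After (sketch): the member variable _id is used instead, so the redundant
// parameter disappears from the internal call chain.
void ZGeneration::select_relocation_set() {
  // Anything that previously consumed the 'id' argument now reads _id.
  ZRelocationSetSelector selector(_id);
  // ... select forwardings for this generation ...
}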
------------- PR Review: https://git.openjdk.org/jdk/pull/24374#pullrequestreview-2734854851 From eosterlund at openjdk.org Wed Apr 2 10:01:34 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 2 Apr 2025 10:01:34 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikström wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24374#pullrequestreview-2735717264 From ayang at openjdk.org Wed Apr 2 10:15:48 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 10:15:48 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2735758122 From stefank at openjdk.org Wed Apr 2 11:15:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:15:01 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: > We have seen a bunch of timeouts that all point towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround: first check whether an error-reporting event is actually in progress, by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. 
> > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Remove test from ProblemList ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24349/files - new: https://git.openjdk.org/jdk/pull/24349/files/8db3f6d0..fe07a340 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From stefank at openjdk.org Wed Apr 2 11:15:02 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:15:02 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all point towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround: first check whether an error-reporting event is actually in progress, by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline I've removed the test and will run tier1-tier3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24349#issuecomment-2772225278 From stefank at openjdk.org Wed Apr 2 11:47:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:47:53 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: <4Uyw00r7p9C-1BSfQRNEQ0p5td8RylD7YVLOHj6HODM=.47100abf-8467-4b47-9edb-c30877152c56@github.com> On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > "If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero)" > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 are going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas where we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. I moved this PR from hotspot to hotspot-gc. 
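For reference, the Windows contract quoted in the 8353264 description above comes down to the following. This is a standalone illustration of the VirtualFree rule, not the ZGC code itself; the helper name is made up.

#include <windows.h>

// Releasing a reservation made with VirtualAlloc(MEM_RESERVE): the documented
// contract is that dwSize must be 0 and lpAddress must be the base address of
// the original reservation. Passing the reservation size instead (as the
// broken path effectively did) makes the MEM_RELEASE call fail.
static bool release_reservation(void* base) {
  return VirtualFree(base, 0 /* dwSize must be 0 for MEM_RELEASE */, MEM_RELEASE) != 0;
}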
------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2772303530 From eosterlund at openjdk.org Wed Apr 2 11:58:57 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 2 Apr 2025 11:58:57 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > "If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero)" > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 are going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas where we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2736002080 From tschatzl at openjdk.org Wed Apr 2 13:04:08 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 2 Apr 2025 13:04:08 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: <0xr7VMlEH9EAc8XB9HQKPdxOHUcLfwtZkNAkGrTPu_k=.72d5e5be-373f-4db2-bbfb-9026c82e3c94@github.com> On Tue, 1 Apr 2025 20:57:36 GMT, Man Cao wrote: > > With the changes to `young_collection_expansion_amount()`, once we reach the `SoftMaxHeapSize`, we cannot expand the heap except during GC where expansion can happen without regard for `SoftMaxHeapSize`. Thus, after exceeding `SoftMaxHeapSize` we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the `SoftMaxHeapSize` as implemented by this patch? > > Yes. This is the expected behavior if the user sets `SoftMaxHeapSize` too small. G1 will try its best to respect `SoftMaxHeapSize`, which could cause GC thrashing. However, it won't cause `OutOfMemoryError`. This problem is due to the user's misconfiguration of `SoftMaxHeapSize`, which is similar to a user misconfiguring `Xmx` to be too small. The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? (Iirc, in tests a long time ago, that original patch, together with adapting `Min/MaxHeapFreeRatio`, did result in the desired effect of G1/`SoftMaxHeapSize` decreasing the heap appropriately. Without it, the heap will almost never change, but that is expected given how `Min/MaxHeapFreeRatio` works). So similar to @walulyai I would strongly prefer for `SoftMaxHeapSize` not to interfere that much with the application's performance. To me, this behavior is not "soft", and there seems to be general consensus internally about allowing unbounded cpu usage for GC. 
Afaiu in ZGC, if the heap grows beyond `SoftMaxHeapSize`, GC activity can grow up to 25% of cpu usage (basically maxing out concurrent threads). That could be a reasonable guidance as well here. GC thrashing will also prevent progress with marking, and actually cause more marking because of objects not having enough time to die. This just makes the situation worse until the heap gets scaled back to `SoftMaxHeapSize`. However at the moment, changing the GC activity threshold internally will not automatically shrink the heap as you would expect, since currently shrinking is controlled by marking using the `Min/MaxHeapFreeRatio` flags. That gets us back to [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687) and [JDK-8248324](https://bugs.openjdk.org/browse/JDK-8248324)... @walulyai is currently working on the former issue again, testing it, maybe you two could work together on that to see whether basing this work on what @walulyai is cooking up is a better way forward, if needed modifying `gctimeratio` if we are above `SoftMaxHeapSize`? Otherwise, if there really is a need to get this functionality asap, even only making it a guide for the marking should at least give some effect (but I think without changing `Min/MaxHeapFreeRatio` at the same time there is not much effect anyway). But that is a fairly coarse and indirect way of getting the necessary effect to shrink the heap. We should not limit ourselves to what mainline provides at the moment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2772493942 From tschatzl at openjdk.org Wed Apr 2 13:04:11 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 2 Apr 2025 13:04:11 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v6] In-Reply-To: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> References: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> Message-ID: On Tue, 1 Apr 2025 20:54:36 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristics to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Address comments and try fixing test failure on macos-aarch64 There also seems to be a concurrency issue with reading the `SoftMaxHeapSize` variable: Since the flag is manageable, at least outside of safepoints (afaict `jcmd` is blocked by safepoints, but I'll ask), the variable can be written at any time. So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` is re-read with multiple different values in that method. 
Probably an `Atomic::load(&SoftMaxHeapSize)` in the getter is sufficient for that. The other multiple re-readings of the `soft_max_capacity()` in the safepoint seem okay - I do not think there is a way to update the value within a safepoint externally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2772496003 From zgu at openjdk.org Wed Apr 2 13:24:56 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 2 Apr 2025 13:24:56 GMT Subject: RFR: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24322#pullrequestreview-2736263356 From stuefe at openjdk.org Wed Apr 2 13:30:51 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 13:30:51 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Okay. Curious, was this a day zero problem? Incidentally, I remember that we had a problem with NUMA on windows where we only released the first NUMA stripe, leaving the other stripes around for future commits to trip over. But ZGC is probably not affected by that, since it does not use os::reserve/release_memory, right? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2736284463 From stefank at openjdk.org Wed Apr 2 14:06:06 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 14:06:06 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 13:28:37 GMT, Thomas Stuefe wrote: > Okay. > > Curious, was this a day zero problem? I think it was. 
For completeness, these are the unreserve paths you need to hit to hit this bug:

bool XVirtualMemoryManager::reserve_contiguous(uintptr_t start, size_t size) {
  assert(is_aligned(size, XGranuleSize), "Must be granule aligned");

  // Reserve address views
  const uintptr_t marked0  = XAddress::marked0(start);
  const uintptr_t marked1  = XAddress::marked1(start);
  const uintptr_t remapped = XAddress::remapped(start);

  // Reserve address space
  if (!pd_reserve(marked0, size)) {
    return false;
  }

  if (!pd_reserve(marked1, size)) {
    pd_unreserve(marked0, size);
    return false;
  }

  if (!pd_reserve(remapped, size)) {
    pd_unreserve(marked0, size);
    pd_unreserve(marked1, size);
    return false;
  }

  // Register address views with native memory tracker
  nmt_reserve(marked0, size);
  nmt_reserve(marked1, size);
  nmt_reserve(remapped, size);

  // Make the address range free
  _manager.free(start, size);

  return true;
}

> > Incidentally, I remember that we had a problem with NUMA on windows where we only released the first NUMA stripe, leaving the other stripes around for future commits to trip over. But ZGC is probably not affected by that, since it does not use os::reserve/release_memory, right? It doesn't sound like ZGC would be affected by that. At least not via those APIs. FWIW, I've identified another corner-case bug on Windows that only happens if we end up allocating discontiguous heaps, which only ever happens if all our attempts to allocate a contiguous heap fail. I'm in the process of trying to write a test showing this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2772671014 From ayang at openjdk.org Wed Apr 2 14:22:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 14:22:55 GMT Subject: RFR: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24322#issuecomment-2772714981 From ayang at openjdk.org Wed Apr 2 14:22:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 14:22:56 GMT Subject: Integrated: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 This pull request has now been integrated. 
Changeset: a0677d94 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a0677d94d8c83a75cee054700e098faa97edca3c Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod 8353263: Parallel: Remove locking in PSOldGen::resize Reviewed-by: tschatzl, zgu ------------- PR: https://git.openjdk.org/jdk/pull/24322 From iwalulya at openjdk.org Wed Apr 2 15:12:02 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 2 Apr 2025 15:12:02 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2736649318 From manc at openjdk.org Wed Apr 2 16:00:33 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 2 Apr 2025 16:00:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Fix test failure on macos-aarch64 by using power-of-two sizes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/0bc55654..4435e89f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From stuefe at openjdk.org Wed Apr 2 16:16:06 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 16:16:06 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> On Wed, 2 Apr 2025 14:03:36 GMT, Stefan Karlsson wrote: >> Okay. >> Curious, was this a day zero problem? > I think it was. 
For completeness, this is the unreserve paths you need to hit to hit this bug: Ah okay, this is probably rare. I wondered whether it affects the unmapper path. Because AFAIU, that would have led to out-of-address space at some point with high probability. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773087416 From kdnilsen at openjdk.org Wed Apr 2 17:49:49 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 2 Apr 2025 17:49:49 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24090#pullrequestreview-2737095478 From kdnilsen at openjdk.org Wed Apr 2 17:55:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 2 Apr 2025 17:55:48 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. Maybe the "best" tradeoff is "adaptive behavior". If allocatable memory is in "short supply", we should evacuate thread roots early. Otherwise, we should preserve existing behavior. Defining "short supply" might be a bit tricky. There's a related PR that is still in development, to surge GC worker threads when we are at risk of experiencing allocation failures. A lot of heuristic predictions feed into the decision of when and whether to surge. We could use that same feedback mechanism here. If we are under "worker surge" conditions, that suggests memory is in short supply, an this is the ideal time to shift some of the GC work onto the mutators, so this is when we should evacuate thread roots early. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24090#issuecomment-2773302505 From jsikstro at openjdk.org Wed Apr 2 18:20:00 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 2 Apr 2025 18:20:00 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing Message-ID: Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. 
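(For illustration only: a minimal, self-contained sketch of the dispatch shape described above. The `outputStream` stub, `ExampleHeap` class and the printed strings are stand-ins; the actual classes and signatures in the PR may differ.)

```
#include <cstdio>

// Stand-in for HotSpot's outputStream, only for this sketch.
struct outputStream {
  void print_cr(const char* s) { std::printf("%s\n", s); }
};

// CollectedHeap only declares the hook; each collector implements it
// directly, so there is no bouncing back through a shared default.
class CollectedHeap {
public:
  virtual ~CollectedHeap() {}
  virtual void print_on(outputStream* st) const = 0;
  virtual void print_on_error(outputStream* st) const = 0;
};

class ExampleHeap : public CollectedHeap {
public:
  void print_on(outputStream* st) const override {
    st->print_cr("example heap: regular printing");
  }
  void print_on_error(outputStream* st) const override {
    print_on(st);                                      // reuse regular printing
    st->print_cr("example heap: error-only details");  // GC-specific extras
  }
};

int main() {
  outputStream st;
  ExampleHeap heap;
  heap.print_on_error(&st);  // one virtual dispatch, straight into the GC
  return 0;
}
```

With this shape, the caller (the error reporter) is also the natural place to emit the leading "Heap:" label, as the description goes on to explain.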
In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. The old and new printing orders are shown below for ZGC: # Old # New Testing: * GHA * Tiers 1 & 2 * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt ------------- Commit messages: - Copyright years - 8353559: Restructure CollectedHeap error printing Changes: https://git.openjdk.org/jdk/pull/24387/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24387&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353559 Stats: 141 lines in 16 files changed: 75 ins; 52 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24387/head:pull/24387 PR: https://git.openjdk.org/jdk/pull/24387 From jsikstro at openjdk.org Wed Apr 2 18:40:57 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 2 Apr 2025 18:40:57 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Should `_has_unreserved` and `test_unreserve` become be static like the other member variables and test methods? ------------- PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2737227639 From stefank at openjdk.org Wed Apr 2 20:16:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 20:16:59 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:38:34 GMT, Joel Sikstr?m wrote: > Should `_has_unreserved` and `test_unreserve` become be static like the other member variables and test methods? I'll look into that tomorrow. 
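(Aside, for readers unfamiliar with the pattern being asked about: a hedged sketch of what "making the member variables and test methods static" looks like in a gtest fixture. The fixture and field names below are illustrative only, not the actual ZGC test; it assumes the usual gtest_main runner.)

```
#include <gtest/gtest.h>
#include <cstddef>

class ZVirtualMemoryReserveTest : public ::testing::Test {
protected:
  // Shared state only touched from static helpers, so it is static too.
  static bool        _has_unreserved;
  static std::size_t _unreserved_bytes;

  static void test_unreserve(std::size_t size) {
    // ... call the code under test here ...
    _has_unreserved = true;
    _unreserved_bytes = size;
  }
};

bool        ZVirtualMemoryReserveTest::_has_unreserved = false;
std::size_t ZVirtualMemoryReserveTest::_unreserved_bytes = 0;

TEST_F(ZVirtualMemoryReserveTest, unreserve) {
  test_unreserve(2 * 1024 * 1024);
  EXPECT_TRUE(_has_unreserved);
  EXPECT_EQ(_unreserved_bytes, 2u * 1024 * 1024);
}
```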
------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773620954 From stefank at openjdk.org Wed Apr 2 20:16:58 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 20:16:58 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> References: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> Message-ID: On Wed, 2 Apr 2025 16:13:30 GMT, Thomas Stuefe wrote: > > > Okay. > > > > Curious, was this a day zero problem? > > > I think it was. For completeness, this is the unreserve paths you need to hit to hit this bug: > > Ah okay, this is probably rare. I wondered whether it affects the unmapper path. The unmapper converts the mapped memory (virtual to the physical memory) to be just reserved memory (but using Window's placeholder mechanism). So, the memory is not unreserved by the unmapper. I hope this makes sense. > Because AFAIU, that would have led to out-of-address space at some point with high probability. If you try to call this faulty unreserve implementation then the JVM will immediately shut down. So, I don't think this bug will cause and address-space leak. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773620289 From stefank at openjdk.org Wed Apr 2 20:53:48 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 20:53:48 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by stefank (Reviewer). 
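(For illustration: the VirtualFree rule quoted in the PR description implies an unreserve call of the following shape on Windows. This is a minimal sketch, not the actual ZGC pd_unreserve code; the helper name and the reservation in main are assumptions.)

```
#include <windows.h>

// Release previously reserved (uncommitted) address space. Per the
// VirtualFree documentation, dwSize must be 0 when MEM_RELEASE is used,
// and the address must be the base address of the original reservation.
static bool unreserve(void* addr) {
  return VirtualFree(addr, 0 /* must be 0 with MEM_RELEASE */, MEM_RELEASE) != 0;
}

int main() {
  void* p = VirtualAlloc(nullptr, 1 << 20, MEM_RESERVE, PAGE_NOACCESS);
  return (p != nullptr && unreserve(p)) ? 0 : 1;
}
```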
------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2737551377 From manc at openjdk.org Thu Apr 3 06:29:49 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 06:29:49 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 16:00:33 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure on macos-aarch64 by using power-of-two sizes. Re [Thomas' comment](#issuecomment-2772493942): > The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? Because without changing heap sizing directly, setting `SoftMaxHeapSize` alone is ineffective to shrink the heap in most cases. E.g., the included test `test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java` will fail. For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect `SoftMaxHeapSize` over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`? My preference is yes, that `SoftMaxHeapSize` should have higher precedence, for the following reasons: 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". 1. Having a single flag that makes G1 shrink heap more aggressively, is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if `SoftMaxHeapSize` only guides marking, user has to also tune `MinHeapFreeRatio`/`MaxHeapFreeRatio` to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning `MinHeapFreeRatio`/`MaxHeapFreeRatio`. 1. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. 
However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) > So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance. If user sets a too small `SoftMaxHeapSize` and causes performance regression or GC thrashing, it is really user's misconfiguration, and they should take measures to adjust `SoftMaxHeapSize` based on workload. Also misconfiguring `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio` could cause similar regressions (think of `-XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1`). However, I can see that `SoftMaxHeapSize` may be easier to misconfigure than the other 3 flags, because it does not adapt to changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggests with ZGC's example of growing up to 25% of cpu usage?): - `SoftMaxHeapSize` still takes higher precedence over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`. - G1 could have an internal mechanism to detect GC thrashing, and expands heap above `SoftMaxHeapSize` if thrashing happens. > That gets us back to [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687) and [JDK-8248324](https://bugs.openjdk.org/browse/JDK-8248324)... Yes, fixing these two issues would be great regardless of `SoftMaxHeapSize`. However, they do not address the 3 issues above about flag precedence. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774619383 From manc at openjdk.org Thu Apr 3 07:08:19 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 07:08:19 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. 
Man Cao has updated the pull request incrementally with one additional commit since the last revision: Use Atomic::load for flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/4435e89f..c60ade41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From manc at openjdk.org Thu Apr 3 07:30:51 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 07:30:51 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Re: concurrency issue with reading `SoftMaxHeapSize` I updated to `Atomic::load()`, but not sure if I understand the concern correctly. > So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` re-read with multiple different values in that method. I don't see where the re-read is. I think in any code path from `G1IHOPControl::get_conc_mark_start_threshold`, `G1CollectedHeap::heap()->soft_max_capacity()` is called only once. `G1CollectedHeap::attempt_allocation_humongous` also appears to call `G1Policy::need_to_start_conc_mark` only once, which calls `G1IHOPControl::get_conc_mark_start_threshold` only once. I agree it is a data race if `soft_max_capacity()` runs outside of a safepoint, so `Atomic::load()` makes sense regardless. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774731515 From iwalulya at openjdk.org Thu Apr 3 08:11:00 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 3 Apr 2025 08:11:00 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:27:22 GMT, Man Cao wrote: > 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." 
We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". In the current approach, it is not that we are respecting the user's request, we are violating the request just that we do this only during GCs. So eventually you have back to back GCs that will expand the heap to whatever heapsize the application requires. My interpretation of `SoftMaxHeapSize` is that we can meet this limit where possible, but also exceed the limit if required. So I propose we take the same approach as used in other GCs where `SoftMaxHeapSize` is used as a parameter for setting GC pressure but not as a limit to allocations. > > 3. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) Agreed, these ratios are problematic, and we should find a solution that removes them. We also need to agree on the purpose of `SoftMaxHeapSize`, my understanding is that `SoftMaxHeapSize` is meant for the application to be handle spikes in allocations and and quickly release the memory if no longer required. If `SoftMaxHeapSize` has precedence over`GCTimeRatio`, then G1 is changing the objective from balancing latency and throughput to optimizing for memory usage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774824745 From tschatzl at openjdk.org Thu Apr 3 08:34:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 08:34:00 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:28:13 GMT, Man Cao wrote: > Re: concurrency issue with reading `SoftMaxHeapSize` > > I updated to `Atomic::load()`, but not sure if I understand the concern correctly. > > > So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` re-read with multiple different values in that method. > > I don't see where the re-read is. I think in any code path from `G1IHOPControl::get_conc_mark_start_threshold`, `G1CollectedHeap::heap()->soft_max_capacity()` is called only once. `G1CollectedHeap::attempt_allocation_humongous` also appears to call `G1Policy::need_to_start_conc_mark` only once, which calls `G1IHOPControl::get_conc_mark_start_threshold` only once. > > I agree it is a data race if `soft_max_capacity()` runs outside of a safepoint, so `Atomic::load()` makes sense regardless. 
The compiler could be(*) free to call `get_conc_mark_start_threshold()` again in any of the uses of the local variable without telling it that one of its components may change between re-reads. (*) Probably not after looking again, given that it's not marked as `const` (not sure why), and a virtual method, and fairly large. The situation would be much worse if somehow `SoftMaxHeapsize` could be changed within a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774885501 From stefank at openjdk.org Thu Apr 3 09:32:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 09:32:12 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve - Make addtions static - 8353264: ZGC: Windows heap unreserving is broken ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24377/files - new: https://git.openjdk.org/jdk/pull/24377/files/7e2861b2..bbf83831 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24377&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24377&range=00-01 Stats: 11266 lines in 447 files changed: 7600 ins; 2558 del; 1108 mod Patch: https://git.openjdk.org/jdk/pull/24377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24377/head:pull/24377 PR: https://git.openjdk.org/jdk/pull/24377 From jsikstro at openjdk.org Thu Apr 3 09:32:12 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 3 Apr 2025 09:32:12 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: <-jYFzlEXm9kiqtULRVQFRP1UcAfb_Yscb8s7AelLI98=.b68fb9ed-1a28-4437-8658-40087c134800@github.com> On Thu, 3 Apr 2025 09:29:08 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. 
The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2739122546 From eosterlund at openjdk.org Thu Apr 3 09:53:53 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Apr 2025 09:53:53 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. 
> > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2739190388 From tschatzl at openjdk.org Thu Apr 3 10:01:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 10:01:54 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag > Re [Thomas' comment](#issuecomment-2772493942): > > > The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? > > Because without changing heap sizing directly, setting `SoftMaxHeapSize` alone is ineffective to shrink the heap in most cases. E.g., the included test `test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java` will fail. > > For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect `SoftMaxHeapSize` over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`? My preference is yes, that `SoftMaxHeapSize` should have higher precedence, for the following reasons: > > 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". > > 2. Having a single flag that makes G1 shrink heap more aggressively, is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if `SoftMaxHeapSize` only guides marking, user has to also tune `MinHeapFreeRatio`/`MaxHeapFreeRatio` to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning `MinHeapFreeRatio`/`MaxHeapFreeRatio`. > > 3. 
Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) > > > > So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance. > > If user sets a too small `SoftMaxHeapSize` and causes performance regression or GC thrashing, it is really user's misconfiguration, and they should take measures to adjust `SoftMaxHeapSize` based on workload. Also misconfiguring `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio` could cause similar regressions (think of `-XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1`). > > However, I can see that `SoftMaxHeapSize` may be easier to misconfigure than the other 3 flags, because it does not adapt to changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggests with ZGC's example of growing up to 25% of cpu usage?): Exactly. > > * `SoftMaxHeapSize` still takes higher precedence over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`. > > * G1 could have an internal mechanism to detect GC thrashing, and expands heap above `SoftMaxHeapSize` if thrashing happens. > > > > That gets us back to [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687) and [JDK-8248324](https://bugs.openjdk.org/browse/JDK-8248324)... > > Yes, fixing these two issues would be great regardless of `SoftMaxHeapSize`. However, they do not address the 3 issues above about flag precedence. * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From that guiding value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). So it seems fairly straightforward to have any outside "memory pressure" effect this intermediate control value instead of everyone overriding each other in multiple places in the code. Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. 
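(As a rough illustration of the control idea described above -- one guiding value derived from the gap between measured and desired GC CPU usage, smoothed, and turned into a resize direction and magnitude -- here is a toy model. It is deliberately not G1 code; all constants, names and the gain factor are placeholders.)

```
#include <algorithm>
#include <cstddef>

// Toy heap-sizing controller: compares measured GC CPU usage against the
// target and converts the smoothed error into a bounded capacity change.
struct HeapSizingControl {
  double target_gc_cpu_usage;   // e.g. lowered when SoftMaxHeapSize pressure is applied
  double smoothed_error = 0.0;  // exponentially smoothed (actual - target)

  std::size_t next_capacity(double actual_gc_cpu_usage, std::size_t current_capacity) {
    const double alpha = 0.3;   // smoothing factor (placeholder)
    smoothed_error = alpha * (actual_gc_cpu_usage - target_gc_cpu_usage)
                   + (1.0 - alpha) * smoothed_error;

    // Over budget => grow the heap to reduce GC work; under budget => shrink.
    const double gain = 2.0;      // magic factor (placeholder)
    const double max_step = 0.2;  // cap the change at 20% per decision (placeholder)
    const double step = std::clamp(gain * smoothed_error, -max_step, max_step);
    return static_cast<std::size_t>(current_capacity * (1.0 + step));
  }
};

int main() {
  HeapSizingControl ctrl{0.05};                  // target: 5% of CPU time in GC
  std::size_t cap = std::size_t(1) << 30;        // 1 GiB current capacity
  cap = ctrl.next_capacity(0.12, cap);           // measured 12% -> grow
  cap = ctrl.next_capacity(0.02, cap);           // measured 2%  -> shrink
  return cap > 0 ? 0 : 1;
}
```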
Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. (One can see `Min/MaxHeapFreeRatio` as an old attempt to limit heap size growth without affecting performance too much, changing memory pressure. However they are hard to use. And they are completely dis-associated with the rest of the heap sizing mechanism. `SoftMaxHeapSize` is easier to handle) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2775155378 From stefank at openjdk.org Thu Apr 3 10:38:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 10:38:59 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:32:12 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2775290032 From stefank at openjdk.org Thu Apr 3 10:48:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 10:48:01 GMT Subject: Integrated: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. 
> > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. This pull request has now been integrated. Changeset: ffca4f2d Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/ffca4f2da84cb8711794d8e692d176a7e785e7b1 Stats: 27 lines in 2 files changed: 24 ins; 0 del; 3 mod 8353264: ZGC: Windows heap unreserving is broken Reviewed-by: jsikstro, aboldtch, eosterlund, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24377 From aboldtch at openjdk.org Thu Apr 3 10:48:01 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Apr 2025 10:48:01 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:32:12 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2739381258 From aboldtch at openjdk.org Thu Apr 3 11:15:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Apr 2025 11:15:53 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:15:01 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. 
>> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove test from ProblemList A good local fix. But I also think `VMError::is_error_reported_in_current_thread()` should do `return is_error_reported() && _first_error_tid == os::current_thread_id();` Given that `current_thread_id` has a non trivial cost. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2739468102 From ayang at openjdk.org Thu Apr 3 11:32:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 3 Apr 2025 11:32:55 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by ayang (Reviewer). 
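(Circling back to the `VMError::is_error_reported_in_current_thread()` suggestion quoted earlier in this digest: a self-contained sketch of the proposed ordering, checking the cheap global flag before paying for the thread-id lookup. Names mirror the discussion but this is not the JDK implementation; `current_thread_id()` is a stand-in for `os::current_thread_id()`.)

```
#include <atomic>

namespace sketch {

std::atomic<bool>      error_reported{false};
std::atomic<long long> first_error_tid{-1};

// Stand-in for os::current_thread_id(), assumed to be comparatively expensive.
long long current_thread_id() { return 42; }

bool is_error_reported() {
  return error_reported.load(std::memory_order_acquire);
}

bool is_error_reported_in_current_thread() {
  // Short-circuit: on the common, error-free path current_thread_id()
  // is never called at all.
  return is_error_reported() &&
         first_error_tid.load(std::memory_order_acquire) == current_thread_id();
}

} // namespace sketch

int main() {
  return sketch::is_error_reported_in_current_thread() ? 1 : 0;
}
```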
------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2739525769 From tschatzl at openjdk.org Thu Apr 3 11:33:45 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 11:33:45 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * some additional assert to make sure the scanner is initialized correctly. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24222/files - new: https://git.openjdk.org/jdk/pull/24222/files/e5ce3984..21cc754a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=00-01 Stats: 7 lines in 2 files changed: 6 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24222/head:pull/24222 PR: https://git.openjdk.org/jdk/pull/24222 From iwalulya at openjdk.org Thu Apr 3 13:31:55 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 3 Apr 2025 13:31:55 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Thu, 3 Apr 2025 11:33:45 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). >> >> This has been made possible with the refactoring of object array task queues. >> >> At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). >> >> Testing: tier1-5, some perf testing with no differences >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * some additional assert to make sure the scanner is initialized correctly. LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2739853788 From tschatzl at openjdk.org Thu Apr 3 15:09:18 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 15:09:18 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Thu, 3 Apr 2025 13:29:18 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * some additional assert to make sure the scanner is initialized correctly. > > LGTM! 
Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24222#issuecomment-2776099031 From tschatzl at openjdk.org Thu Apr 3 15:09:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 15:09:19 GMT Subject: Integrated: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: <3pkPiCQ3xl43uo_Y6hbpUa8qCjgvId2B6tcL23TZTbI=.69ecc66d-a462-41cc-8914-85dc38308b64@github.com> On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas This pull request has now been integrated. Changeset: 64b691ab Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/64b691ab619d2d99a9c6492341074d2794563c16 Stats: 106 lines in 4 files changed: 51 ins; 32 del; 23 mod 8271870: G1: Add objArray splitting when scanning object with evacuation failure 8271871: G1 does not try to deduplicate objects that failed evacuation Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24222 From ysr at openjdk.org Thu Apr 3 21:45:52 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 21:45:52 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> Message-ID: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> On Mon, 31 Mar 2025 23:09:53 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. 
>> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Can't verify marked object with complete marking after full GC I looked at the files that changed since the last review only, but can look over all of it once again if necessary (just let me know). This looks good; just a few small comments, and in particular a somewhat formalistic and pedantic distinction between the use of `gc_generation()` and `active_generation()` to fetch the marking context (and the use of `global_generation()`). Otherwise looks good to me. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 352: > 350: assert(_from_region != nullptr, "must set before work"); > 351: assert(_heap->active_generation()->complete_marking_context()->is_marked(p), "must be marked"); > 352: assert(!_heap->active_generation()->complete_marking_context()->allocated_after_mark_start(p), "must be truly marked"); I am probably being a bit pedantic here... I would use `gc_generation()` in all code that is executed strictly by GC threads, and `active_generation()` in all code that may possibly be executed by a mutator thread. It seems as if today this code is only executed by GC threads. In general, there is no real distinction between these field at times like these (STW pauses) when heap verification is taking place, but from a syntactic hygiene perspective. We can otherwise file a ticket to separately clean up any confusion in the use of these fields (and add a dynamic check to prevent creeping confusion). The names aren't super well-chosen, but generally think of `_gc_generation` as the generation that is being GC'd, `_active_generation` as one that mutator threads are aware is being the subject of GC. Any assertions by mutator threads should use the latter and by GC threads the former. The fields are reconciled at STW pauses. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 778: > 776: ShenandoahAdjustPointersClosure() : > 777: _heap(ShenandoahHeap::heap()), > 778: _ctx(ShenandoahHeap::heap()->global_generation()->complete_marking_context()) {} I liked the changes in this file that everywhere use the heap's `_gc_generation` (see comment about the distinction between `gc_generation()` and `active_generation()` above) field to fetch the marking context. While I understand that it might be the case that whenever we are here, the `_gc_generation` must necessarily be the `global_generation()`, I am wondering about: 1. using `_gc_generation` here as well to fetch the context, and 2. secondly, asserting also that the `_gc_generation` is in fact the `global_generation()`. I assume (2) must be the case here? If not, it would be good to see if this can be fixed. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1094: > 1092: ShenandoahHeapRegion* region = _regions.next(); > 1093: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 1094: ShenandoahMarkingContext* const ctx = heap->global_generation()->complete_marking_context(); Same comment as at line 778. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: > 1189: _verify_remembered_after_full_gc, // verify read-write remembered set > 1190: _verify_forwarded_none, // all objects are non-forwarded > 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? 
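(To illustrate the gc_generation()/active_generation() distinction and suggestion (2) above in a self-contained way, here is a toy model. It is deliberately not Shenandoah code; all types and names are stand-ins.)

```
#include <cassert>

// Toy model: the "gc generation" is what GC worker threads operate on, the
// "active generation" is what mutator threads observe; the two are
// reconciled at safepoints.
enum class GenerationKind { Young, Old, Global };

struct Generation {
  GenerationKind kind;
  bool mark_complete = false;
  bool is_global() const { return kind == GenerationKind::Global; }
};

struct Heap {
  Generation* gc_generation;      // used by GC threads
  Generation* active_generation;  // used by mutator threads
};

void full_gc_adjust_pointers(Heap& heap) {
  Generation* gen = heap.gc_generation;
  assert(gen->is_global() && "full GC must operate on the global generation");
  assert(gen->mark_complete && "marking must be complete before adjusting pointers");
  (void)gen;
  // ... walk the heap using gen's marking context ...
}

int main() {
  Generation global{GenerationKind::Global, true};
  Heap heap{&global, &global};
  full_gc_adjust_pointers(heap);
  return 0;
}
```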
------------- PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2741111545 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027772698 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027710108 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027713065 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027777968 From ysr at openjdk.org Thu Apr 3 21:55:07 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 21:55:07 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <7yfWKXewUM1XqWtlnyuPV3nu9bGr5VNJXuXi1aNQGvQ=.4c53d85b-13f3-4bfc-87c3-634d547bb440@github.com> Message-ID: On Thu, 6 Mar 2025 23:09:47 GMT, Xiaolong Peng wrote: >> OK, yes, that makes sense. Why not then use both `ShenandoahHeap::[complete_]marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()`. See other related comments in this review round. > > I feel using `henandoahHeap::complete_marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()` may cause more confusion, just read from the name it seems that it indicates the marking is complete for the whole heap, not just the active generation. ok, that makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027790148 From ysr at openjdk.org Thu Apr 3 22:10:50 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 22:10:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> Message-ID: On Fri, 7 Mar 2025 19:25:33 GMT, William Kemper wrote: >> You proposal will make the impl of the set_mark_complete/is_mark_complete of ShenandoahGeneration cleaner, but the thing is it will change current design and behavior, we may have to update the code where there methods is called, e.g. when we call `set_mark_complete` of gc_generation/active_generation, if it is global generation, we may have to explicitly call the same methods of ShenandoahYoungGeneration and ShenandoahOldGeneration to fan out the status. >> >> How about I follow up it in a separate task and update the implementation if necessary? I want to limit the changes involved in this PR, and only fix the bug. > > The young and old generations are only instantiated in the generational mode, so using them without checking the mode will result in SEGV in non-generational modes. > > Global collections have a lot of overlap with old collections. I think what Ramki is saying, is that if we change all the code that makes assertions about the completion status of young/old marking to use the `active_generation` field instead, then we wouldn't need to update the completion status of young/old during a global collection. The difficulty here is that we need assurances that the old generation mark bitmap is valid in collections subsequent to a global collection. So, I don't think we can rely on completion status of `active_generation` when it was global, in following collections where it may now refer to young or old. I see. Yes, that makes sense to me, thanks William. 
It would then be the case for the global generation that if is_mark_complete() holds, then in the generational case it also holds for both of its constituent generations. Maybe we can assert that when we fetch that at line 204 (and find it's true)? Maybe I am being paranoid, but the assert would make me feel confident that the state maintenance isn't going awry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027812176 From xpeng at openjdk.org Thu Apr 3 22:33:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 3 Apr 2025 22:33:56 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: On Thu, 3 Apr 2025 21:39:33 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Can't verify marked object with complete marking after full GC > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: > >> 1189: _verify_remembered_after_full_gc, // verify read-write remembered set >> 1190: _verify_forwarded_none, // all objects are non-forwarded >> 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap > > Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? Yes, the marking bitmaps have been reset after full GC, except for the regions with pinned objects. _verify_marked_complete requires a complete marking context, so it might make more sense to change it to _verify_marked_disable after full GC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027835236 From xpeng at openjdk.org Thu Apr 3 22:37:50 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 3 Apr 2025 22:37:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: On Thu, 3 Apr 2025 21:34:06 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Can't verify marked object with complete marking after full GC > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 352: > >> 350: assert(_from_region != nullptr, "must set before work"); >> 351: assert(_heap->active_generation()->complete_marking_context()->is_marked(p), "must be marked"); >> 352: assert(!_heap->active_generation()->complete_marking_context()->allocated_after_mark_start(p), "must be truly marked"); > > I am probably being a bit pedantic here... > > I would use `gc_generation()` in all code that is executed strictly by GC threads, and `active_generation()` in all code that may possibly be executed by a mutator thread. It seems as if today this code is only executed by GC threads.
> > In general, there is no real distinction between these field at times like these (STW pauses) when heap verification is taking place, but from a syntactic hygiene perspective. > > We can otherwise file a ticket to separately clean up any confusion in the use of these fields (and add a dynamic check to prevent creeping confusion). The names aren't super well-chosen, but generally think of `_gc_generation` as the generation that is being GC'd, `_active_generation` as one that mutator threads are aware is being the subject of GC. Any assertions by mutator threads should use the latter and by GC threads the former. The fields are reconciled at STW pauses. Make sense, I did notice that there is assert `assert(!Thread::current()->is_Java_thread(), "Not allowed");` in `gc_generation()` suggesting that non-Java thread should call `gc_generation()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027837825 From ysr at openjdk.org Thu Apr 3 22:57:50 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 22:57:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> On Thu, 3 Apr 2025 22:31:27 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: >> >>> 1189: _verify_remembered_after_full_gc, // verify read-write remembered set >>> 1190: _verify_forwarded_none, // all objects are non-forwarded >>> 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap >> >> Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? > > Yes, making bitmaps has been reset after full GC, except for the for regions with pined objects. > _verify_marked_complete requires complete marking context, it might make more sense to change it to _verify_marked_disable after full GC. Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? May be I am missing some corner case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027852832 From manc at openjdk.org Fri Apr 4 07:26:54 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 4 Apr 2025 07:26:54 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Thank you both for the quick and detailed responses! > * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). > * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. > > With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. > > As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. > > Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From some actual value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). I was unaware that G1 plans to stop using `Min/MaxHeapFreeRatio` until now. Looks like [JDK-8238686](https://bugs.openjdk.org/browse/JDK-8238686) has more relevant description. It sounds good to solve all above-mentioned issues and converge on a single flag such as `GCTimeRatio`, and ensure both incremental and full GCs respect this flag. (We should also fix [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) for converging on `GCTimeRatio`. ) It would be nicer if we have a doc or a master bug that describes the overall plan. In comparison, this PR's approach for a high-precedence, "harder" `SoftMaxHeapSize` is an easier and more expedient approach to improve heap resizing, without solving all other issues. However, it requires users to carefully maintain and dynamically adjust `SoftMaxHeapSize` to prevent GC thrashing. I think if all other issues are resolved, our existing internal use cases that use a separate algorithm to dynamically calculate and set the high-precedence `SoftMaxHeapSize` (or `ProposedHeapSize`) could probably migrate to the `GCTimeRatio` approach, and stop using `SoftMaxHeapSize`. I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how `SoftMaxHeapSize` should work? > > Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. 
Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. Somewhat related to above, our experience with our internal algorithm that adjusts `SoftMaxHeapSize` based on GC CPU overhead, encountered cases that it behaves poorly. The problem is that some workload have large variance in mutator's CPU usage (e.g. peak hours vs off-peak hours), but smaller variance in GC CPU usage. Then it does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating `SoftMaxHeapSize`, which is similar to how `Min/MaxHeapFreeRatio` works. I'm not sure if `GCTimeRatio` using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2777769994 From tschatzl at openjdk.org Fri Apr 4 08:10:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Apr 2025 08:10:34 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
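For comparison, the three-or-four-instruction barrier referred to above is essentially an unconditional card mark. A rough self-contained sketch of that shape (a simplified model under assumed names, not the actual HotSpot code):

    // Serial/Parallel-style post-write barrier: unconditionally dirty the card
    // covering the updated field; no filtering, no StoreLoad, no enqueuing.
    #include <cstdint>
    #include <cstddef>

    static const int          kCardShift = 9;        // 512-byte cards, as in HotSpot
    static std::uint8_t*      card_table = nullptr;  // (biased) card table base, set up at startup
    static const std::uint8_t kDirty     = 0;

    inline void post_write_barrier(void* field_addr) {
      std::size_t card = reinterpret_cast<std::uintptr_t>(field_addr) >> kCardShift;
      card_table[card] = kDirty;                      // a shift, an add and a byte store
    }

Keeping the G1 barrier close to this shape is what removes most of the filtering and card-tracking cost listed in the pseudo code above.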
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=29 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Apr 4 09:03:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Apr 2025 09:03:50 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 07:23:45 GMT, Man Cao wrote: > Thank you both for the quick and detailed responses! > > > * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). > > * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. > > > > With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. > > As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. 
> > Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From some actual value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). > > I was unaware that G1 plans to stop using `Min/MaxHeapFreeRatio` until now. Looks like [JDK-8238686](https://bugs.openjdk.org/browse/JDK-8238686) has more relevant description. It sounds good to solve all above-mentioned issues and converge on a single flag such as `GCTimeRatio`, and ensure both incremental and full GCs respect this flag. (We should also fix [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) for converging on `GCTimeRatio`. ) It would be nicer if we have a doc or a master bug that describes the overall plan. Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. I will look through the existing bugs and see if I there is a need for a(nother) master bug. > > In comparison, this PR's approach for a high-precedence, "harder" `SoftMaxHeapSize` is an easier and more expedient approach to improve heap resizing, without solving all other issues. However, it requires users to carefully maintain and dynamically adjust `SoftMaxHeapSize` to prevent GC thrashing. I think if all other issues are resolved, our existing internal use cases that use a separate algorithm to dynamically calculate and set the high-precedence `SoftMaxHeapSize` (or `ProposedHeapSize`) could probably migrate to the `GCTimeRatio` approach, and stop using `SoftMaxHeapSize`. > > I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how `SoftMaxHeapSize` should work? > > > Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. > > Somewhat related to above, our experience with our internal algorithm that adjusts `SoftMaxHeapSize` based on GC CPU overhead, encountered cases that it behaves poorly. The problem is that some workload have large variance in mutator's CPU usage (e.g. peak hours vs off-peak hours), but smaller variance in GC CPU usage. Then it does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating `SoftMaxHeapSize`, which is similar to how `Min/MaxHeapFreeRatio` works. > > I'm not sure if `GCTimeRatio` using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978. Obviously there are issues to sort out. 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2778005801 From ayang at openjdk.org Fri Apr 4 09:12:23 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Message-ID: Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. ------------- Commit messages: - tmp - gclocker-nested Changes: https://git.openjdk.org/jdk/pull/24407/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24407&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352116 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24407.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24407/head:pull/24407 PR: https://git.openjdk.org/jdk/pull/24407 From eosterlund at openjdk.org Fri Apr 4 09:12:23 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. Would be nice to refactor the if (UseSerialGC || UseParallelGC) code to something that explains why it's there (those are the GCs that use the new improved GC locker). But that's pre existing so I don't mind if it's split to a separate RFE. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2739864515 From jsikstro at openjdk.org Fri Apr 4 11:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 4 Apr 2025 11:56:07 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Thank you for the reviews! 
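The shape of that cleanup, as a small illustrative model (apart from `_id` and `select_relocation_set`, the names below are made up and other parameters, if any, are omitted; this is not ZGC's real API):

    // Toy model: the generation id is a member, so internal calls no longer
    // need to pass it as an argument.
    enum class ZGenerationId { young, old };

    class ZGeneration {
      const ZGenerationId _id;
    public:
      explicit ZGeneration(ZGenerationId id) : _id(id) {}

      // Previously sketched as select_relocation_set(ZGenerationId id, ...);
      // now the member _id is used directly.
      void select_relocation_set() {
        bool young = (_id == ZGenerationId::young);
        (void)young;  // ... choose relocation candidates for this generation ...
      }
    };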
------------- PR Comment: https://git.openjdk.org/jdk/pull/24374#issuecomment-2778471557 From jsikstro at openjdk.org Fri Apr 4 11:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 4 Apr 2025 11:56:07 GMT Subject: Integrated: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: <8QZgCh8R7ZycqowtfLbPwmbJz59ni6HckX2dwRW-U7w=.1db6ca63-5edd-4086-be8a-2d55ae6ac0de@github.com> On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. This pull request has now been integrated. Changeset: b92a4436 Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/b92a44364d3a2267f5bc9aef3077805bebdf9fba Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod 8353471: ZGC: Redundant generation id in ZGeneration Reviewed-by: stefank, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24374 From xpeng at openjdk.org Fri Apr 4 18:11:50 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 4 Apr 2025 18:11:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: On Thu, 3 Apr 2025 22:55:18 GMT, Y. Srinivas Ramakrishna wrote: >> Yes, making bitmaps has been reset after full GC, except for the for regions with pined objects. >> _verify_marked_complete requires complete marking context, it might make more sense to change it to _verify_marked_disable after full GC. > > Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? > > May be I am missing some corner case? It does, one of the changes https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2029244689 From xpeng at openjdk.org Fri Apr 4 18:18:30 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 4 Apr 2025 18:18:30 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. 
This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address PR comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/7c73e121..d4af962a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=06-07 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From sangheki at openjdk.org Fri Apr 4 21:21:22 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Fri, 4 Apr 2025 21:21:22 GMT Subject: RFR: 8346568: G1: Other time can be negative Message-ID: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. 1. Different scope of measurement - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) - Changed not to be included in sum_of_sub_phases. - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. 2. Duplicated measurement - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. 3. Concurrent task execution time - Sometimes just triggering concurrent work takes 2 digit milliseconds. Changed to add only initiating time on sum_of_sub_phases and keep displaying concurrent tasks' average execution time. Testing: tier 1 ~ 5 ------------- Commit messages: - Separate measurement for cleanup Changes: https://git.openjdk.org/jdk/pull/24454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346568 Stats: 61 lines in 4 files changed: 35 ins; 17 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From kbarrett at openjdk.org Sat Apr 5 06:29:47 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 5 Apr 2025 06:29:47 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. 
The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2744728350 From duke at openjdk.org Mon Apr 7 05:45:54 2025 From: duke at openjdk.org (Saint Wesonga) Date: Mon, 7 Apr 2025 05:45:54 GMT Subject: RFR: 8350722: Serial GC: Remove duplicate logic for detecting pointers in young gen In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 06:54:19 GMT, Saint Wesonga wrote: > Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method. @tschatzl , I'm closing this PR now that I have an updated approach in https://github.com/openjdk/jdk/pull/23853 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23792#issuecomment-2782077611 From duke at openjdk.org Mon Apr 7 05:45:54 2025 From: duke at openjdk.org (Saint Wesonga) Date: Mon, 7 Apr 2025 05:45:54 GMT Subject: Withdrawn: 8350722: Serial GC: Remove duplicate logic for detecting pointers in young gen In-Reply-To: References: Message-ID: <_hkx74X6j9YnTj9Z_dUXjLPXSMY4IeRk3W4Vo5Ti_KI=.0b979267-53cc-4cc4-8f03-c33d726bedc7@github.com> On Wed, 26 Feb 2025 06:54:19 GMT, Saint Wesonga wrote: > Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23792 From tschatzl at openjdk.org Mon Apr 7 07:55:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 07:55:52 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Marked as reviewed by tschatzl (Reviewer). 
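A minimal model of the locking structure described in the quoted text (an assumed shape only, not the actual patch): the new lock is taken at the start of block() and only dropped at the end of unblock(), and the lock ranks are arranged so that Heap_lock can be acquired while it is held, which avoids waiting in block() with Heap_lock already taken, the root cause named above.

    #include <mutex>

    // Toy model: one mutex stands in for JNICritical_lock.
    class GCLockerModel {
      std::mutex _jni_critical_lock;   // ranked so that Heap_lock may be acquired while this is held
      std::unique_lock<std::mutex> _held{_jni_critical_lock, std::defer_lock};
    public:
      void block()   { _held.lock();   }  // taken at the start of block() ...
      void unblock() { _held.unlock(); }  // ... and only released at the end of unblock()
    };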
------------- PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2745840662 From tschatzl at openjdk.org Mon Apr 7 07:57:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 07:57:51 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: <9nwg79xCItPNaMsHRK6VQFl-dkWPP385vHqhvTYK_k0=.a830743a-5fd6-46a3-87c3-fd2a164ddf6a@github.com> On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Filed [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2782349959 From thomas.schatzl at oracle.com Mon Apr 7 09:07:08 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 7 Apr 2025 11:07:08 +0200 Subject: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Hi all, On 26.03.25 03:33, Monica Beckwith wrote: > Hi Ivan, > Thanks for the note ? and nice to meet you! > > The refinements you're working on around |GCTimeRatio|?and memory > uncommit are valuable contributions to the broader AHS direction we've > been shaping. They align closely with the multi-input heap sizing model > Thomas and I outlined ? especially the emphasis on GC cost (via | > GCTimeRatio|) and memory responsiveness as primary drivers. > > These kinds of enhancements are central to making G1?s heap sizing more > adaptive and responsive, particularly in environments with shifting > workload patterns. I?m especially interested in your work around > improving the GC time-base ? it seems like a crucial piece for > coordinating GC-triggered adjustments more precisely. > > Given the growing collaboration across contributors, I?ve been thinking > of opening an umbrella issue to track these efforts and possibly > drafting a JEP to help clarify and unify the overall scope. With Oracle, > Google, and others actively contributing, it?s exciting to see a shared > vision taking shape ? and your work is clearly part of it. > I created an umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716 supposed to contain latest info on the effort. Feel free to add to it. If possible, I would like to keep the more free-form discussion here in the mailing list though. My bad for not following up on this much much earlier. > I?m genuinely excited to see this come together. Looking forward to > continuing the discussion and shaping the future of G1 ergonomics together. 
> Hth, Thomas From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24407#issuecomment-2782605636 From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: Integrated: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. This pull request has now been integrated. Changeset: 39549f89 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/39549f89905019fa90dd20ff8b6822c1351cbaa6 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Reviewed-by: kbarrett, tschatzl, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24407 From tschatzl at openjdk.org Mon Apr 7 09:22:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 09:22:50 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Also collected thoughts and existing documents with some additional rough explanations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2782661911 From shade at openjdk.org Mon Apr 7 10:33:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 10:33:35 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v3] In-Reply-To: References: Message-ID: > See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. > > I think cutting to 0.2% of RAM size gets us into good sweet spot: > - On huge 1024G machine, this yields 2G initial heap > - On reasonably sized 128G machine, this gives 256M initial heap > - On smaller 1G container, this gives 2M initial heap > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8348278-trim-iramp - Also man page - Merge branch 'master' into JDK-8348278-trim-iramp - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23262/files - new: https://git.openjdk.org/jdk/pull/23262/files/d3a327ae..6a6c3ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=01-02 Stats: 152480 lines in 3423 files changed: 68119 ins; 65042 del; 19319 mod Patch: https://git.openjdk.org/jdk/pull/23262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23262/head:pull/23262 PR: https://git.openjdk.org/jdk/pull/23262 From shade at openjdk.org Mon Apr 7 10:48:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 10:48:51 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v3] In-Reply-To: References: Message-ID: <_J82bhnQOjixO9UDu2Mm0CsGVNe9gXXBxayIyv2TFz8=.2deea0ff-c51b-499d-a8fd-1ebc253a9e2d@github.com> On Mon, 7 Apr 2025 10:33:35 GMT, Aleksey Shipilev wrote: >> See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. >> >> I think cutting to 0.2% of RAM size gets us into good sweet spot: >> - On huge 1024G machine, this yields 2G initial heap >> - On reasonably sized 128G machine, this gives 256M initial heap >> - On smaller 1G container, this gives 2M initial heap >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8348278-trim-iramp > - Also man page > - Merge branch 'master' into JDK-8348278-trim-iramp > - Fix CSR filed: [JDK-8353837](https://bugs.openjdk.org/browse/JDK-8353837) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23262#issuecomment-2782893464 From jsikstro at openjdk.org Mon Apr 7 11:33:57 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 7 Apr 2025 11:33:57 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: <9tbw7_56t4aDDTVE-KI9b84ccG_Iky2LRhsMmL0gXF0=.f03a1ac0-099f-465d-977d-751f7b5cf7ff@github.com> On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Since this is a relatively small change, I'm hoping that the Shenandoah devs are on board. I am going to integrate this now so that we can continue working in this area in ZGC. I am happy to follow up on this if there are any more opinions in the future. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24387#issuecomment-2783006637 From jsikstro at openjdk.org Mon Apr 7 11:33:58 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 7 Apr 2025 11:33:58 GMT Subject: Integrated: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. 
Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt This pull request has now been integrated. Changeset: c494a00a Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/c494a00a66d21d2e403fd9ce253eb132c34e455d Stats: 141 lines in 16 files changed: 75 ins; 52 del; 14 mod 8353559: Restructure CollectedHeap error printing Reviewed-by: stefank, eosterlund, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24387 From ysr at openjdk.org Tue Apr 8 01:29:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 01:29:09 GMT Subject: RFR: 8353218: Shenandoah: Out of date comment references Brooks pointers In-Reply-To: References: Message-ID: <-zSlCWIHyxeR9-mjP1si49UGzRl9qMSSWscVELQxYAQ=.8f6e1108-bd47-49f8-918b-c2f6c9eb640b@github.com> On Fri, 28 Mar 2025 23:33:48 GMT, William Kemper wrote: > Trivial change, comment only. Thanks for fixing this! ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24304#pullrequestreview-2748445404 From tschatzl at openjdk.org Tue Apr 8 11:57:21 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 8 Apr 2025 11:57:21 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: <_1K7Q1L9cPr-wd5jefhS6rBjR0sJvbBWsjf71YbR6k4=.0c0c89ae-15a8-4e9b-a3fb-7c028740b15c@github.com> On Wed, 2 Apr 2025 11:15:01 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. 
>> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove test from ProblemList Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2749704531 From stefank at openjdk.org Tue Apr 8 15:22:49 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Apr 2025 15:22:49 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v3] In-Reply-To: References: Message-ID: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8352994_is_error_reported - Remove test from ProblemList - 8352994: ZGC: Fix regression introduced in JDK-8350572 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24349/files - new: https://git.openjdk.org/jdk/pull/24349/files/fe07a340..4720444d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=01-02 Stats: 26029 lines in 781 files changed: 18783 ins; 5145 del; 2101 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From ysr at openjdk.org Tue Apr 8 21:54:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 21:54:25 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:33:58 GMT, Xiaolong Peng wrote: >> Right, active_generation should be used instead of global_generation to get the complete marking context, with the context of full GC, even we know it active_generation is the global gen, but it's better not to use global_generation directly for better maintainable code. > > Updated it to use active_generation. Thanks for the fixes; this looks good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034084571 From wkemper at openjdk.org Tue Apr 8 22:04:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 8 Apr 2025 22:04:34 GMT Subject: Integrated: 8353218: Shenandoah: Out of date comment references Brooks pointers In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 23:33:48 GMT, William Kemper wrote: > Trivial change, comment only. This pull request has now been integrated. 
Changeset: b4ab964b Author: William Kemper URL: https://git.openjdk.org/jdk/commit/b4ab964b72c631632511e6f01cdd5a47fb2e31fa Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8353218: Shenandoah: Out of date comment references Brooks pointers Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/24304 From ysr at openjdk.org Tue Apr 8 23:30:27 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 23:30:27 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments LGTM! Thanks for your patience! ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2751608049 From ysr at openjdk.org Tue Apr 8 23:30:28 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 23:30:28 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: On Fri, 4 Apr 2025 18:09:36 GMT, Xiaolong Peng wrote: >> Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? >> >> May be I am missing some corner case? > > It does, one of the changes in https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. Thanks; I looked through the code and see where I had confused myself above. This looks good to me. 
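For reference, the invariant discussed in this thread can be pictured roughly as follows; the struct and field names below are made up for illustration and are not the actual Shenandoah accessors:

```c++
#include <cassert>

// Illustrative stand-ins for the real Shenandoah types (names are made up).
struct MarkingContext { bool complete; };

struct HeapSketch {
  MarkingContext ctx;

  // Accessor for callers that require a finished mark: asserts completeness
  // instead of silently handing out a possibly stale context.
  MarkingContext* complete_marking_context() {
    assert(ctx.complete && "marking context must be complete");
    return &ctx;
  }

  // Accessor without the guarantee; callers check completeness themselves.
  MarkingContext* marking_context() { return &ctx; }
};
```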
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034163269 From xpeng at openjdk.org Tue Apr 8 23:45:33 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 8 Apr 2025 23:45:33 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments thanks all for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2787875051 From duke at openjdk.org Tue Apr 8 23:45:33 2025 From: duke at openjdk.org (duke) Date: Tue, 8 Apr 2025 23:45:33 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: <3CxQWRmVEeYX_O3D2Lh5-1GiTRLSZRkaNKDc3ztM2ZE=.68ecc2fe-b5e1-4e62-a58e-0de858d9dc5f@github.com> On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments @pengxiaolong Your change (at version d4af962adb11c03281af80ecfc12344dac01b11a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2787877229 From xpeng at openjdk.org Tue Apr 8 23:45:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 8 Apr 2025 23:45:34 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: <6D387djX5BAacoBeaJCLj1HGYsNoRm3lTWVipWp6vn0=.ed5303a1-0107-405f-a0d0-e1360315fc46@github.com> On Tue, 8 Apr 2025 23:27:33 GMT, Y. Srinivas Ramakrishna wrote: >> It does, one of the changes in https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. > > Thanks; I looked through the code and see where I had confused myself above. This looks good to me. thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034171337 From kdnilsen at openjdk.org Wed Apr 9 00:20:29 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 00:20:29 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 17:49:38 GMT, William Kemper wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 159: > >> 157: >> 158: inline size_t ShenandoahHeapRegion::get_mixed_candidate_live_data_bytes() const { >> 159: assert(SafepointSynchronize::is_at_safepoint(), "Should be at Shenandoah safepoint"); > > Could we use `shenandoah_assert_safepoint` here (and other places) instead? Good call. I'll make this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2034198164 From kdnilsen at openjdk.org Wed Apr 9 00:29:25 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 00:29:25 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 18:16:43 GMT, William Kemper wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. 
In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > >> 76: _live_data(0), >> 77: _critical_pins(0), >> 78: _mixed_candidate_garbage_words(0), > > Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `TAMS + top` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returned marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`. This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? This is a good idea. Let me experiment with this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2034208988 From xpeng at openjdk.org Wed Apr 9 01:02:41 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 9 Apr 2025 01:02:41 GMT Subject: Integrated: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:16 GMT, Xiaolong Peng wrote: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 This pull request has now been integrated. Changeset: aec1fe0a Author: Xiaolong Peng Committer: Y. 
Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/aec1fe0a17fa6801e26a517d4d21656353409f7c Stats: 71 lines in 17 files changed: 7 ins; 34 del; 30 mod 8351091: Shenandoah: global marking context completeness is not accurately maintained Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23886 From kdnilsen at openjdk.org Wed Apr 9 01:55:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 01:55:48 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: <8UF5sC8lbb-hBUpkbzDarvFxOlbQU0nDPbTqWhAedM0=.e078bb2a-2331-47f7-aa67-807d09c4ca11@github.com> > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Experiment with reviewer suggestion Redefine the way ShenandoahHeapRegion::get_live_data_ works to simplify changes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/70613882..3c1f788a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=00-01 Stats: 28 lines in 5 files changed: 15 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From stefank at openjdk.org Wed Apr 9 06:22:35 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Apr 2025 06:22:35 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 15:22:49 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. >> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8352994_is_error_reported > - Remove test from ProblemList > - 8352994: ZGC: Fix regression introduced in JDK-8350572 Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24349#issuecomment-2788390540 From stefank at openjdk.org Wed Apr 9 06:22:35 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Apr 2025 06:22:35 GMT Subject: Integrated: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline This pull request has now been integrated. Changeset: 3340e13f Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/3340e13fd0a8d25212003e8371a135471b2f44b3 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod 8352994: ZGC: Fix regression introduced in JDK-8350572 Reviewed-by: aboldtch, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24349 From manc at openjdk.org Wed Apr 9 07:27:33 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 9 Apr 2025 07:27:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Thank you for creating [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716)! > Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. Apology for overlooking previous mentions about `Min/MaxHeapFreeRatio`. Previous mentions were mostly inside responses to complicated issues, and I have hardly got the time to follow hotspot-gc-dev closely. 
To be honest, we didn't pay much attention to `Min/MaxHeapFreeRatio` before I started working on this PR. I guess this is a good example that a one-pager doc/umbrella bug provides cleaner communication and additional values over email discussion, especially when one party already has a pretty detailed plan for how it should be done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2788609820 From manc at google.com Wed Apr 9 07:44:08 2025 From: manc at google.com (Man Cao) Date: Wed, 9 Apr 2025 00:44:08 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: Hi all, Thank you Thomas for creating the umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716. While waiting a bit on SoftMaxHeapSize PR ( https://github.com/openjdk/jdk/pull/24211) to see if others have feedback, I could start working on CurrentMaxHeapSize ( https://bugs.openjdk.org/browse/JDK-8204088). I also agree that CurrentMaxHeapSize may not need a JEP due to its small size and low complexity. Should it proceed similarly to how SoftMaxHeapSize was introduced? I.e, https://bugs.openjdk.org/browse/JDK-8222145, and creating a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. Separately, for removing support for Min/MaxHeapFreeRatio for G1 (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/browse/JDK-8238686), how do we handle existing users that set these two flags? (We have very few internal users setting these two flags. But yesterday I ran into a use case that sets -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 for G1...) Best, Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From tschatzl at openjdk.org Wed Apr 9 07:56:37 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 07:56:37 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 07:24:43 GMT, Man Cao wrote: > > Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. > > Apology for overlooking previous mentions about `Min/MaxHeapFreeRatio`. Previous mentions were mostly inside responses to complicated issues, and I have hardly got the time to follow hotspot-gc-dev closely. To be honest, we didn't pay much attention to `Min/MaxHeapFreeRatio` before I started working on this PR. > > I guess this is a good example that a one-pager doc/umbrella bug provides cleaner communication and additional values over email discussion, especially when one party already has a pretty detailed plan for how it should be done. Don't worry, I should have been better with following up with that summary about thoughts/plans communicated so far somewhere publicly. Let's go forward with that CR summarizing the respective (current) general direction. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2788687511 From ayang at openjdk.org Wed Apr 9 10:36:44 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Apr 2025 10:36:44 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 170: > 168: } > 169: return result; > 170: } I see in `G1ConcurrentRefineThread::do_refinement`: // The yielding may have completed the task, check. if (!state.is_in_progress()) { I wonder if it's simpler to use `is_in_progress` consistently to detect whether we should restart sweep, instead of `_sweep_start_epoch`. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > 347: } > 348: > 349: bool has_sweep_rt_work = is_in_progress() && _state == State::SweepRT; Why `is_in_progress()`? src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 79: > 77: > 78: void inc_cards_scanned(size_t increment = 1) { _cards_scanned += increment; } > 79: void inc_cards_clean(size_t increment = 1) { _cards_clean += increment; } The sole caller always passes in arg, so no need for default-arg-value. src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 87: > 85: void add_atomic(G1ConcurrentRefineStats* other); > 86: > 87: G1ConcurrentRefineStats& operator+=(const G1ConcurrentRefineStats& other); Seems that these operators are not used after this PR. src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > 81: break; > 82: } > 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. Why doesn't call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to a separate case from `NoInteresting`.) src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 156: > 154: > 155: _refine_stats.inc_cards_scanned(claim.size()); > 156: _refine_stats.inc_cards_clean(claim.size() - scanned); I feel these two "scanned" mean sth diff; the local var should probably be sth like `num_dirty_cards`. 
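To make the scanned/clean distinction in these stats concrete, the per-claim accounting could be pictured roughly like this; the struct and method names are illustrative, not taken from the patch:

```c++
#include <cstddef>

// Illustrative only: account for one claimed chunk of the refinement table.
// 'cards_in_claim' is everything the sweep looked at; 'num_dirty_cards' is
// the subset that was actually dirty and had to be scanned for references.
struct RefineStatsSketch {
  size_t cards_scanned = 0;  // all cards visited by the sweep
  size_t cards_clean   = 0;  // visited cards that turned out to be clean

  void account_claim(size_t cards_in_claim, size_t num_dirty_cards) {
    cards_scanned += cards_in_claim;
    cards_clean   += cards_in_claim - num_dirty_cards;
  }
};
```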
src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 207: > 205: > 206: if (!interrupted_by_gc) { > 207: state.add_yield_duration(G1CollectedHeap::heap()->safepoint_duration() - synchronize_duration_at_sweep_start); I think this is recorded to later calculate actual refine-time, i.e. sweep-time - yield-time. However, why can't yield-duration be recorded in this refine-control-thread directly -- accumulation of `jlong yield_duration = os::elapsed_counter() - yield_start`. I feel that is easier to reason than going through g1heap. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 75: > 73: { > 74: MutexLocker x(G1ReviseYoungLength_lock, Mutex::_no_safepoint_check_flag); > 75: G1Policy* p = g1h->policy(); Can probably use the existing `policy`. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 88: > 86: } > 87: > 88: G1ReviseYoungLengthTargetLengthTask::G1ReviseYoungLengthTargetLengthTask(const char* name) : I wonder if the class name can be shortened a bit, sth like `G1ReviseYoungLengthTask`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033251162 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033222407 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033929489 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033975054 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033934399 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033910496 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2032008908 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855278 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855435 From duke at openjdk.org Wed Apr 9 10:48:48 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 9 Apr 2025 10:48:48 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Message-ID: After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
before this patch: ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops bool UseCompressedOops = false {product lp64_product} {default} openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) after this patch: ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops bool UseCompressedOops = true {product lp64_product} {ergonomic} openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) ------------- Commit messages: - 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Changes: https://git.openjdk.org/jdk/pull/24541/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354145 Stats: 8 lines in 3 files changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From tschatzl at openjdk.org Wed Apr 9 11:26:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 11:26:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 10:37:24 GMT, Tongbao Zhang wrote: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. > > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Would it be possible to add a regression test that checks the value of the `UseCompressedOops` flag after running a VM with these settings? 
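For context, the threshold shift described in the report can be reproduced with a small standalone calculation; this is only an illustration, the real ergonomic lives in the shared argument-processing code and uses the conservative heap alignment, which for G1 is the largest possible region size:

```c++
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t M = 1024 * 1024, G = 1024 * M;
  const uint64_t max_heap      = 32736 * M;  // -Xmx32736m from the report (32G - 32M)
  const uint64_t old_alignment = 32 * M;     // max ergonomic region size before JDK-8275056
  const uint64_t new_alignment = 512 * M;    // max ergonomic region size after JDK-8275056
  // Compressed oops are only enabled when the heap fits below the 32G
  // encoding limit minus the conservative heap alignment.
  printf("old threshold: %s\n", max_heap <= 32 * G - old_alignment ? "UseCompressedOops" : "plain oops");
  printf("new threshold: %s\n", max_heap <= 32 * G - new_alignment ? "UseCompressedOops" : "plain oops");
  return 0;
}
```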
------------- PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2753132517 From duke at openjdk.org Wed Apr 9 11:37:39 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 9 Apr 2025 11:37:39 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 11:23:56 GMT, Thomas Schatzl wrote: > Would it be possible to add a regression test that checks the value of the `UseCompressedOops` flag after running a VM with these settings? Thanks for your suggestion, I will add a test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2789382638 From rcastanedalo at openjdk.org Wed Apr 9 12:03:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 9 Apr 2025 12:03:49 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f Hi Thomas, great simplification and encouraging results! I reviewed the compiler-related parts of the changeset, including x64 and aarch64 changes. src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 246: > 244: __ cbz(new_val, done); > 245: } > 246: // Storing region crossing non-null, is card young? Suggestion: // Storing region crossing non-null. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > 99: } > 100: > 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > 143: > 144: __ bind(is_clean_card); > 145: // Card was clean. Dirty card and go to next.. This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 319: > 317: const Register thread, > 318: const Register tmp1, > 319: const Register tmp2, Since `tmp2` is not needed in the x64 post-barrier, I suggest not passing it around for this platform, for simplicity and also to make optimization opportunities more visible in the future. Here is my suggestion: https://github.com/robcasloz/jdk/commit/855ec8df4a641f8c491c5c09acea3ee434b7e230, feel free to merge if you agree. 
src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 38: > 36: #include "c1/c1_LIRAssembler.hpp" > 37: #include "c1/c1_MacroAssembler.hpp" > 38: #endif // COMPILER1 I suggest removing the conditional compilation directives and grouping these includes together with the above `c1` ones. src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 147: > 145: state->do_input(_thread); > 146: > 147: // Use temp registers to ensure these they use different registers. Suggestion: // Use temps to enforce different registers. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 307: > 305: + 6 // same region check: Uncompress (new_val) oop, xor, shr, (cmp), jmp > 306: + 4 // new_val is null check > 307: + 4; // card not clean check. It probably does not affect the unrolling heuristics too much, but you may want to make the last cost component conditional on `UseCondCardMark`. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 396: > 394: bool needs_liveness_data(const MachNode* mach) const { > 395: return G1BarrierStubC2::needs_pre_barrier(mach) || > 396: G1BarrierStubC2::needs_post_barrier(mach); Suggestion: // Liveness data is only required to compute registers that must be // preserved across the runtime call in the pre-barrier stub. return G1BarrierStubC2::needs_pre_barrier(mach); src/hotspot/share/gc/g1/g1BarrierSet.hpp line 56: > 54: // > 55: // The refinement threads mark cards in the current collection set specially on the > 56: // card table - this is fine wrt to synchronization with the mutator, because at Suggestion: // card table - this is fine wrt synchronization with the mutator, because at test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java line 521: > 519: phase = CompilePhase.FINAL_CODE) > 520: @IR(counts = {IRNode.COUNTED_LOOP, "2"}, > 521: phase = CompilePhase.FINAL_CODE) I suggest to remove this extra IR check to avoid over-specifying the expected loop shape. For example, running this test with loop unrolling disabled (`-XX:LoopUnrollLimit=0`) would now fail because only one counted loop would be found. ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2753154117 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035174209 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035175921 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035177738 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035183250 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035186980 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035192666 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035210464 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035196251 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035198219 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035201056 From tschatzl at openjdk.org Wed Apr 9 12:41:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:41:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:35:26 GMT, Roberto Casta?eda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > >> 143: >> 144: __ bind(is_clean_card); >> 145: // Card was clean. Dirty card and go to next.. > > This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? Great find! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035280909 From tschatzl at openjdk.org Wed Apr 9 12:50:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:50:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:34:09 GMT, Roberto Casta?eda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > >> 99: } >> 100: >> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, > > Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. I will try to redo numbers. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035298557 From thomas.schatzl at oracle.com Wed Apr 9 14:05:56 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 9 Apr 2025 16:05:56 +0200 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: <13c7d913-e61f-47af-a299-6c6b6e2d45f6@oracle.com> Hi Man, On 09.04.25 09:44, Man Cao wrote: > Hi all, > > Thank you Thomas for creating the umbrella CR at https:// > bugs.openjdk.org/browse/JDK-8353716 JDK-8353716>. > While waiting a bit on SoftMaxHeapSize PR (https://github.com/openjdk/ > jdk/pull/24211) to see if others have feedback, I could start working on > CurrentMaxHeapSize (https://bugs.openjdk.org/browse/JDK-8204088). > I also agree that?CurrentMaxHeapSize may not need a JEP due to its small > size and low complexity. Should it proceed similarly to how > SoftMaxHeapSize was introduced? I.e, https://bugs.openjdk.org/browse/ > JDK-8222145, and creating > a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. I think this is the best way forward. There is no need for a JEP from me either. Exact behavior in various situations needs to be defined in the CSR. > > Separately, for removing support for?Min/MaxHeapFreeRatio for G1 > (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/ > browse/JDK-8238686), how > do we handle existing users that set these two flags? After searching the web a little, it seems that these flags are actually in use, and recommended to be used (e.g. in default settings). So we need some transition strategy to get off them, and can't just remove. One option is to translate these options into other values impacting heap size "similarly". E.g. have Min/MaxHeapFreeRatio translate to internal pressure at the time the changes are noticed, but that is just a potential solution that hand-waves away the effort for that. Then start deprecating and remove; depends a little on how useful (or how much in the way) they are for Serial and Parallel GC (other collectors don't support them). It is unlikely that ZGC and Shenandoah will adopt these. Even already in JDK-8238687 Min/MaxHeapFreeRatio happily work to counter the cpu based sizing, so some solution needs to be found there already. That change will already be quite disruptive in terms of impact on heap sizing, so another option is to remove support in G1. > (We have very few internal users setting these two flags. But yesterday > I ran into a use case that sets -XX:MinHeapFreeRatio=0 - > XX:MaxHeapFreeRatio=0 for G1...) What would be the use case for setting it to these values? There seem to be little upside and lots of downside for this choice, because it likely causes a lot of GC activity since the VM will need GC to expand the heap little by little all the time, and full gc/Remark will immediately reset these expansion efforts. Thomas From tschatzl at openjdk.org Wed Apr 9 14:38:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 14:38:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 19:59:09 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > >> 81: break; >> 82: } >> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. > > Why doesn't call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to a separate case from `NoInteresting`.) "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card. The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a card to the collection, C not having any interesting card any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). I could add a separate counter for these type of cards too - they can be inferred from the total number of scanned minus the others though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035512686 From erik.osterlund at oracle.com Wed Apr 9 15:22:12 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 9 Apr 2025 15:22:12 +0000 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> Hi Man, Sorry to butt in. A high level question about the AHS plan for G1? are we interested in the intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is it AHS that we are interested in? The reason I ask is that each incremental feature comes with some baggage due to being a (somewhat) static and manually set limit, which the AHS solution won?t need to deal with. 
For example, it's unclear how a *static* SoftMaxHeapSize should behave when the live set is larger than the limit. While that can maybe be solved in some reasonable way, it's worth noting that AHS won't need the solution, because there it's a dynamic limit that the GC simply won't set lower than the memory usage after GC. It will however get in the way because the user can now also set a SoftMaxHeapSize that conflicts with the AHS soft heap size that the JVM wants to use, and then we gotta deal with that. Similarly, the CurrentMaxHeapSize adds another way for users to control (read: mess up) the JVM behaviour that we need to respect. In the end, AHS will compute this dynamically instead depending on environment circumstances. I suspect the fact that it can also be manually set in a way that conflicts with what the JVM wants to do, will end up being a pain. I'm not against the plan of building these incremental features, especially if we want them in isolation. But if it's AHS we want, then I wonder if it would be easier to go straight for what we need for AHS without the intermediate user exposed steps, because they might introduce unnecessary problems along the way. My 50c, no strong opinion though. /Erik On 9 Apr 2025, at 09:44, Man Cao wrote: Hi all, Thank you Thomas for creating the umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716. While waiting a bit on the SoftMaxHeapSize PR (https://github.com/openjdk/jdk/pull/24211) to see if others have feedback, I could start working on CurrentMaxHeapSize (https://bugs.openjdk.org/browse/JDK-8204088). I also agree that CurrentMaxHeapSize may not need a JEP due to its small size and low complexity. Should it proceed similarly to how SoftMaxHeapSize was introduced? I.e., https://bugs.openjdk.org/browse/JDK-8222145, and creating a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. Separately, for removing support for Min/MaxHeapFreeRatio for G1 (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/browse/JDK-8238686), how do we handle existing users that set these two flags? (We have very few internal users setting these two flags. But yesterday I ran into a use case that sets -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 for G1...) Best, Man From kirk at kodewerk.com Wed Apr 9 16:14:18 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Wed, 9 Apr 2025 09:14:18 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> Message-ID: <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> > On Apr 9, 2025, at 8:22 AM, Erik Osterlund wrote: > > Hi Man, > > Sorry to butt in. A high-level question about the AHS plan for G1: are we interested in the > intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is it AHS that > we are interested in? > > The reason I ask is that each incremental feature comes with some baggage due to being > a (somewhat) static and manually set limit, which the AHS solution won't need to deal with. > > For example, it's unclear how a *static* SoftMaxHeapSize should behave when the live set > is larger than the limit. While that can maybe be solved in some reasonable way, it's worth > noting that AHS won't need the solution, because there it's a dynamic limit that the GC simply > won't set lower than the memory usage after GC.
It will however get in the way because the > user can now also set a SoftMaxHeapSize that conflicts with the AHS soft heap size that > the JVM wants to use, and then we gotta deal with that. > > Similarly, the CurrentMaxHeapSize adds another way for users to control (read: mess up) > the JVM behaviour that we need to respect. In the end, AHS will compute this dynamically > instead depending on environment circumstances. I suspect the fact that it can also be > manually set in a way that conflicts with what the JVM wants to do, will end up being a pain. I would agree and to this point, I've rarely found ratios to be useful. In general, eden, survivor, and old each play a different role in object life cycle and as such each should be tuned separately from each other. Min/Max heap is the sum of the needs of the parts. Being able to meet the needs of eden, survivor and old by simply setting a max heap and relying on ratios is wishful thinking that sometimes comes true. Might I suggest that an entirely new (experimental?) adaptive size policy be introduced that makes use of current flags in a manner that is appropriate to the new policy. That policy would calculate a size of Eden to control GC frequency, a size of survivor to limit promotion of transients, and a tenured large enough to accommodate the live set as well as manage the expected number of humongous allocations. If global heap pressure won't support the ensuing max heap size, then the cost could be smaller eden implying higher GC overhead due to increased frequency. Metrics to support eden sizing would be allocation rate. The age table with premature promotion rates would be used to estimate the size of survivor. Live set size with a recent history of humongous allocations would be used for tenured. There will need to be a dampening strategy in play. My current (dumb) idea for Serial is to set an overhead threshold delta that needs to be exceeded to trigger a resize. > > I'm not against the plan of building these incremental features, especially if we want them > in isolation. But if it's AHS we want, then I wonder if it would be easier to go straight for what > we need for AHS without the intermediate user exposed steps, because they might introduce > unnecessary problems along the way. I would agree with this. And I would suggest that the way to achieve it is to introduce a new experimental ASP. > > My 50c, no strong opinion though. From kdnilsen at openjdk.org Wed Apr 9 17:05:38 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 17:05:38 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 00:27:17 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: >> >>> 76: _live_data(0), >>> 77: _critical_pins(0), >>> 78: _mixed_candidate_garbage_words(0), >> >> Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `top - TAMS` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returns marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`. This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? > > This is a good idea. Let me experiment with this. My experiment with an initial attempt at this failed with over 60 failures.
The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035784267 From ayang at openjdk.org Wed Apr 9 17:38:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Apr 2025 17:38:54 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio Message-ID: Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. Test: tier1-7 ------------- Commit messages: - pgc-min-initial-fix Changes: https://git.openjdk.org/jdk/pull/24556/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354228 Stats: 15 lines in 3 files changed: 12 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24556/head:pull/24556 PR: https://git.openjdk.org/jdk/pull/24556 From wkemper at openjdk.org Wed Apr 9 17:53:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 9 Apr 2025 17:53:31 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:02:40 GMT, Kelvin Nilsen wrote: >> This is a good idea. Let me experiment with this. > > My experiment with an initial attempt at this failed with over 60 failures. The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. > > I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. > > See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. > > Thanks. It does look simpler. Do you have an example of one of the failing asserts? One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035852703 From kdnilsen at openjdk.org Wed Apr 9 18:03:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:03:47 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:51:06 GMT, William Kemper wrote: >> My experiment with an initial attempt at this failed with over 60 failures. The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. >> >> I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. >> >> See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. >> >> Thanks. > > It does look simpler. Do you have an example of one of the failing asserts? > > One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. Examples: FullGC worker: void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { ShenandoahParallelWorkerSession worker_session(worker_id); ShenandoahHeapRegion* region = _regions.next(); ShenandoahHeap* heap = ShenandoahHeap::heap(); ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); while (region != nullptr) { if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() // used to do... ctx->clear_bitmap(region); } region = _regions.next(); } } ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { - assert(!r->has_live(), "Region %zu should have no live data", r->index()); + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035869108 From kdnilsen at openjdk.org Wed Apr 9 18:18:27 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:18:27 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:01:03 GMT, Kelvin Nilsen wrote: >> It does look simpler. Do you have an example of one of the failing asserts? >> >> One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. 
> > Examples: > FullGC worker: > void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { > ShenandoahParallelWorkerSession worker_session(worker_id); > ShenandoahHeapRegion* region = _regions.next(); > ShenandoahHeap* heap = ShenandoahHeap::heap(); > ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); > while (region != nullptr) { > if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { > // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() > // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() > // used to do... > ctx->clear_bitmap(region); > } > region = _regions.next(); > } > } > > ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { > - assert(!r->has_live(), "Region %zu should have no live data", r->index()); > + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); Not sure about performance impact, other than implementing and testing... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035888970 From kdnilsen at openjdk.org Wed Apr 9 18:24:36 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:24:36 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:15:38 GMT, Kelvin Nilsen wrote: >> Examples: >> FullGC worker: >> void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { >> ShenandoahParallelWorkerSession worker_session(worker_id); >> ShenandoahHeapRegion* region = _regions.next(); >> ShenandoahHeap* heap = ShenandoahHeap::heap(); >> ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); >> while (region != nullptr) { >> if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { >> // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() >> // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() >> // used to do... >> ctx->clear_bitmap(region); >> } >> region = _regions.next(); >> } >> } >> >> ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { >> - assert(!r->has_live(), "Region %zu should have no live data", r->index()); >> + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); > > Not sure about performance impact, other than implementing and testing... i suspect performance impact is minimal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035896982 From mdoerr at openjdk.org Wed Apr 9 22:26:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 22:26:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. 
> > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791114662 From kdnilsen at openjdk.org Wed Apr 9 22:32:46 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 22:32:46 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v3] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Experiment 2: refinements to reduce regressions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/3c1f788a..8ff388d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=01-02 Stats: 30 lines in 4 files changed: 23 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From kdnilsen at openjdk.org Thu Apr 10 04:36:38 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 04:36:38 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v4] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix garbage_before_padded_for_promote() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/8ff388d1..8e820f29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From tschatzl at openjdk.org Thu Apr 10 07:26:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:26:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v31] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... and 35 more: https://git.openjdk.org/jdk/compare/45b7c748...39aa903f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=30 Stats: 7118 lines in 110 files changed: 2586 ins; 3598 del; 934 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Apr 10 07:28:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:28:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 22:24:10 GMT, Martin Doerr wrote: > This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791807489 From shade at openjdk.org Thu Apr 10 08:36:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:36:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> On Thu, 10 Apr 2025 07:25:47 GMT, Thomas Schatzl wrote: > I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791985351 From manc at google.com Thu Apr 10 08:45:58 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 01:45:58 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: Re Thomas's comments: I think this is the best way forward. There is no need for a JEP from me > either. > Exact behavior in various situations needs to be defined in the CSR. Thanks. Should I edit https://bugs.openjdk.org/browse/JDK-8204088 in place to change it to a CSR, or do you prefer creating a separate issue? One option is to translate these options into other values impacting > heap size "similarly". E.g. 
have Min/MaxHeapFreeRatio translate to > internal pressure at the time the changes are noticed, but that is just > a potential solution that hand-waves away the effort for that. > Then start deprecating and remove; depends a little on how useful (or > how much in the way) they are for Serial and Parallel GC (other > collectors don't support them). It is unlikely that ZGC and Shenandoah > will adopt these. I feel like both approaches have additional problems: For the first approach, even with a translation mechanism, it still has the problem of GCTimeRatio and Min/MaxHeapFreeRatio overriding each other. I think the only solution is to translate Min/MaxHeapFreeRatio directly to a value for GCTimeRatio, as well as making GCTimeRatio a manageable flag. Agree that the effort to implement this approach is nontrivial. For the second approach, Min/MaxHeapFreeRatio are pretty popular flags for Parallel GC, so it could be difficult to remove them for Parallel GC. Even already in JDK-8238687 Min/MaxHeapFreeRatio happily work to counter > the CPU-based sizing, so some solution needs to be found there already. That change will already be quite disruptive in terms of impact on heap > sizing, so another option is to remove support in G1. I think removing support for Min/MaxHeapFreeRatio only for G1 is feasible, as long as we provide a replacement approach. Some high-level guidance like "if you set Min/MaxHeapFreeRatio to small values such as XX, try lowering GCTimeRatio to YY" may be acceptable. The downside is that it requires users of Min/MaxHeapFreeRatio to re-tune JVM parameters. One unresolved use case is dynamically changing Min/MaxHeapFreeRatio due to them being manageable. Perhaps we could make GCTimeRatio manageable? But Parallel GC and Shenandoah also use GCTimeRatio, so it could be difficult. Or if we reconsider the high-precedence SoftMaxHeapSize implementation (https://github.com/openjdk/jdk/pull/24211), perhaps users who dynamically set Min/MaxHeapFreeRatio could move to set SoftMaxHeapSize instead. > (We have very few internal users setting these two flags. But yesterday > > I ran into a use case that sets -XX:MinHeapFreeRatio=0 > > -XX:MaxHeapFreeRatio=0 for G1...) > What would be the use case for setting it to these values? There seems to be little upside and lots of downside for this choice, > because it likely causes a lot of GC activity since the VM will need GC > to expand the heap little by little all the time, and full gc/Remark > will immediately reset these expansion efforts. The use case is to create a process snapshot image via CRIU (checkpoint/restore), like what https://openjdk.org/projects/crac does. The application wants G1 to shrink the heap as much as possible, to reduce the size of the snapshot. It sets both flags to zero, performs several System.gc(), then sets both flags back to previous values, then creates the snapshot. -Man From tschatzl at openjdk.org Thu Apr 10 09:07:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 09:07:39 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v32] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... 
and 36 more: https://git.openjdk.org/jdk/compare/f94a4f7a...fcf96a2a ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31 Stats: 7112 lines in 110 files changed: 2592 ins; 3594 del; 926 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From ayang at openjdk.org Thu Apr 10 09:12:32 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 10 Apr 2025 09:12:32 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:32:43 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: >> >>> 81: break; >>> 82: } >>> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. >> >> Why doesn't it call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to be a separate case from `NoInteresting`.) > > "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card. > > The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a reference to the collection set, C not having any interesting reference any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). > > I could add a separate counter for these types of cards too - they can be inferred from the total number of scanned minus the others though. I see; "clean again" means the existing interesting pointer was overwritten by the mutator. I misinterpreted the comment as cards transitioned from dirty to clean. ` size_t _cards_clean_again; // Dirtied cards that were cleaned.` To prevent misunderstanding, what do you think of renaming "NoInteresting" to "NoCrossRegion" and "_cards_clean_again" to "_cards_no_cross_region", or something similar so that the 1:1 mapping is clearer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2036885633 From manc at google.com Thu Apr 10 09:30:54 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 02:30:54 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: Re Erik's comments: > Sorry to butt in. A high-level question about the AHS plan for G1: are we > interested in the > intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is > it AHS that > we are interested in? No worries, and I appreciate the comment. The high-level rationale is that the JVM should provide at least one of SoftMaxHeapSize or CurrentMaxHeapSize as a high-precedence, manageable flag, so that the JVM could take a customized input signal for heap sizing decisions. Even with a fully-developed AHS algorithm, it cannot satisfy all deployment environments. E.g. a custom container system or custom OS, in which the JVM cannot detect system memory pressure via standard approaches. So these flags are not necessarily intermediate solutions, and they could allow more deployment environments to use AHS.
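(For a concrete sense of what "manageable" buys here: such a flag can be updated in a running JVM from the outside, for example with jcmd <pid> VM.set_flag SoftMaxHeapSize 2147483648. The flag name and value are only an example; a CurrentMaxHeapSize flag would first have to exist and be marked manageable before it could be driven this way.)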
For SoftMaxHeapSize for G1, based on discussion in https://github.com/openjdk/jdk/pull/24211, it will likely become just a hint to trigger concurrent marks, which will be unlikely to interfere with other parts of G1 AHS. For my original proposal of high-precedence SoftMaxHeapSize (as currently implemented in the PR), the guidance for users is that they should either provide a mechanism to adjust SoftMaxHeapSize dynamically to prevent GC thrashing, or only set it temporarily and accept the risk of GC thrashing. It is not intended as a static value that the user "sets and forgets". For CurrentMaxHeapSize, it has similar issues to high-precedence SoftMaxHeapSize, in that it is not "sets and forgets". However, I can see that clearly-specified OutOfMemoryError behavior from CurrentMaxHeapSize could be more favorable than the hard-to-define potential GC thrashing condition that a high-precedence SoftMaxHeapSize could cause. Re Kirk's comments: > Might I suggest that an entirely new (experimental?) adaptive size policy > be introduced that makes use of current flags in a manner that is > appropriate to the new policy. That policy would calculate a size of Eden > to control GC frequency, a size of survivor to limit promotion of > transients, and a tenured large enough to accommodate the live set as well > as manage the expected number of humongous allocations. If global heap > pressure won't support the ensuing max heap size, then the cost could be > smaller eden implying higher GC overhead due to increased frequency. > Metrics to support eden sizing would be allocation rate. The age table > with premature promotion rates would be used to estimate the size of > survivor. Live set size with a recent history of humongous allocations > would be used for tenured. > There will need to be a dampening strategy in play. My current (dumb) idea > for Serial is to set an overhead threshold delta that needs to be exceeded > to trigger a resize. I don't quite understand how this adaptive size policy (ASP) solves the problems AHS tries to solve. AHS tries to solve the problem of reaching an appropriate target *total* heap size, based on multiple inputs (JVM flags, environment circumstances). Once a total heap size is determined, G1 uses existing algorithms to determine young-gen and old-gen sizes. However, the ASP seems to focus on determining young-gen and old-gen sizes using a new algorithm. -Man From tschatzl at openjdk.org Thu Apr 10 10:02:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: References: Message-ID: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/fcf96a2a..068d2a37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31-32 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Apr 10 10:02:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:41 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> References: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> Message-ID: On Thu, 10 Apr 2025 08:34:00 GMT, Aleksey Shipilev wrote: > > I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. > > I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) Thanks. 
:) @TheRealMDoerr: should be fixed now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2792213039 From tschatzl at openjdk.org Thu Apr 10 11:01:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 11:01:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 12:48:10 GMT, Thomas Schatzl wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: >> >>> 99: } >>> 100: >>> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, >> >> Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? > > I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. > I will try to redo numbers. >From our microbenchmarks (higher numbers are better): Current code: Benchmark (size) Mode Cnt Score Error Units ArrayCopyObject.conjoint_micro 31 thrpt 15 166136.959 ? 5517.157 ops/ms ArrayCopyObject.conjoint_micro 63 thrpt 15 108880.108 ? 4331.112 ops/ms ArrayCopyObject.conjoint_micro 127 thrpt 15 93159.977 ? 5025.458 ops/ms ArrayCopyObject.conjoint_micro 2047 thrpt 15 17234.842 ? 831.344 ops/ms ArrayCopyObject.conjoint_micro 4095 thrpt 15 9202.216 ? 292.612 ops/ms ArrayCopyObject.conjoint_micro 8191 thrpt 15 3565.705 ? 121.116 ops/ms ArrayCopyObject.disjoint_micro 31 thrpt 15 159106.245 ? 5965.576 ops/ms ArrayCopyObject.disjoint_micro 63 thrpt 15 95475.658 ? 5415.267 ops/ms ArrayCopyObject.disjoint_micro 127 thrpt 15 84249.979 ? 6313.007 ops/ms ArrayCopyObject.disjoint_micro 2047 thrpt 15 10682.650 ? 381.832 ops/ms ArrayCopyObject.disjoint_micro 4095 thrpt 15 4471.940 ? 216.439 ops/ms ArrayCopyObject.disjoint_micro 8191 thrpt 15 1378.296 ? 33.421 ops/ms ArrayCopy.arrayCopyObject N/A avgt 15 13.880 ? 0.517 ns/op ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 14.844 ? 0.751 ns/op ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 11.080 ? 0.703 ns/op ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 11.003 ? 0.135 ns/op Runtime call: Benchmark (size) Mode Cnt Score Error Units ArrayCopyObject.conjoint_micro 31 thrpt 15 73100.230 ? 11079.381 ops/ms ArrayCopyObject.conjoint_micro 63 thrpt 15 65039.431 ? 1996.832 ops/ms ArrayCopyObject.conjoint_micro 127 thrpt 15 58336.711 ? 2260.660 ops/ms ArrayCopyObject.conjoint_micro 2047 thrpt 15 17035.419 ? 524.445 ops/ms ArrayCopyObject.conjoint_micro 4095 thrpt 15 9207.661 ? 286.526 ops/ms ArrayCopyObject.conjoint_micro 8191 thrpt 15 3264.491 ? 73.848 ops/ms ArrayCopyObject.disjoint_micro 31 thrpt 15 84587.219 ? 3007.310 ops/ms ArrayCopyObject.disjoint_micro 63 thrpt 15 62815.254 ? 1214.310 ops/ms ArrayCopyObject.disjoint_micro 127 thrpt 15 58423.470 ? 285.670 ops/ms ArrayCopyObject.disjoint_micro 2047 thrpt 15 10720.462 ? 617.173 ops/ms ArrayCopyObject.disjoint_micro 4095 thrpt 15 4178.195 ? 178.942 ops/ms ArrayCopyObject.disjoint_micro 8191 thrpt 15 1374.268 ? 44.290 ops/ms ArrayCopy.arrayCopyObject N/A avgt 15 19.667 ? 0.740 ns/op ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 21.243 ? 
1.891 ns/op ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 16.645 ? 0.504 ns/op ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 17.409 ? 0.705 ns/op Obviously with larger arrays, the impact diminishes, but it's always there. I think the inlined code is worth the effort in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037086410 From rcastanedalo at openjdk.org Thu Apr 10 11:22:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 10 Apr 2025 11:22:36 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Thu, 10 Apr 2025 10:58:24 GMT, Thomas Schatzl wrote: >> I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. >> I will try to redo numbers. > > From our microbenchmarks (higher numbers are better): > > Current code: > > Benchmark (size) Mode Cnt Score Error Units > ArrayCopyObject.conjoint_micro 31 thrpt 15 166136.959 ? 5517.157 ops/ms > ArrayCopyObject.conjoint_micro 63 thrpt 15 108880.108 ? 4331.112 ops/ms > ArrayCopyObject.conjoint_micro 127 thrpt 15 93159.977 ? 5025.458 ops/ms > ArrayCopyObject.conjoint_micro 2047 thrpt 15 17234.842 ? 831.344 ops/ms > ArrayCopyObject.conjoint_micro 4095 thrpt 15 9202.216 ? 292.612 ops/ms > ArrayCopyObject.conjoint_micro 8191 thrpt 15 3565.705 ? 121.116 ops/ms > ArrayCopyObject.disjoint_micro 31 thrpt 15 159106.245 ? 5965.576 ops/ms > ArrayCopyObject.disjoint_micro 63 thrpt 15 95475.658 ? 5415.267 ops/ms > ArrayCopyObject.disjoint_micro 127 thrpt 15 84249.979 ? 6313.007 ops/ms > ArrayCopyObject.disjoint_micro 2047 thrpt 15 10682.650 ? 381.832 ops/ms > ArrayCopyObject.disjoint_micro 4095 thrpt 15 4471.940 ? 216.439 ops/ms > ArrayCopyObject.disjoint_micro 8191 thrpt 15 1378.296 ? 33.421 ops/ms > ArrayCopy.arrayCopyObject N/A avgt 15 13.880 ? 0.517 ns/op > ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 14.844 ? 0.751 ns/op > ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 11.080 ? 0.703 ns/op > ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 11.003 ? 0.135 ns/op > > Runtime call: > > Benchmark (size) Mode Cnt Score Error Units > ArrayCopyObject.conjoint_micro 31 thrpt 15 73100.230 ? 11079.381 ops/ms > ArrayCopyObject.conjoint_micro 63 thrpt 15 65039.431 ? 1996.832 ops/ms > ArrayCopyObject.conjoint_micro 127 thrpt 15 58336.711 ? 2260.660 ops/ms > ArrayCopyObject.conjoint_micro 2047 thrpt 15 17035.419 ? 524.445 ops/ms > ArrayCopyObject.conjoint_micro 4095 thrpt 15 9207.661 ? 286.526 ops/ms > ArrayCopyObject.conjoint_micro 8191 thrpt 15 3264.491 ? 73.848 ops/ms > ArrayCopyObject.disjoint_micro 31 thrpt 15 84587.219 ? 3007.310 ops/ms > ArrayCopyObject.disjoint_micro ... Fair enough, thanks for the measurements! 
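For anyone who wants to poke at this locally, a minimal JMH micro in the spirit of the ArrayCopyObject numbers above could look roughly like the sketch below. This is not the benchmark from the jdk tree; the class name, sizes and setup are assumptions. Comparing a build that inlines the array post-barrier against one that uses the runtime call should reproduce the small-array gap discussed above.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
public class ArrayCopyObjectSketch {
    @Param({"31", "127", "2047"})
    int size;

    Object[] src;
    Object[] dst;

    @Setup
    public void setup() {
        src = new Object[size];
        dst = new Object[size];
        for (int i = 0; i < size; i++) {
            src[i] = new Object();
        }
    }

    @Benchmark
    public void disjoint() {
        // Every copied element goes through the GC's array post-write barrier,
        // so for small sizes the barrier implementation dominates the copy cost.
        System.arraycopy(src, 0, dst, 0, size);
    }
}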
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037121277 From tschatzl at openjdk.org Thu Apr 10 11:41:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 11:41:33 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio In-Reply-To: References: Message-ID: <89h5aK0Oop82whqONpjyoqsYaLnShKKDmPSpxhMpVJQ=.b29ac864-000f-4987-bf6c-27c9299c7730@github.com> On Wed, 9 Apr 2025 17:33:07 GMT, Albert Mingkun Yang wrote: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Initial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/gc_globals.hpp line 415: > 413: product(uintx, InitialSurvivorRatio, 8, \ > 414: "Initial ratio of young generation/survivor space size") \ > 415: range(0, max_uintx) \ There is code somewhere which sets InitialSurvivorRatio to 3 if it is smaller than that. It should be removed. It is somewhere around `parallelArguments.cpp:108`. There is similar code next to it for `MinSurvivorRatio` which is dead code too (`MinSurvivorRatio` is already bounded by 3 at minimum). Also, previously this value has been overridden silently; bailing out is a behavioral change that requires a CSR. ------------- PR Review: https://git.openjdk.org/jdk/pull/24556#pullrequestreview-2756365732 PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2037149128 From ayang at openjdk.org Thu Apr 10 11:59:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 10 Apr 2025 11:59:52 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: Message-ID: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Initial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24556/files - new: https://git.openjdk.org/jdk/pull/24556/files/6dfd92bf..1cd03d17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24556/head:pull/24556 PR: https://git.openjdk.org/jdk/pull/24556 From kdnilsen at openjdk.org Thu Apr 10 16:28:21 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 16:28:21 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v5] In-Reply-To: References: Message-ID: <5jhoXMiuinw50NFwWr_kQdOudqZTx-3rfX8-4eCr4OY=.565602e3-8dc6-47eb-aa36-ddc5b9f27a08@github.com> > The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint.
> > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Refactor for better abstraction - Fix set_live() after full gc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/8e820f29..eb2679aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=03-04 Stats: 13 lines in 3 files changed: 3 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From erik.osterlund at oracle.com Thu Apr 10 17:30:09 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 10 Apr 2025 17:30:09 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> > On 10 Apr 2025, at 11:31, Man Cao wrote: > > Even with fully-developed AHS algorithm, it cannot satisfy all deployment environments. E.g. custom container system or custom OS, in which the JVM cannot detect system memory pressure via standard approaches. So these flags are not necessarily intermediate solutions, and they could allow more deployment environments to use AHS. Could you elaborate the concrete scenario you have in mind? What use case do you have in mind where AHS is not enough, while external heap control is? /Erik From manc at google.com Thu Apr 10 20:18:07 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 13:18:07 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Re Erik: > Could you elaborate the concrete scenario you have in mind? What use case do you have in mind where AHS is not enough, while external heap control is? One example is a customized container environment that requires non-standard approaches to read container memory usage and container memory limit, i.e., the application cannot use standard cgroup's memory.memsw.usage_in_bytes, memory.memsw.max_usage_in_bytes control files. Instead, the customized container could provide its own library for the application to get container usage and limit. Without CurrentMaxHeapSize or a high-precedence SoftMaxHeapSize, the JVM has no way to use the container-provided library to get signals for memory pressure. With such JVM flags, the application could use the container-provided library to calculate a value for those JVM flags based on memory pressure, and pass that information to the JVM. 
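To make that concrete, a minimal application-side sketch follows. The container library calls are hypothetical stand-ins for whatever the custom environment provides, CurrentMaxHeapSize and a high-precedence SoftMaxHeapSize are the proposed flags, and how the computed value reaches the JVM (for example through a manageable-flag update) is left out of the sketch:

    #include <cstddef>

    // Hypothetical stand-ins for the custom container's own library.
    namespace my_container {
      size_t memory_limit_bytes();
      size_t memory_usage_bytes();
    }

    // Pick the next heap ceiling from current container pressure.
    size_t next_heap_ceiling(size_t current_heap_bytes, size_t headroom_bytes) {
      const size_t limit = my_container::memory_limit_bytes();
      const size_t usage = my_container::memory_usage_bytes();
      // Keep what the heap already occupies plus whatever the container has left,
      // minus headroom for non-heap growth (code cache, thread stacks, metaspace).
      return limit - usage + current_heap_bytes - headroom_bytes;
    }

The application would re-run this whenever container usage changes, raising the value again once usage drops.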
-Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.osterlund at oracle.com Thu Apr 10 21:02:08 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 10 Apr 2025 21:02:08 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: On 10 Apr 2025, at 22:18, Man Cao wrote: One example is a customized container environment that requires non-standard approaches to read container memory usage and container memory limit, i.e., the application cannot use standard cgroup's memory.memsw.usage_in_bytes, memory.memsw.max_usage_in_bytes control files. Instead, the customized container could provide its own library for the application to get container usage and limit. If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? /Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdnilsen at openjdk.org Thu Apr 10 21:55:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 21:55:45 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
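As a rough illustration of what a more accurate accounting amounts to (the helper below is hypothetical; this is not the patch itself): a mixed-evacuation candidate's live data is what the old mark found plus whatever has been allocated in the region since that mark completed.

    // Hypothetical sketch only, written against in-tree types for flavor.
    size_t mixed_evac_live_data_words(const ShenandoahHeapRegion* r) {
      const size_t live_at_old_mark_end = r->get_live_data_words();            // existing accessor
      const size_t allocated_since_mark = r->words_allocated_since_old_mark(); // hypothetical helper
      return live_at_old_mark_end + allocated_since_mark;
    }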
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Remove deprecation conditional compiles - Adjust candidate live memory for each mixed evac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/eb2679aa..ef783d48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=04-05 Stats: 85 lines in 6 files changed: 24 ins; 61 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From manc at google.com Thu Apr 10 22:15:03 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 15:15:03 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? I suppose the more accurate way to put it is "if an app inside the custom container environment allocates 300 GB native memory ...". The custom container environment itself is not a Java app. If the container memory limit is 310GiB, container usage is 305GiB, and the app's current Java heap size is 3GiB, and Xmx is 20GiB, then the app could set CurrentMaxHeapSize=8G (310 - 305 + 3), or CurrentMaxHeapSize=7G (to give 1GiB head room for growth from other non-heap memory: code cache, thread stack, metaspace, etc.), to prevent running out of container memory limit. Note that the app should actively monitor container usage to adjust CurrentMaxHeapSize, e.g. increasing CurrentMaxHeapSize when container usage drops. If the app keeps allocating more native memory, CurrentMaxHeapSize will further drop, and it will eventually die with Java OutOfMemoryError. In the above case, the JVM is unaware of the 310G container limit or the 305G container usage. -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From ysr at openjdk.org Thu Apr 10 22:36:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 10 Apr 2025 22:36:25 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 21:55:45 GMT, Kelvin Nilsen wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
> > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Remove deprecation conditional compiles > - Adjust candidate live memory for each mixed evac Haven't started looking at these changes, but I do wonder if it might be worthwhile to also consider (and implement under a tunable flag) the alternative policy of never adding to the collection set any regions that are still "active" at the point when the collection set for a marking cycle is first assembled at the end of the final marking. That way we don't have to do any re-computing, and the criterion for evacuation is garbage-first (or liveness-least) both of which remain invariant (and complements of each other) throughout the duration of evacuation and obviating entirely the need for recomputing the goodness/choice metric afresh. The downside is that we may leave some garbage on the table in the active regions, but this is probably a minor price for most workloads and heap configurations, and doesn't unnecessarily complicate or overengineer the solution. One question to consider is how G1 does this. May be regions placed in the collection set are retired (i.e. made inactive?) -- I prefer not to forcibly retire active regions as this wastes space that may have been usable. Thoughts? (Can add this comment and discuss on the ticket if that is logistically preferable.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-2795315167 From mbeckwit at openjdk.org Fri Apr 11 02:19:33 2025 From: mbeckwit at openjdk.org (Monica Beckwith) Date: Fri, 11 Apr 2025 02:19:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: <3x_0x1y1pPb4CI4eSx1FUDNoqPCbWhv-Se1FwbC5mlE=.a0ccd4e9-8ea1-4540-8e55-4b992c58b8b1@github.com> On Fri, 4 Apr 2025 09:01:30 GMT, Thomas Schatzl wrote: > Meanwhile, @mo-beck do you guys have preference on how SoftMaxHeapSize should work? Thanks for the thoughtful work here ? this PR is a solid step toward strengthening G1?s memory footprint management, and I support it. This patch adds support for `SoftMaxHeapSize` in both expansion and shrinkage paths, as well as IHOP calculation, ensuring it's part of the regular heap policy logic. As I outlined in my [original note](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050191.html) and follow-up on [AHS integration](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html), my intent has been to use `SoftMaxHeapSize` as a guiding input ? a soft signal ? within a broader dynamic heap sizing controller that considers GC overhead, mutator behavior, and memory availability. This patch lays the groundwork for that direction. The behavior when the live set exceeds the soft target has come up in the discussion. My view remains that the heap should be influenced by the value, not strictly bound to it. That?s the balance I?ve been aiming for in describing how it integrates into the control loop ? SoftMax helps inform decisions, but doesn?t unconditionally restrict them. I agree that we?ll want to follow up with logic that can respond to GC pressure and workload needs, to avoid any unintended performance issues. I?ll update [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716) to reflect this, and I?ll continue the thread on the mailing list to coordinate the next phase. 
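One way to make "influenced by the value, not strictly bound to it" concrete (an illustration only, not the policy implemented in this PR): let the soft target guide sizing, but let the size needed to meet the GC overhead goal override it, clamped to the hard bounds.

    #include <algorithm>
    #include <cstddef>

    // Illustrative controller step, not G1 code.
    size_t choose_heap_target(size_t soft_max, size_t needed_for_overhead_goal,
                              size_t min_heap, size_t max_heap) {
      const size_t guided = std::max(soft_max, needed_for_overhead_goal);
      return std::clamp(guided, min_heap, max_heap);
    }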
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2795676870 From erik.osterlund at oracle.com Fri Apr 11 05:52:27 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 11 Apr 2025 05:52:27 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Okay it seems to me that the use case you are describing is wanting a container with an enforced memory limit. It should quack like a cgroup and walk like a cgroup but must not actually use cgroups for some reason. Cgroups were seemingly built for this use case and has a complete view of the memory usage in the container due to being an OS feature. Conversely, if the custom ad-hoc container environment does not have OS support for the memory limit, then the app can temporarily exceed the memory limit, and hence won?t be as effective of a limit. But if you want to actually enforce a memory limit such that the app dies if it exceeds the limit I can?t help but wonder? why not use a cgroup to declare that limit though? Regardless, I wonder if what you actually want for your use case is a way to tell AHS what the max memory of the entire JVM should be, similar to the -XX:RssLimit Thomas Stuefe proposed: https://bugs.openjdk.org/browse/JDK-8321266 In other words, letting the JVM know that it has a bound on memory, and have AHS know about and try to adapt the heap such that the JVM memory usage is below the limit when native memory goes up and down. In other words, let the heap heuristics live in the JVM. Perhaps then the limit would also be static, or do the containers themselves actually grow and shrink at runtime, or was the dynamic nature of CurrentMaxHeapSize mostly an artifact of out sourcing the heap heuristics of an otherwise static custom container limit? /Erik On 11 Apr 2025, at 00:15, Man Cao wrote: ? > If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? I suppose the more accurate way to put it is "if an app inside the custom container environment allocates 300 GB native memory ...". The custom container environment itself is not a Java app. If the container memory limit is 310GiB, container usage is 305GiB, and the app's current Java heap size is 3GiB, and Xmx is 20GiB, then the app could set CurrentMaxHeapSize=8G (310 - 305 + 3), or CurrentMaxHeapSize=7G (to give 1GiB head room for growth from other non-heap memory: code cache, thread stack, metaspace, etc.), to prevent running out of container memory limit. Note that the app should actively monitor container usage to adjust CurrentMaxHeapSize, e.g. increasing CurrentMaxHeapSize when container usage drops. If the app keeps allocating more native memory, CurrentMaxHeapSize will further drop, and it will eventually die with Java OutOfMemoryError. In the above case, the JVM is unaware of the 310G container limit or the 305G container usage. -Man -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aboldtch at openjdk.org Fri Apr 11 06:20:11 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 11 Apr 2025 06:20:11 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly Message-ID: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` Currently running this through testing. ------------- Commit messages: - 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly Changes: https://git.openjdk.org/jdk/pull/24589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354358 Stats: 31 lines in 2 files changed: 7 ins; 2 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From stefank at openjdk.org Fri Apr 11 07:02:39 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Apr 2025 07:02:39 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. Looks good. As a follow-up, we might want to move the pre-touching so that we don't start and stop threads multiple times. ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759342921 From jsikstro at openjdk.org Fri Apr 11 07:05:40 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 07:05:40 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. src/hotspot/share/gc/z/zPageAllocator.cpp line 1011: > 1009: const size_t claimed_size = claim_virtual(size, &vmems); > 1010: > 1011: // Each partition must have at least size total vmems available when priming. Maybe something like "The partition must have size available in virtual memory when priming"? I'm reading this as the number of vmems, not the size of them combined. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24589#discussion_r2038925034 From aboldtch at openjdk.org Fri Apr 11 07:45:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 11 Apr 2025 07:45:03 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update Comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24589/files - new: https://git.openjdk.org/jdk/pull/24589/files/0abce51a..70b0e923 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From jsikstro at openjdk.org Fri Apr 11 07:47:31 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 07:47:31 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: <-P89Vbi7uncmcA5LSlyADETTuDB5EJWG3NaarpyAouk=.7364df7e-5e0d-484a-b53e-44614f2eabe6@github.com> On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Looks good. As you say, this is nicely implemented with features from the Mapped Cache. ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759442646 From stefank at openjdk.org Fri Apr 11 07:52:25 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Apr 2025 07:52:25 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759456953 From eosterlund at openjdk.org Fri Apr 11 10:37:41 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 11 Apr 2025 10:37:41 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759908182 From jsikstro at openjdk.org Fri Apr 11 11:38:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 11:38:08 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing Message-ID: Hello, > This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. 
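To illustrate the two rules in code (a sketch only, not taken from the patch; the class and members are made up, and the indentation argument to StreamAutoIndentor is the optional parameter proposed further down):

    // Sketch of a print_on that follows rules 1 and 2.
    void MyGeneration::print_on(outputStream* st) const {
      // Rule 1: print at whatever indentation the caller set up, no prepended spaces.
      st->print_cr("generation total %zuK, used %zuK", capacity() / K, used() / K);

      // Rule 2: enforce our own indentation for everything printed "below" us.
      StreamAutoIndentor indentor(st, 1);  // proposed optional indentation argument
      _space->print_on(st);                // the callee neither knows nor cares about our level
    }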
The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth making, considering that memory for buffers of strings usually outweigh this extra memory cost. Additionally, when factoring in the improved code understandability and maintainability, I feel like it's a change worth making. Some new changes in the way the printing looks are: * Epsilon has received indentation in its print_on, which was not there before, in an effort to look similar to other GCs and also improve readability. * Shenandoah has also received indentation to behave similar to other GCs. * "the space" in Serial's output was indented by two spaces, now it's one. * With the removal of print_on from print_on_error, I've also removed Epsilon's barrier set printing, making it's print_on_error empty. Before this, Serial printed two spaces between the sections in the hs_err file. Code re-structure: * PSOldGen::print_on had an inlined version of virtual_space()->print_space_boundaries_on(st), which is now called instead. * PSYoungGen::print_on had its name inlined. Now, name() is called instead, which is how PSOldGen::print_on does it. * I've added a common print_space_boundaries_on for the virtual space used in Serial's DefNewGeneration and TenuredGeneration, like how Parallel does it. * I've opted to use fill_to() in Metaspace printing so that it works well with ZGC printing. This does not really affect other GCs since only ZGC aligns with the same column as Metaspace. Testing: * GHA, Oracle's tier 1-3 * Manual inspection of printed content * Exit printing `-Xlog:gc+heap+exit=info` * Periodic printing `-Xlog:gc+heap=debug` * jcmd `jcmd GC.heap_info` * jcmd `jcmd VM.info` * hs_err file, both "Heap:" and "Heap before/after invocations=" printing, `-XX:ErrorHandlerTest=14` ------------- Commit messages: - 8354362: Use automatic indentation in CollectedHeap printing Changes: https://git.openjdk.org/jdk/pull/24593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354362 Stats: 239 lines in 26 files changed: 88 ins; 88 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Fri Apr 11 11:38:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 11:38:08 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. 
It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Ping @tstuefe regarding changes for `StreamAutoIndentor`. Would be nice to get your opinion since you are the author of it and its uses :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2796653117 From rcastanedalo at openjdk.org Fri Apr 11 13:01:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 13:01:49 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade Thank you for addressing my comments, Thomas! The new x64 version of `G1BarrierSetAssembler::gen_write_ref_array_post_barrier` looks correct, but I think it could be significantly simplified, here is my suggestion which is more similar to the aarch64 version: https://github.com/robcasloz/jdk/commit/fbedc0ae1ec5fcfa95b00ad354986885c7a56ce0 (note: did not test it thoroughly). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2796850628 From rcastanedalo at openjdk.org Fri Apr 11 13:10:33 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 13:10:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade > G1 sets UseCondCardMark to true by default. 
The conditional card mark corresponds to the third filter in the write barrier now, and since I decided to keep all filters for this change, it makes sense to directly use this mechanism. Do you have performance results for `-UseCondCardMark` vs. `+UseCondCardMark`? The benefit of `+UseCondCardMark` is not obvious from looking at the generated barrier code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2796872496 From rcastanedalo at openjdk.org Fri Apr 11 14:30:32 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 14:30:32 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade The compiler-related parts of this change (including x64 and aarch64 changes) look good! These are the files I reviewed: - `src/hotspot/share/gc/g1/g1BarrierSet*` - `src/hotspot/share/gc/g1/{c1,c2}` - `src/hotspot/cpu/{x86,aarch64}` - `test/hotspot/jtreg/compiler` - `test/hotspot/jtreg/testlibrary_tests` ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2760546283 From wkemper at openjdk.org Fri Apr 11 20:46:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 20:46:01 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times Message-ID: Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. ------------- Commit messages: - Enforce limits on control thread's minimum and maximum sleep times Changes: https://git.openjdk.org/jdk/pull/24602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354452 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24602/head:pull/24602 PR: https://git.openjdk.org/jdk/pull/24602 From ysr at openjdk.org Fri Apr 11 20:59:30 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 20:59:30 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:41:00 GMT, William Kemper wrote: > Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. > > This assertion failure has been observed in Genshen's regulator thread: > > #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 > #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 > #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 > > But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. Left a comment for consideration but changes look fine if this changes doesn't interfere with potential tuning space etc. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: > 1: /* Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24602#pullrequestreview-2761556102 PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040287010 From wkemper at openjdk.org Fri Apr 11 21:06:30 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:06:30 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> On Fri, 11 Apr 2025 20:55:31 GMT, Y. Srinivas Ramakrishna wrote: >> Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. >> >> This assertion failure has been observed in Genshen's regulator thread: >> >> #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 >> #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 >> #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 >> >> But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: > >> 1: /* > > Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040294574 From ysr at openjdk.org Fri Apr 11 21:12:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 21:12:25 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:03:59 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: >> >>> 1: /* >> >> Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. > > 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. Hmm, curious, I see this: // Convenience wrapper around naked_short_sleep to allow for longer sleep // times. Only for use by non-JavaThreads. 
void os::naked_sleep(jlong millis) { assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); const jlong limit = 999; while (millis > limit) { naked_short_sleep(limit); millis -= limit; } naked_short_sleep(millis); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040297668 From ysr at openjdk.org Fri Apr 11 21:12:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 21:12:25 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:08:04 GMT, Y. Srinivas Ramakrishna wrote: >> 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. > > Hmm, curious, I see this: > > // Convenience wrapper around naked_short_sleep to allow for longer sleep > // times. Only for use by non-JavaThreads. > void os::naked_sleep(jlong millis) { > assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); > const jlong limit = 999; > while (millis > limit) { > naked_short_sleep(limit); > millis -= limit; > } > naked_short_sleep(millis); > } Still if ppl aren't gonna need longer than 1 sec, and longer is a bad idea, then limiting it is a good idea. Reviewed. ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040298962 From wkemper at openjdk.org Fri Apr 11 21:12:26 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:12:26 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:09:36 GMT, Y. Srinivas Ramakrishna wrote: >> Hmm, curious, I see this: >> >> // Convenience wrapper around naked_short_sleep to allow for longer sleep >> // times. Only for use by non-JavaThreads. >> void os::naked_sleep(jlong millis) { >> assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); >> const jlong limit = 999; >> while (millis > limit) { >> naked_short_sleep(limit); >> millis -= limit; >> } >> naked_short_sleep(millis); >> } > > Still if ppl aren't gonna need longer than 1 sec, and longer is a bad idea, then limiting it is a good idea. > Reviewed. ? Aye - we _could_ use that, but I don't think we _should_. Having the heuristics sleep longer than this between evaluations wouldn't do anyone any good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040299379 From kdnilsen at openjdk.org Fri Apr 11 21:28:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 11 Apr 2025 21:28:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v7] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. 
This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix uninitialized variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/ef783d48..e6e44b67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From wkemper at openjdk.org Fri Apr 11 21:28:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:28:31 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:56:26 GMT, Y. Srinivas Ramakrishna wrote: >> Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. >> >> This assertion failure has been observed in Genshen's regulator thread: >> >> #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 >> #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 >> #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 >> >> But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. > > Left a comment for consideration but changes look fine if this changes doesn't interfere with potential tuning space etc. Appreciate the careful review @ysramakrishna ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24602#issuecomment-2798035149 From wkemper at openjdk.org Fri Apr 11 21:28:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:28:32 GMT Subject: Integrated: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:41:00 GMT, William Kemper wrote: > Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. 
> > This assertion failure has been observed in Genshen's regulator thread: > > #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 > #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 > #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 > > But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. This pull request has now been integrated. Changeset: e8bcedb0 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/e8bcedb09b0e5eeb77bf1dc3a87bb61d7a5e8404 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/24602 From kdnilsen at openjdk.org Fri Apr 11 21:30:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 11 Apr 2025 21:30:28 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:33:28 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove deprecation conditional compiles >> - Adjust candidate live memory for each mixed evac > > Haven't started looking at these changes, but I do wonder if it might be worthwhile to also consider (and implement under a tunable flag) the alternative policy of never adding to the collection set any regions that are still "active" at the point when the collection set for a marking cycle is first assembled at the end of the final marking. That way we don't have to do any re-computing, and the criterion for evacuation is garbage-first (or liveness-least) both of which remain invariant (and complements of each other) throughout the duration of evacuation and obviating entirely the need for recomputing the goodness/choice metric afresh. > > The downside is that we may leave some garbage on the table in the active regions, but this is probably a minor price for most workloads and heap configurations, and doesn't unnecessarily complicate or overengineer the solution. > > One question to consider is how G1 does this. May be regions placed in the collection set are retired (i.e. made inactive?) -- I prefer not to forcibly retire active regions as this wastes space that may have been usable. > > Thoughts? (Can add this comment and discuss on the ticket if that is logistically preferable.) @ysramakrishna : Interesting idea. Definitely worthy of an experiment. On the upside, this can make GC more "efficient" by procrastinating until the GC effort maximizes the returns of allocatable memory. On the downside, this can allow garbage to hide out for arbitrarily long times in regions that are not "fully used". I'd be in favor of proposing these experiments and possible feature enhancements in the context of a separate JBS ticket. 
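A rough sketch of the alternative selection policy discussed just above, purely for illustration; the type and member names are made up and are not Shenandoah's actual region API. The point is that regions still accepting allocations at the end of final mark are never considered, so the liveness recorded for every selected candidate cannot go stale:

```cpp
#include <cstddef>

// Hypothetical stand-in for an old-generation region; not real GenShen code.
struct OldRegion {
  bool   active;         // still accepting allocations after final mark
  size_t garbage_bytes;  // garbage identified by the most recent old mark
};

// Select mixed-evacuation candidates only from regions that are fully retired,
// deferring any garbage in still-active regions to a later cycle.
static size_t select_candidates(OldRegion* regions, size_t count, OldRegion** out) {
  size_t selected = 0;
  for (size_t i = 0; i < count; i++) {
    OldRegion* r = &regions[i];
    if (r->active) {
      continue;  // its live data could still grow; skip it for now
    }
    if (r->garbage_bytes > 0) {
      out[selected++] = r;  // garbage-first ordering of candidates happens elsewhere
    }
  }
  return selected;
}
```

The trade-off both reviewers note falls straight out of the `continue`: garbage sitting in a region that never retires is never reclaimed by a mixed evacuation.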
------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-2798040688 From manc at google.com Sat Apr 12 08:07:27 2025 From: manc at google.com (Man Cao) Date: Sat, 12 Apr 2025 01:07:27 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > Okay it seems to me that the use case you are describing is wanting a container with an enforced memory limit. It should quack like a cgroup and walk like a cgroup but must not actually use cgroups for some reason. > Cgroups were seemingly built for this use case and has a complete view of the memory usage in the container due to being an OS feature. > Conversely, if the custom ad-hoc container environment does not have OS support for the memory limit, then the app can temporarily exceed the memory limit, and hence won?t be as effective of a limit. > But if you want to actually enforce a memory limit such that the app dies if it exceeds the limit I can?t help but wonder? why not use a cgroup to declare that limit though? The custom container has additional features that cgroup does not have. Enforcing memory limit is only a basic feature. Other features of the custom container are largely irrelevant to the AHS discussion (and I'm not sure if I could publicly share those features). In fact, the custom container is more of an extension or wrapper on top of cgroup. It is quite likely we have internal patches to the OS kernel to support the custom container. > Regardless, I wonder if what you actually want for your use case is a way to tell AHS what the max memory of the entire JVM should be, similar to the -XX:RssLimit Thomas Stuefe proposed: https://bugs.openjdk.org/browse/JDK-8321266 > In other words, letting the JVM know that it has a bound on memory, and have AHS know about and try to adapt the heap such that the JVM memory usage is below the limit when native memory goes up and down. In other words, let the heap heuristics live in the JVM. Perhaps then the limit would also be static, or do the containers themselves actually grow and shrink at runtime, or was the dynamic nature of CurrentMaxHeapSize mostly an artifact of out sourcing the heap heuristics of an otherwise static custom container limit? The custom container's memory limit could dynamically change at runtime, thus -XX:RssLimit or -XX:CurrentMaxHeapSize must be a manageable flag. In fact, cgroup also supports changing memory limit dynamically: https://unix.stackexchange.com/questions/555080/using-cgroup-to-limit-program-memory-as-its-running . Having a manageable -XX:RssLimit, and making the JVM adjust heap size according to RssLimit, could in theory replace CurrentMaxHeapSize. However, I could think of the following issues with the RssLimit approach: 1. Description of https://bugs.openjdk.org/browse/JDK-8321266 indicates RssLimit is intended for debugging and regression testing, to abort the JVM when it uses more Rss than expected. It does not involve resizing the heap to survive the RssLimit. Adding heap resizing seems a significant change to the original intended use. 2. Calculating an appropriate heap size based on RssLimit seems challenging. Typically only part of the heap memory mapping contributes to Rss. 
The JVM probably has to continuously monitor the total Rss, as well as Rss from heap memory mappings, then apply a heuristic to compute a target heap size.

3. Applications still need a mechanism to dynamically adjust values for RssLimit, just as for CurrentMaxHeapSize. Providing a value for RssLimit is not really easier than for CurrentMaxHeapSize, e.g. when a Java process and several non-Java processes run inside the same container (this is the common case in our deployment).

It seems that RssLimit is not necessarily easier to use than CurrentMaxHeapSize, but definitely more complicated to implement (due to 1 and 2).

-Man

From erik.osterlund at oracle.com  Sat Apr 12 09:48:01 2025
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Sat, 12 Apr 2025 09:48:01 +0000
Subject: [External] : Re: Moving Forward with AHS for G1
In-Reply-To:
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID:

> On 12 Apr 2025, at 10:07, Man Cao wrote:
>
> In fact, the custom container is more of an extension or wrapper on top of cgroup. It is quite likely we have internal patches to the OS kernel to support the custom container.

Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. So why don't you want to use the cgroup memory limits?

> It seems that RssLimit is not necessarily easier to use than CurrentMaxHeapSize, but definitely more complicated to implement (due to 1 and 2).

Okay.

/Erik

From aboldtch at openjdk.org  Mon Apr 14 12:53:18 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 14 Apr 2025 12:53:18 GMT
Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3]
In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
Message-ID:

> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the initial heap capacity. Now we crash because `ZPartition::prime` does not take this into account.
>
> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation.
>
> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16
> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version`
>
> Currently running this through testing.
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update outdated TestZNMT.java comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24589/files - new: https://git.openjdk.org/jdk/pull/24589/files/70b0e923..a33c7e39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From stefank at openjdk.org Mon Apr 14 13:14:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Apr 2025 13:14:01 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Mon, 14 Apr 2025 12:53:18 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update outdated TestZNMT.java comment Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2764278721 From duke at openjdk.org Mon Apr 14 13:16:24 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Mon, 14 Apr 2025 13:16:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
> > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/6b139085..c31c7340 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=00-01 Stats: 88 lines in 1 file changed: 88 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From aboldtch at openjdk.org Mon Apr 14 13:31:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Apr 2025 13:31:59 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: <5cRwpuZ7f9ZUWRVpNawcpP9AEOpiT-Uy-RJGdRu8KlY=.670800f5-f2e6-45fd-ac78-66e89f9c5719@github.com> On Mon, 14 Apr 2025 12:53:18 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update outdated TestZNMT.java comment Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24589#issuecomment-2801714127

From aboldtch at openjdk.org  Mon Apr 14 13:31:59 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 14 Apr 2025 13:31:59 GMT
Subject: Integrated: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly
In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
Message-ID:

On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote:

> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the initial heap capacity. Now we crash because `ZPartition::prime` does not take this into account.
>
> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation.
>
> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16
> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version`
>
> Currently running this through testing.

This pull request has now been integrated.

Changeset: 97e10757
Author: Axel Boldt-Christmas
URL: https://git.openjdk.org/jdk/commit/97e10757392859a46360b4ab379429212fbc34b3
Stats: 34 lines in 3 files changed: 7 ins; 4 del; 23 mod

8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly

Reviewed-by: stefank, jsikstro, eosterlund

------------- PR: https://git.openjdk.org/jdk/pull/24589

From kdnilsen at openjdk.org  Mon Apr 14 16:40:47 2025
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 14 Apr 2025 16:40:47 GMT
Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 9 Apr 2025 18:21:43 GMT, Kelvin Nilsen wrote:

>> Not sure about performance impact, other than implementing and testing...
>
> I suspect performance impact is minimal.

I've committed changes that endeavor to implement the suggested refactor. Performance impact does appear to be minimal.

This broader refactoring does change behavior slightly. In particular:

1. We now have a better understanding of live-memory evacuated during mixed evacuations. This allows the selection of old-candidates for mixed evacuations to be more conservative. We'll have fewer old regions in order to honor the intended budget.
2. Potentially, this will result in more mixed evacuations, but each mixed evacuation should take less time.
3. There should be no impact on behavior of traditional Shenandoah.

On one recently completed test run, we observed the following impacts compared to tip:

Shenandoah
-------------------------------------------------------------------------------------------------------
+80.69%  specjbb2015/trigger_failure   p=0.00000
         Control:  58.250 (+/- 13.48)  110
         Test:    105.250 (+/- 33.13)   30

Genshen
-------------------------------------------------------------------------------------------------------
-19.46%  jme/context_switch_count      p=0.00176
         Control: 117.420 (+/- 28.01)  108
         Test:     98.292 (+/- 32.76)   30

Perhaps we need more data to decide whether this is "significant".
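As a side note on the liveness accounting being discussed in this thread: the candidate's estimate has to combine the marking snapshot with post-mark allocations. A minimal sketch, with hypothetical names rather than the actual GenShen accessors:

```cpp
#include <cstddef>

// Hypothetical snapshot of an old region's occupancy; not the real Shenandoah types.
struct OldRegionLiveness {
  size_t marked_live_bytes;      // live bytes recorded by the last completed old mark
  size_t bytes_allocated_since;  // allocated after that mark, e.g. top minus top-at-mark-start
};

// Objects allocated after marking are live by definition, so a mixed-evacuation
// budget based only on marked_live_bytes would underestimate the copying cost.
static size_t estimated_live_bytes(const OldRegionLiveness& r) {
  return r.marked_live_bytes + r.bytes_allocated_since;
}
```

This is also why the selection described in point 1 above becomes more conservative: the larger, more accurate estimate consumes the intended evacuation budget sooner.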
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2042510606 From lmesnik at openjdk.org Tue Apr 15 01:35:04 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 15 Apr 2025 01:35:04 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Message-ID: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Just minor clean up of WB API usage. Also changed othervm to driver. ------------- Commit messages: - use driver - 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Changes: https://git.openjdk.org/jdk/pull/24642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354559 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24642/head:pull/24642 PR: https://git.openjdk.org/jdk/pull/24642 From lmesnik at openjdk.org Tue Apr 15 01:57:57 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 15 Apr 2025 01:57:57 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 13:16:24 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Marked as reviewed by lmesnik (Reviewer). Sorry, I wanted to ask you to change test, not approve it yet. test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 30: > 28: * @test TestG1CompressedOops > 29: * @bug 8354145 > 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null The test ignores external VM flags, so vm.opt.G1HeapRegionSize is not needed. 
But it is needed to add `* @requires vm.flagless` test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 32: > 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null > 31: * @summary Verify that the flag TestG1CompressedOops is updated properly > 32: * @modules java.base/jdk.internal.misc Is any of those 2 modules is used by tests? I don't see it in the test. test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 35: > 33: * @modules java.management/sun.management > 34: * @library /test/lib > 35: * @library / Why this line is needed? I don't see any dependencies on "/" If you use some test code outside directory, better to build them. ------------- PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2766273464 Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2766313637 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043328713 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043315584 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043311695 From duke at openjdk.org Tue Apr 15 02:47:24 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 02:47:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v3] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
> > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: remove useless jtreg tags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/c31c7340..f08e4177 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From duke at openjdk.org Tue Apr 15 02:52:42 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 02:52:42 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 01:52:22 GMT, Leonid Mesnik wrote: >> Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly > > test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 30: > >> 28: * @test TestG1CompressedOops >> 29: * @bug 8354145 >> 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null > > The test ignores external VM flags, so vm.opt.G1HeapRegionSize is not needed. > But it is needed to add > `* @requires vm.flagless` done > test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 32: > >> 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null >> 31: * @summary Verify that the flag TestG1CompressedOops is updated properly >> 32: * @modules java.base/jdk.internal.misc > > Is any of those 2 modules is used by tests? I don't see it in the test. removed these two modules > Why this line is needed? I don't see any dependencies on "/" If you use some test code outside directory, better to build them. 
Yes, the GCArguments depends on the ```@library /``` , many tests in ``` test/hotspot/jtreg/gc/arguments``` use this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043407145 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043406996 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043406500 From duke at openjdk.org Tue Apr 15 03:01:40 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 03:01:40 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 01:54:35 GMT, Leonid Mesnik wrote: > Sorry, I wanted to ask you to change test, not approve it yet. Got it, thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2803626376 From duke at openjdk.org Tue Apr 15 03:31:48 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 03:31:48 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v4] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. > > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/f08e4177..17c0a8a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From ayang at openjdk.org Tue Apr 15 07:58:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 15 Apr 2025 07:58:56 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: 
<27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24642#pullrequestreview-2767242168 From kbarrett at openjdk.org Tue Apr 15 09:09:46 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Apr 2025 09:09:46 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24642#pullrequestreview-2767475519 From jbhateja at openjdk.org Tue Apr 15 13:57:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Apr 2025 13:57:38 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Message-ID: ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. Please review and share your feedback. Best Regards, Jatin PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. ------------- Commit messages: - 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24664/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354668 Stats: 16 lines in 4 files changed: 5 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From aboldtch at openjdk.org Tue Apr 15 14:52:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 15 Apr 2025 14:52:58 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. 
While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Looks good but need to communicate with JVMCI implementors. Also pre-exisiting but maybe `ZBarrierRelocationFormatLoadGoodAfterShl` should be called `ZBarrierRelocationFormatLoadGoodAfterShX` as we use it for both shr and shl. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.hpp line 52: > 50: #endif // COMPILER2 > 51: > 52: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; Suggestion: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 223: > 221: return true; > 222: #if INCLUDE_ZGC > 223: case Z_BARRIER_RELOCATION_FORMAT_LOAD_GOOD_BEFORE_SHL: Should probably communicate with the JVMCI / Graal @dougxc so we can both update this exported symbol name to reflect the new behaviour, and give them the opportunity to adapt to the new relocation patching. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2768666320 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044778342 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044814373 From manc at google.com Tue Apr 15 19:24:58 2025 From: manc at google.com (Man Cao) Date: Tue, 15 Apr 2025 12:24:58 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. So why don?t you want to use the cgroup memory limits? One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit. -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.osterlund at oracle.com Tue Apr 15 20:38:36 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 15 Apr 2025 20:38:36 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Hi Man, > On 15 Apr 2025, at 21:25, Man Cao wrote: > > ? > > Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. 
So why don?t you want to use the cgroup memory limits? > > One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit. That?s exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really. /Erik From manc at google.com Tue Apr 15 21:27:14 2025 From: manc at google.com (Man Cao) Date: Tue, 15 Apr 2025 14:27:14 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > > One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit. > That?s exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really. Supporting both memory.high and memory.max for AHS sounds great. The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit. Going back to the high level, the point is that it is impractical for organizations such as us to change deployment environments (e.g. migrating from custom container to standard container) in order to use AHS. A flag such as CurrentMaxHeapSize will definitely help these use cases adopt AHS. -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From dlong at openjdk.org Wed Apr 16 02:01:49 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Apr 2025 02:01:49 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. This looks OK, but we could do better. 
Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2807988702 From stuefe at openjdk.org Wed Apr 16 05:28:47 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Apr 2025 05:28:47 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Hi @jsikstro, good cleanup, some small nits remain. Cheers, Thomas src/hotspot/share/gc/shared/collectedHeap.cpp line 119: > 117: heap->print_on(&st); > 118: MetaspaceUtils::print_on(&st); > 119: } Pre-existing, the other cases of printing in this file have a preceding ResourceMark. It is either needed here or not needed there. 
src/hotspot/share/memory/metaspace.cpp line 221: > 219: MetaspaceCombinedStats stats = get_combined_statistics(); > 220: out->print("Metaspace"); > 221: out->fill_to(17); We rely on absolute position here? Will not work well with different indentation levels. src/hotspot/share/utilities/vmError.cpp line 1399: > 1397: st->cr(); > 1398: } > 1399: Universe::heap()->print_on_error(st); Why is print_on_error called outside the indentation scope? ------------- PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2770781675 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046093409 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046096635 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046084544 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: review comment resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/1a5a73c0..ffd92c37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 01:58:53 GMT, Dean Long wrote: > This looks OK, but we could do better. Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction? 
Hi @dean-long , As of now, barrier relocations are placed either before[1] or after[2] the instructions, offset is then added to compute the effective address of the patch site. I think you are suggesting to extend the barrier structure itself to cache the patch site address. For this bug fix PR I intend to make the patch offset agnostic to REX/REX2 prefix without disturbing the existing implimentation. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L394 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L397 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2808697302 From stuefe at openjdk.org Wed Apr 16 08:30:51 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Apr 2025 08:30:51 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... 
Notes: - We may want to simplify at some point and merge streamIndentor and streamAutoIndentor. That includes checking which existing call sites use streamIndentor *without* wanting auto indentation. Not sure but I guess there are none. I think the existing cases fall into two categories: where streamIndentor was used on a stream that had already autoindent enabled, and where the code uses "cr_indent()" or "indent" to manually indent. - It would be nice to have a short comment in collectedHeap.hpp about when print_on resp print_on_error is used. From your explanation, I expect print_on_error to be used for information that should only be printed in case of a fatal error, right? - To simplify and prevent mistakes, we should consider making set_autoindent in outputStream private and make the indentor RAII classes friends of outputStream. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2808804192 From duke at openjdk.org Wed Apr 16 09:10:53 2025 From: duke at openjdk.org (duke) Date: Wed, 16 Apr 2025 09:10:53 GMT Subject: Withdrawn: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote: > The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem. > > if (used_ratio > threshold) { > // After threshold is reached, scale it by free_ratio so that more aggressive > // GC is triggered as we approach code cache exhaustion > threshold *= free_ratio; > } > // If code cache has been allocated without any GC at all, let's make sure > // it is eventually invoked to avoid trouble. > if (allocated_since_last_ratio > threshold) { > // In case the GC is concurrent, we make sure only one thread requests the GC. > if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) { > log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)", > threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0); > Universe::heap()->collect(GCCause::_codecache_GC_threshold); > } > } > > Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded. > > So a simple solution is to delete the scaling logic here. However, I think here lies some problems worth further exploring. > > There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection. There's no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/21084 From erik.osterlund at oracle.com Wed Apr 16 09:45:39 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 16 Apr 2025 09:45:39 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > On 15 Apr 2025, at 23:27, Man Cao wrote: > > ? > > > One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit. > > That?s exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really. > > Supporting both memory.high and memory.max for AHS sounds great. > The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit. Okay, great. Sounds like AHS + actually using the standardized cgroups memory limits as the way of limiting memory is a viable path forward then? > Going back to the high level, the point is that it is impractical for organizations such as us to change deployment environments (e.g. migrating from custom container to standard container) in order to use AHS. A flag such as CurrentMaxHeapSize will definitely help these use cases adopt AHS. So the main point for introducing CurrentMaxHeapSize, as opposed to going directly to AHS, would be to support all the people out there that already built their own adaptive container infrastructure that doesn?t use industry standard cgroup technology to limit memory. Instead, this group of users use the very proposed CurrentMaxHeapSize functionality (which obviously does not exist in mainline yet) to limit memory adaptively instead. I have to be honest? this sounds like a niche feature to me with a ticking clock attached to it. Yet if it gets integrated, we will not be able to get rid of it for decades and it will cost maintenance overheads along the way. So I think it would be good to see a prominent use case that might be interesting for a long time going forward as well, and not just a way to help you guys stop using the proposed feature in the transition to AHS, which seems to be where we are going. I think what will reach a much broader audience going forward, is AHS. And if that?s the feature we really want, I can?t help but wonder if exposing this user configurable stuff along the way is helping towards that goal rather than slowing us down by inventing yet another set of manually set handcuffs that the JVM and AHS will have to respect for ages, way past its best before date. /Erik From jsikstro at openjdk.org Wed Apr 16 13:25:47 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 13:25:47 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 05:21:41 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. 
>> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > src/hotspot/share/gc/shared/collectedHeap.cpp line 119: > >> 117: heap->print_on(&st); >> 118: MetaspaceUtils::print_on(&st); >> 119: } > > Pre-existing, the other cases of printing in this file have a preceding ResourceMark. It is either needed here or not needed there. The ResourceMarks that are used in other places in this file are not needed anymore. The reason they are placed where they are is because previously (a long time ago, since before [this](https://github.com/openjdk/jdk/commit/d12604111ccd6a5da38602077f4574adc850d9b8#diff-f9496186f2b54da5514e073a08b00afe2e2f8fbae899b13c182c8fbccc7aa7a6) commit), they were next to creating a debug stream. When the debug stream was replaced with a LogStream, the ResourceMark should have followed the LogStream, but it didn't in the changes for print_heap_{before,after}_gc(), see universe.cpp in [this](https://github.com/openjdk/jdk/commit/d12604111ccd6a5da38602077f4574adc850d9b8#diff-f9496186f2b54da5514e073a08b00afe2e2f8fbae899b13c182c8fbccc7aa7a6) commit, where the printing methods were before being moved to collectedHeap.cpp. The ResourceMarks should be removed, like Casper has done in [JDK-8294954](https://github.com/openjdk/jdk/pull/24162). 
I talked with Casper about the ResourceMarks, as he have looked over why the ResourceMarks are there in his patch and he agrees that they should be removed from print_heap_{before,after}_gc(), as they are likely there only for the LogStream. To summarise, no, ResourceMarks are not needed here, and they should be removed in the other places in this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046931370 From jsikstro at openjdk.org Wed Apr 16 13:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 13:56:07 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 05:15:40 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > src/hotspot/share/utilities/vmError.cpp line 1399: > >> 1397: st->cr(); >> 1398: } >> 1399: Universe::heap()->print_on_error(st); > > Why is print_on_error called outside the indentation scope? 
This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. Heap: ZHeap used 7740M, capacity 9216M, max capacity 9216M Cache 1476M (2) size classes 128M (1), 1G (1) Metaspace used 18526K, committed 18816K, reserved 1114112K class space used 1603K, committed 1728K, reserved 1048576K ZGC Globals: Young Collection: Mark/51 Old Collection: Mark/18 Offset Max: 144G (0x0000002400000000) Page Size Small: 2M Page Size Medium: 32M ZGC Metadata Bits: LoadGood: 0x000000000000d000 LoadBad: 0x0000000000002000 ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046992916 From jsikstro at openjdk.org Wed Apr 16 14:08:47 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:08:47 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 05:25:31 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... 
> > src/hotspot/share/memory/metaspace.cpp line 221: > >> 219: MetaspaceCombinedStats stats = get_combined_statistics(); >> 220: out->print("Metaspace"); >> 221: out->fill_to(17); > > We rely on absolute position here? Will not work well with different indentation levels. This was intended to align well with how ZGC does it. After some thought I think a better strategy is to add a space at the end of the string before filling, like `out->print("Metaspace "); out->fill_to(17);`. This still aligns to the 17th column, but will not break printing for deeper indentation levels (currently 6 or more). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2047019290 From jsikstro at openjdk.org Wed Apr 16 14:19:03 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:03 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v2] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin...
Joel Sikstr?m has updated the pull request incrementally with four additional commits since the last revision: - Safety padding for deep indentation - Remove superfluous ResourceMarks - Comment for print_on_error() - Merge 'master' into JDK-8354362_autoindent_collectedheap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/2c0c0b2b..9fea46ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=00-01 Stats: 180592 lines in 408 files changed: 10159 ins; 169115 del; 1318 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Wed Apr 16 14:19:49 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:49 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v3] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. 
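The "Safety padding for deep indentation" commit above ties back to the fill_to(17) exchange; a standalone model (toy class, not the real outputStream) of why a trailing space keeps a separator once the indentation already passes the fill column:

```c++
#include <cstdio>
#include <string>

// Standalone model (not the real outputStream): when indentation already passes
// the fill column, fill_to() adds nothing, so the trailing space in the label is
// the only thing keeping the label and the value apart.
struct ToyStream {
  std::string line;
  void print(const std::string& s) { line += s; }
  void fill_to(size_t col) { while (line.size() < col) line += ' '; }
};

int main() {
  for (int indent : {1, 20}) {  // shallow vs. very deep indentation
    ToyStream out;
    out.print(std::string(indent, ' '));
    out.print("Metaspace ");    // note the trailing space
    out.fill_to(17);
    out.print("used 18526K");
    printf("[%s]\n", out.line.c_str());
  }
  return 0;
}
```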
A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8354362_autoindent_collectedheap - Safety padding for deep indentation - Remove superfluous ResourceMarks - Comment for print_on_error() - 8354362: Use automatic indentation in CollectedHeap printing ------------- Changes: https://git.openjdk.org/jdk/pull/24593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=02 Stats: 246 lines in 27 files changed: 88 ins; 89 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Wed Apr 16 14:19:49 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:49 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. 
My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Sorry for the force-push, made a mistake when merging with master. No comments should have been removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2809736966 From jsikstro at openjdk.org Wed Apr 16 14:28:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:28:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:28:22 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Notes: > > - We may want to simplify at some point and merge streamIndentor and streamAutoIndentor. 
That includes checking which existing call sites use streamIndentor *without* wanting auto indentation. Not sure but I guess there are none. > I think the existing cases fall into two categories: where streamIndentor was used on a stream that had already autoindent enabled, and where the code uses "cr_indent()" or "indent" to manually indent. > > - It would be nice to have a short comment in collectedHeap.hpp about when print_on resp print_on_error is used. From your explanation, I expect print_on_error to be used for information that should only be printed in case of a fatal error, right? > > - To simplify and prevent mistakes, we should consider making set_autoindent in outputStream private and make the indentor RAII classes friends of outputStream. Thank you for looking at this @tstuefe! I've addressed some of your comments with new commits. I agree that we likely want to merge streamIndentor and StreamAutoIndentor in a follow up RFE, where it also would be good to look at making set_autoindent() private. I haven't looked into it, but it feels weird to have an indentation level on an outputStream and use it only explicitly via indent() and not via a StreamAutoIndentor. I think a good solution would be to only allow indentation via the StreamAutoIndentor API like you're proposing, and look into whether there should be some API for temporarily disabling indentation with a RAII object (or just some parameters to StreamAutoIndentor) if there are cases that require it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2809764389 From dlong at openjdk.org Wed Apr 16 21:13:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Apr 2025 21:13:52 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 07:52:09 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > review comment resolutions Yes, I am suggesting doing something like: __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr); which would be a bigger change to the implementation. 
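To illustrate why the end-relative form suggested above is prefix-agnostic, here is a standalone sketch (not HotSpot code; the instruction layout is simplified to prefix + opcode + ModRM + imm8):

```c++
#include <cstdio>
#include <cstddef>

// Standalone sketch (not HotSpot code). Model a SHL-with-immediate as
// [prefix bytes][opcode][ModRM][imm8]; the imm8 is the byte patched later.
// An offset recorded from the start of the instruction assumes a fixed prefix
// size, while an offset from the end does not care how many prefix bytes exist.
int main() {
  const size_t start_relative = 3;  // assumes a single REX prefix: prefix + opcode + ModRM
  for (size_t prefix_bytes = 1; prefix_bytes <= 2; prefix_bytes++) {  // 1 = REX, 2 = REX2
    const size_t insn_len   = prefix_bytes + 3;  // opcode + ModRM + imm8
    const size_t actual_imm = insn_len - 1;      // where the imm8 really is
    const size_t end_based  = insn_len - 1;      // "last byte", computed from the end
    printf("prefix=%zu: start-relative patches byte %zu (%s), end-relative patches byte %zu (ok)\n",
           prefix_bytes, start_relative,
           start_relative == actual_imm ? "ok" : "WRONG", end_based);
  }
  return 0;
}
```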
------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2810802951 From lmesnik at openjdk.org Wed Apr 16 23:07:53 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 16 Apr 2025 23:07:53 GMT Subject: Integrated: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: <1ZIcsJTnCri0LVBjSYa15TA8IpyrQxmw0K-SAFtBr5E=.e3f1835a-a3d6-4afa-80d1-fecb9751c859@github.com> On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. This pull request has now been integrated. Changeset: db2dffb6 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/db2dffb6e5fed3773080581350f7f5c0bcff8f35 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/24642 From jbhateja at openjdk.org Thu Apr 17 02:22:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 02:22:42 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1iR9_nrbk0iFlgy28u4dO4-7OWjEkO__AoZ9zHqtm8I=.ae8b0a68-0f85-472d-a810-e9c8417097d9@github.com> On Wed, 16 Apr 2025 21:10:38 GMT, Dean Long wrote: > Yes, I am suggesting doing something like: > > ``` > __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr); > ``` > > which would be a bigger change to the implementation. Yes, this is what I mean by address caching in my above comment. we already have an existing interface for it in place; the intent of this bug fix PR is not to improve upon the infrastructure but to align the fix with the current scheme. Do you suggest doing that in a follow up PR ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2811561649 From jbhateja at openjdk.org Thu Apr 17 03:21:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 03:21:08 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/ffd92c37..dc2b2b16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01-02 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From stuefe at openjdk.org Thu Apr 17 05:25:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Apr 2025 05:25:54 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 14:06:21 GMT, Joel Sikstr?m wrote: >> src/hotspot/share/memory/metaspace.cpp line 221: >> >>> 219: MetaspaceCombinedStats stats = get_combined_statistics(); >>> 220: out->print("Metaspace"); >>> 221: out->fill_to(17); >> >> We rely on absolute position here? Will not work well with different indentation levels. > > This was intended to align good with how ZGC does it. After some thought I think a better strategy is to add a space at the end of the string before filling, like: > > ```c++ > out->print("Metaspace "); > out->fill_to(17); > > This still aligns to the 17th column, but will not break printing for deeper indentation levels (currently 6 or more). Yes that sounds better >> src/hotspot/share/utilities/vmError.cpp line 1399: >> >>> 1397: st->cr(); >>> 1398: } >>> 1399: Universe::heap()->print_on_error(st); >> >> Why is print_on_error called outside the indentation scope? > > This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. > > > Heap: > ZHeap used 7740M, capacity 9216M, max capacity 9216M > Cache 1476M (2) > size classes 128M (1), 1G (1) > Metaspace used 18526K, committed 18816K, reserved 1114112K > class space used 1603K, committed 1728K, reserved 1048576K > > ZGC Globals: > Young Collection: Mark/51 > Old Collection: Mark/18 > Offset Max: 144G (0x0000002400000000) > Page Size Small: 2M > Page Size Medium: 32M > > ZGC Metadata Bits: > LoadGood: 0x000000000000d000 > LoadBad: 0x0000000000002000 > ... Hmm, that may be an indication that this should be in its own error reporting STEP, then? Probably does not matter much, just aesthetics ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048252477 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048252150 From jsikstro at openjdk.org Thu Apr 17 09:13:34 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:13:34 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v4] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. 
To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with two additional commits since the last revision: - Separate print_heap_on and print_gc_on in VMError printing - Rename print_on and print_on_error to print_heap_on and print_gc_on ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/c1140b86..2979316c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=02-03 Stats: 71 lines in 15 files changed: 19 ins; 6 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Thu Apr 17 09:13:35 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:13:35 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v4] In-Reply-To: References: Message-ID: <4Swh7By1eRJ19p7ULrAryORBm97i0783ErfLDJhdnKw=.1a0e864e-e74f-4491-a153-fc1c049688be@github.com> On Thu, 17 Apr 2025 05:23:10 GMT, Thomas Stuefe wrote: >> This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. 
>> >> >> Heap: >> ZHeap used 7740M, capacity 9216M, max capacity 9216M >> Cache 1476M (2) >> size classes 128M (1), 1G (1) >> Metaspace used 18526K, committed 18816K, reserved 1114112K >> class space used 1603K, committed 1728K, reserved 1048576K >> >> ZGC Globals: >> Young Collection: Mark/51 >> Old Collection: Mark/18 >> Offset Max: 144G (0x0000002400000000) >> Page Size Small: 2M >> Page Size Medium: 32M >> >> ZGC Metadata Bits: >> LoadGood: 0x000000000000d000 >> LoadBad: 0x0000000000002000 >> ... > > Hmm, that may be an indication that this should be in its own error reporting STEP, then? Probably does not matter much, just aesthetics I agree. With some suggestions from @stefank, I've renamed print_on() to print_heap_on() and print_on_error() to print_gc_on() to better reflect their purpose. I've also separated print_heap_on() and print_gc_on() into their own "STEPs" in the printing in vmError.cpp: STEP("printing heap information") ... print_heap_on(); ... STEP("printing GC information") ... print_gc_on() ... With this change it would make better sense to print the precious log in the GC section rather than the heap section. This would change the printing order, which I have not yet done in this patch, so I think it would be better in a follow up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048556654 From jsikstro at openjdk.org Thu Apr 17 09:47:17 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:47:17 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v5] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. 
> > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Shenandoah print rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/2979316c..33d20641 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From stuefe at openjdk.org Thu Apr 17 10:44:53 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Apr 2025 10:44:53 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v5] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:47:17 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. 
>> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Shenandoah print rename Looks fine to me, but GC people should look at this too. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2775319824 From jsikstro at openjdk.org Thu Apr 17 10:49:39 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 10:49:39 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. 
My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Rename ZGC printing to match print_heap_on() and print_gc_on() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/33d20641..0824712c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=04-05 Stats: 24 lines in 5 files changed: 1 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From sjohanss at openjdk.org Thu Apr 17 10:53:29 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 17 Apr 2025 10:53:29 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock Message-ID: Please review this change to restructure some code in the mark start pause to do updates while holding the lock. **Summary** We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. **Testing** Mach5 tier1-5 ------------- Commit messages: - Move collection stat update under lock Changes: https://git.openjdk.org/jdk/pull/24719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354929 Stats: 45 lines in 3 files changed: 17 ins; 15 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24719/head:pull/24719 PR: https://git.openjdk.org/jdk/pull/24719 From stefank at openjdk.org Thu Apr 17 11:22:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:22:55 GMT Subject: RFR: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory Message-ID: We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. 
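A hypothetical helper (not the ZGC reservation code) showing the difference: with MAP_FIXED_NOREPLACE the kernel either maps at the requested address or fails with EEXIST, instead of quietly placing the mapping elsewhere the way a plain hint may:

```c++
#include <sys/mman.h>
#include <cstddef>
#include <cerrno>
#include <cstdio>

// Hypothetical helper, not the ZGC code: reserve 'size' bytes exactly at 'addr'.
static void* reserve_at(void* addr, size_t size) {
  const int base_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
#ifdef MAP_FIXED_NOREPLACE
  // Linux 4.17+: either we get exactly 'addr', or mmap fails (EEXIST if occupied).
  void* res = mmap(addr, size, PROT_NONE, base_flags | MAP_FIXED_NOREPLACE, -1, 0);
  if (res == MAP_FAILED) {
    fprintf(stderr, "reservation at %p failed, errno %d\n", addr, errno);
    return nullptr;
  }
  return res;
#else
  // Fallback: 'addr' is only a hint, so the result must be checked and unmapped
  // again if the kernel decided to place the mapping somewhere else.
  void* res = mmap(addr, size, PROT_NONE, base_flags, -1, 0);
  if (res == MAP_FAILED) {
    return nullptr;
  }
  if (res != addr) {
    munmap(res, size);
    return nullptr;
  }
  return res;
#endif
}
```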
------------- Commit messages: - 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory Changes: https://git.openjdk.org/jdk/pull/24716/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24716&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354922 Stats: 10 lines in 2 files changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24716/head:pull/24716 PR: https://git.openjdk.org/jdk/pull/24716 From stefank at openjdk.org Thu Apr 17 11:26:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:26:51 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 10:48:54 GMT, Stefan Johansson wrote: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. > > **Testing** > Mach5 tier1-5 Looks good. I've add a couple of suggestions for blank lines. src/hotspot/share/gc/z/zPageAllocator.cpp line 1378: > 1376: void ZPageAllocator::update_collection_stats(ZGenerationId id) { > 1377: assert(SafepointSynchronize::is_at_safepoint(), "Should be at safepoint"); > 1378: #ifdef ASSERT Suggestion: #ifdef ASSERT src/hotspot/share/gc/z/zPageAllocator.cpp line 1388: > 1386: assert(total_used == _used, "Must be consistent at safepoint %zu == %zu", total_used, _used); > 1387: #endif > 1388: _collection_stats[(int)id]._used_high = _used; Suggestion: _collection_stats[(int)id]._used_high = _used; ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2775405882 PR Review Comment: https://git.openjdk.org/jdk/pull/24719#discussion_r2048745637 PR Review Comment: https://git.openjdk.org/jdk/pull/24719#discussion_r2048745371 From stefank at openjdk.org Thu Apr 17 11:28:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:28:04 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: References: Message-ID: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> On Thu, 17 Apr 2025 10:49:39 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
>> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Rename ZGC printing to match print_heap_on() and print_gc_on() src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 680: > 678: } > 679: st->cr(); > 680: Below this line we still have a print_on_error call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048749718 From stefank at openjdk.org Thu Apr 17 12:05:44 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 12:05:44 GMT Subject: RFR: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used Message-ID: ZFakeNUMA is used to fake a number of NUMA nodes within ZGC. The intention was to make ZFakeNUMA mutually exclusive with UseNUMA, but the current code allows the user to enable UseNUMA and set ZFakeNUMA, which will trigger to the "mutual exclusion" assert in ZNUMA::initialize. Verified on NUMA machine with -XX:+UseNUMA -XX:ZFakeNUMA=. Will run this through our lower tiers. 
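A standalone model (not the actual HotSpot argument processing) of the precedence being proposed above, i.e. a nonzero ZFakeNUMA forces UseNUMA off so the two can never be active at the same time:

```c++
#include <cstdio>

// Standalone model, not the actual HotSpot flag handling: if the developer asks
// for fake NUMA nodes, real NUMA support is switched off up front, so the
// "mutual exclusion" assert in the NUMA initialization can never fire.
struct Flags {
  bool     UseNUMA;
  unsigned ZFakeNUMA;  // 0 means "not faking"
};

static void apply_numa_ergonomics(Flags& f) {
  if (f.ZFakeNUMA > 0 && f.UseNUMA) {
    fprintf(stderr, "warning: ZFakeNUMA=%u set, disabling UseNUMA\n", f.ZFakeNUMA);
    f.UseNUMA = false;
  }
}

int main() {
  Flags f{/*UseNUMA=*/true, /*ZFakeNUMA=*/4};  // e.g. -XX:+UseNUMA -XX:ZFakeNUMA=4
  apply_numa_ergonomics(f);
  printf("UseNUMA=%d ZFakeNUMA=%u\n", f.UseNUMA ? 1 : 0, f.ZFakeNUMA);
  return 0;
}
```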
------------- Commit messages: - 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used Changes: https://git.openjdk.org/jdk/pull/24721/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24721&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354938 Stats: 13 lines in 1 file changed: 10 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24721.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24721/head:pull/24721 PR: https://git.openjdk.org/jdk/pull/24721 From jsikstro at openjdk.org Thu Apr 17 12:15:45 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 12:15:45 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v7] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... 
Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Rename print_on_error() in the remaining call-paths from print_gc_on() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/0824712c..042c0aee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=05-06 Stats: 15 lines in 11 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Thu Apr 17 12:15:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 12:15:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> References: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> Message-ID: On Thu, 17 Apr 2025 11:24:56 GMT, Stefan Karlsson wrote: >> Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename ZGC printing to match print_heap_on() and print_gc_on() > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 680: > >> 678: } >> 679: st->cr(); >> 680: > > Below this line we still have a print_on_error call. I renamed the remaining instances of print_on_error() in GC code with alternative names, all the way down to BitMap::print_on_error() which I renamed to BitMap::print_range_on(). The only remaining print_on_error() is GCLogPrecious::print_on_error(), which I figured might be left unchanged. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048817231 From stefank at openjdk.org Thu Apr 17 13:01:46 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 13:01:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v7] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 12:15:45 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
>> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Rename print_on_error() in the remaining call-paths from print_gc_on() I think this looks good. Not yet a full review, thought, but just wanted to send out my +1 on the changes. I've added a couple of more suggestions. src/hotspot/share/gc/serial/tenuredGeneration.cpp line 449: > 447: > 448: StreamAutoIndentor indentor(st, 1); > 449: st->print("the "); _the_space->print_on(st); Suggestion: st->print("the "); _the_space->print_on(st); src/hotspot/share/gc/z/zPageAllocator.cpp line 1171: > 1169: } > 1170: > 1171: void ZPartition::print_extended_cache_on(outputStream* st) const { I would like to suggest that you flip the 'extended' and 'cache' words: Suggestion: void ZPartition::print_cache_extended_on(outputStream* st) const { So, that we have the structure: print__on print__extended_on ------------- PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2775538728 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048825842 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048839661 From sjohanss at openjdk.org Thu Apr 17 18:15:21 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 17 Apr 2025 18:15:21 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock [v2] In-Reply-To: References: Message-ID: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. 
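To make the restructuring concrete, a simplified sketch of the intended shape (the locker idiom and lock member follow the usual ZGC pattern, but the surrounding function is an assumption for illustration, not code from the change):

    void ZPageAllocator::mark_start_update_example(ZGenerationId id) {
      // Take the page allocator lock once, so that the consistency verification
      // and the used-high/used-low updates in update_collection_stats() both
      // observe the same _used value.
      ZLocker<ZLock> locker(&_lock);
      update_collection_stats(id);
      // ... gather the current statistics for the pause under the same lock ...
    }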
> > **Testing** > Mach5 tier1-5 Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: - Additional blank line Co-authored-by: Stefan Karlsson - Additional blank line Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24719/files - new: https://git.openjdk.org/jdk/pull/24719/files/473bdff2..b8966d36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24719/head:pull/24719 PR: https://git.openjdk.org/jdk/pull/24719 From dlong at openjdk.org Thu Apr 17 19:40:57 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 17 Apr 2025 19:40:57 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions When I made my suggestions, I didn't realize it would also require changes on the Graal side. So I would suggest a separate PR only if the Graal team agrees. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2813856674 From manc at google.com Fri Apr 18 05:28:21 2025 From: manc at google.com (Man Cao) Date: Thu, 17 Apr 2025 22:28:21 -0700 Subject: Moving Forward with AHS for G1 Message-ID: >> Supporting both memory.high and memory.max for AHS sounds great. >> The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit. > Okay, great. Sounds like AHS + actually using the standardized cgroups memory limits as the way of limiting memory is a viable path forward then? Not exactly. It is still impractical to migrate the custom container cases to standard cgroups. Thus those custom container cases cannot use AHS. One reason is the "strange" use cases where the actual limit is larger than cgroup's hard memory limit. There are other reasons that the custom container cannot migrate to standard cgroups. 
> So the main point for introducing CurrentMaxHeapSize, as opposed to going directly to AHS, would be to support all the people out there that already built their own adaptive container infrastructure that doesn?t use industry standard cgroup technology to limit memory. Instead, this group of users use the very proposed CurrentMaxHeapSize functionality (which obviously does not exist in mainline yet) to limit memory adaptively instead. > I have to be honest? this sounds like a niche feature to me with a ticking clock attached to it. Yet if it gets integrated, we will not be able to get rid of it for decades and it will cost maintenance overheads along the way. So I think it would be good to see a prominent use case that might be interesting for a long time going forward as well, and not just a way to help you guys stop using the proposed feature in the transition to AHS, which seems to be where we are going. > I think what will reach a much broader audience going forward, is AHS. And if that?s the feature we really want, I can?t help but wonder if exposing this user configurable stuff along the way is helping towards that goal rather than slowing us down by inventing yet another set of manually set handcuffs that the JVM and AHS will have to respect for ages, way past its best before date. I'd say the statements above are "overfitting" CurrentMaxHeapSize to the custom container use case. The main point for the value of CurrentMaxHeapSize (or a high-precedence SoftMaxHeapSize) is as mentioned in the previous response : a fully-developed AHS is unlikely to satisfy all use cases and deployment environments out there. CurrentMaxHeapSize (or a high-precedence SoftMaxHeapSize) provides additional flexibility and control for AHS and for non-AHS use cases. The custom container and JVM-external algorithm for calculating CurrentMaxHeapSize/SoftMaxHeapSize is only one example of such use cases. I could think of other use cases for CurrentMaxHeapSize (or high-precedence SoftMaxHeapSize): 1. CRIU (OpenJDK CRaC) from [~rvansa]'s comment on https://bugs.openjdk.org/browse/JDK-8204088. This case needs to shrink the Java heap as much as possible before creating the process snapshot. CRaC has implemented https://bugs.openjdk.org/browse/JDK-8348650 for G1. This is almost the same as the use case for setting -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 mentioned previously in this thread . Min/MaxHeapFreeRatio only works for G1 and ParallelGC, and will likely stop working for G1 as https://bugs.openjdk.org/browse/JDK-8353716 says. 2. Multiple Java processes with different priorities. If multiple processes run inside the same container and memory is running low, users could set a smaller CurrentMaxHeapSize for low-priority processes, to make more memory available to high-priority processes. 3. Shrinking container memory limit dynamically. Directly setting container memory limit to below the container memory usage will likely fail. However, if user sets a smaller CurrentMaxHeapSize first, the Java process will shrink the heap, thus reducing container memory usage. Then lowering the memory limit will succeed. In addition, these use cases may not want to adopt AHS for various reasons. Instead, they could use CurrentMaxHeapSize/SoftMaxHeapSize to directly solve the problems. -Man -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tschatzl at openjdk.org Fri Apr 18 08:32:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 08:32:41 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock [v2] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:15:21 GMT, Stefan Johansson wrote: >> Please review this change to restructure some code in the mark start pause to do updates while holding the lock. >> >> **Summary** >> We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. >> >> I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. >> >> **Testing** >> Mach5 tier1-5 > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Additional blank line > > Co-authored-by: Stefan Karlsson > - Additional blank line > > Co-authored-by: Stefan Karlsson Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2778082113 From tschatzl at openjdk.org Fri Apr 18 09:33:48 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 09:33:48 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v34] In-Reply-To: References: Message-ID: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/068d2a37..a3b2386d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=32-33 Stats: 41 lines in 11 files changed: 1 ins; 11 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Apr 18 09:46:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 09:46:41 GMT Subject: RFR: 8346568: G1: Other time can be negative In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: On Fri, 4 Apr 2025 18:00:21 GMT, Sangheon Kim wrote: > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. 
Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 411: > 409: > 410: double G1GCPhaseTimes::print_pre_evacuate_collection_set() const { > 411: const double pre_concurrent_start_ms = average_time_ms(ResetMarkingState) + Could this assignment be moved down to just before the use? src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 425: > 423: // Concurrent tasks of ResetMarkingState and NoteStartOfMark are triggered during > 424: // young collection. However, their execution time are not included in _gc_pause_time_ms. > 425: if (pre_concurrent_start_ms > 0.0) { Since `pre_concurrent_start_ms` is now actually gathered, maybe print an extra line for it, with the `ResetMarkingState` and `NoteStartOfMark` log lines indented? I.e. something like: if (_cur_prepare_concurrent_task_time_ms > 0.0) { debug_time("Prepare Concurrent Start", _cur_prepare_concurrent_task_time_ms); debug_phase(_gc_par_phases[ResetMarkingState], 1); debug_phase(_gc_par_phases[NoteStartOfMark], 1); } ? Then we can also drop the calculation of the local `pre_concurrent_start_ms`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24454#pullrequestreview-2778191624 PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2050415309 PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2050420949 From tschatzl at openjdk.org Fri Apr 18 10:08:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 10:08:52 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v34] In-Reply-To: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> References: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> Message-ID: On Fri, 18 Apr 2025 09:33:48 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * ayang review (part 2 - yield duration changes) > - * ayang review (part 1) The current use of all filters in the barrier is intentional: there is additional work going on investigating that, and I did not want to anticipate it in this change. When implementing the current `gen_write_ref_array_post` code measurements showed that the current version is slightly better than your suggestion for most arrays (everything larger than a few elements). I may still decide to use your version for now and re-measure later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2815125500 From sangheki at openjdk.org Fri Apr 18 19:09:33 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Fri, 18 Apr 2025 19:09:33 GMT Subject: RFR: 8346568: G1: Other time can be negative [v2] In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. 
Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8346568-G1-negative-time - Separate measurement for cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24454/files - new: https://git.openjdk.org/jdk/pull/24454/files/1c1750fd..d5f6b641 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=00-01 Stats: 257042 lines in 1817 files changed: 57470 ins; 193153 del; 6419 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From lmesnik at openjdk.org Sat Apr 19 00:44:18 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 19 Apr 2025 00:44:18 GMT Subject: RFR: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode Message-ID: The CheckUnhandledOops cause failure if JvmtiExport::post_resource_exhausted(...) is called in MemAllocator::Allocation::check_out_of_memory() The obj is null so it is not a real bug. I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java failed with -XX:+CheckUnhandledOops ------------- Commit messages: - 8355069 Changes: https://git.openjdk.org/jdk/pull/24766/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355069 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24766/head:pull/24766 PR: https://git.openjdk.org/jdk/pull/24766 From lmesnik at openjdk.org Sat Apr 19 02:25:33 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 19 Apr 2025 02:25:33 GMT Subject: RFR: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode [v2] In-Reply-To: References: Message-ID: > The > CheckUnhandledOops > cause failure if JvmtiExport::post_resource_exhausted(...) > is called in > MemAllocator::Allocation::check_out_of_memory() > The obj is null so it is not a real bug. > > I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. 
> The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java > failed with -XX:+CheckUnhandledOops Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: typo fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24766/files - new: https://git.openjdk.org/jdk/pull/24766/files/aa84af52..cb2904d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24766/head:pull/24766 PR: https://git.openjdk.org/jdk/pull/24766 From sangheki at openjdk.org Sat Apr 19 05:08:26 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Sat, 19 Apr 2025 05:08:26 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: <483hE4M8lfm5sv4bpf9YfN0qim6OwODlXgZj9aLReso=.0bbaadf1-221c-4e9d-a16f-f2e86fffe17a@github.com> On Fri, 18 Apr 2025 09:38:33 GMT, Thomas Schatzl wrote: >> Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Review from Thomas >> - Separate measurement for cleanup > > src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 425: > >> 423: // Concurrent tasks of ResetMarkingState and NoteStartOfMark are triggered during >> 424: // young collection. However, their execution time are not included in _gc_pause_time_ms. >> 425: if (pre_concurrent_start_ms > 0.0) { > > Since `pre_concurrent_start_ms` is now actually gathered, maybe print an extra line for it, with the `ResetMarkingState` and `NoteStartOfMark` log lines indented? > > I.e. something like: > > > if (_cur_prepare_concurrent_task_time_ms > 0.0) { > debug_time("Prepare Concurrent Start", _cur_prepare_concurrent_task_time_ms); > debug_phase(_gc_par_phases[ResetMarkingState], 1); > debug_phase(_gc_par_phases[NoteStartOfMark], 1); > } > > ? > > Then we can also drop the calculation of the local `pre_concurrent_start_ms`. Okay, this looks better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2051391042 From sangheki at openjdk.org Sat Apr 19 05:08:26 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Sat, 19 Apr 2025 05:08:26 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. 
Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Review from Thomas - Separate measurement for cleanup ------------- Changes: https://git.openjdk.org/jdk/pull/24454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=02 Stats: 68 lines in 4 files changed: 36 ins; 20 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From gli at openjdk.org Sun Apr 20 11:15:42 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 20 Apr 2025 11:15:42 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: Message-ID: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> On Thu, 10 Apr 2025 11:59:52 GMT, Albert Mingkun Yang wrote: >> Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. >> >> Test: tier1-7 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/gc/parallel/parallelArguments.cpp line 78: > 76: } else { > 77: FLAG_SET_DEFAULT(InitialSurvivorRatio, MinSurvivorRatio); > 78: } If both `InitialSurvivorRatio` and `MinSurvivorRatio` are not set in command line and the condition `InitialSurvivorRatio < MinSurvivorRatio` is true, it seems the corresponding default/ergonomic values, we set before, are wrong. Should we guard this situation (such as printing an error message) to catch the bug in the previous code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2051691657
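One way the suggested guard could look, as a sketch only (the logging channel and wording are assumptions, not a proposal from the PR):

    if (FLAG_IS_DEFAULT(InitialSurvivorRatio) && FLAG_IS_DEFAULT(MinSurvivorRatio) &&
        InitialSurvivorRatio < MinSurvivorRatio) {
      // Neither flag came from the command line, so a conflict here indicates a bug
      // in the earlier ergonomic defaults rather than bad user input.
      log_warning(gc, ergo)("Default InitialSurvivorRatio (%zu) is below MinSurvivorRatio (%zu)",
                            (size_t)InitialSurvivorRatio, (size_t)MinSurvivorRatio);
    }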