From stefank at openjdk.org Tue Apr 1 07:04:54 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 07:04:54 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 Message-ID: We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. Thanks to @plummercj for digging into this and proposing the same workaround. Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline ------------- Commit messages: - 8352994: ZGC: Fix regression introduced in JDK-8350572 Changes: https://git.openjdk.org/jdk/pull/24349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352994 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From cjplummer at openjdk.org Tue Apr 1 07:34:10 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 1 Apr 2025 07:34:10 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: <1S8NSOeUGbiCGZVwqiX0WGoHBguDWHvwwsxziFaFdtk=.3f5d4b1a-8d4e-47c3-a72c-9b8fc00e529d@github.com> On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline I think you should also remove com/sun/jdi/JdbStopInNotificationThreadTest.java from the ZGC problem list. ------------- PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2731743846 From manc at openjdk.org Tue Apr 1 08:33:52 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:33:52 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v4] In-Reply-To: References: Message-ID: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Add two tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/6f201fac..fc22cbfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=02-03 Stats: 162 lines in 2 files changed: 162 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From tschatzl at openjdk.org Tue Apr 1 08:43:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 08:43:01 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure Message-ID: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Hi all, please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). This has been made possible with the refactoring of object array task queues. At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). Testing: tier1-5, some perf testing with no differences Thanks, Thomas ------------- Commit messages: - 8271870 Changes: https://git.openjdk.org/jdk/pull/24222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8271870 Stats: 101 lines in 3 files changed: 46 ins; 32 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24222/head:pull/24222 PR: https://git.openjdk.org/jdk/pull/24222 From manc at openjdk.org Tue Apr 1 08:44:55 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:44:55 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v4] In-Reply-To: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> References: <_pxXWVlRMa_NcaIQWm6RS_CCrMuHpKZiKIXzxJuer6g=.ba7c6007-cc1f-44a4-b7cd-dd55f3322c65@github.com> Message-ID: <0rUbRHQuIv6bhZEiaalc5Qcfq5E7FJb51TtEf9qeYTk=.b084a316-7352-4c1b-8bea-5485740704e9@github.com> On Tue, 1 Apr 2025 08:33:52 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add two tests This PR is ready for review. Included tests cover important functionality of `SoftMaxHeapSize`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2768618593 From manc at openjdk.org Tue Apr 1 08:44:55 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 08:44:55 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Revise test summary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/fc22cbfe..68f03cad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=03-04 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From tschatzl at openjdk.org Tue Apr 1 08:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 08:57:58 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:44:55 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. 
> > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Revise test summary Initial comments. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2066: > 2064: size_t G1CollectedHeap::soft_max_capacity() const { > 2065: return clamp(align_up(SoftMaxHeapSize, HeapAlignment), MinHeapSize, > 2066: max_capacity()); Maybe this clamping of `SoftMaxHeapSize` should be part of argument processing. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 1203: > 1201: size_t max_capacity() const override; > 1202: > 1203: // Print the soft maximum heap capacity. Suggestion: // Returns the soft maximum heap capacity. src/hotspot/share/gc/g1/g1IHOPControl.cpp line 119: > 117: return (size_t)MIN2( > 118: G1CollectedHeap::heap()->soft_max_capacity() * (100.0 - safe_total_heap_percentage) / 100.0, > 119: _target_occupancy * (100.0 - _heap_waste_percent) / 100.0 This looks wrong. G1ReservePercent is in some way similar to soft max heap size, intended to keep the target below the real maximum capacity. I.e. it is not intended that G1 keeps another reserve of G1ReservePercent size below soft max capacity (which is below maximum capacity). There has been some internal discussion about whether the functionality of G1ReservePercent and SoftMaxHeapSize is too similar to warrant the former, but removing it is another issue. Imo, SoftMaxHeapSize should be an separate, actual target for this calculation. (`default_conc_mark_start_threshold()` also does not subtract `G1ReservePercent` from `SoftMaxHeapSize`). test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 29: > 27: * @test > 28: * @bug 8236073 > 29: * @requires vm.gc.G1 & vm.opt.ExplicitGCInvokesConcurrent != true It's nicer to put and-ed conditions in separate lines. test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 46: > 44: private static final long ALLOCATED_BYTES = 20_000_000; // About 20M > 45: private static final long MAX_HEAP_SIZE = > 46: 200 * 1024 * 1024; // 200MiB, must match -Xmx on command line. Is it possible to get that value from the `MemoryMXBean` instead of relying on manual update? I.e. `getMax()`? ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24211#pullrequestreview-2731934928 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022415626 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022415016 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022430412 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022434814 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2022438436 From tschatzl at openjdk.org Tue Apr 1 09:24:12 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 09:24:12 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v29] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
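For contrast, here is a minimal sketch of the Parallel/Serial-style card-table post-write barrier that the JEP moves G1 towards. This is illustrative C++ only, not the patch: the function name is made up, and the 512-byte card size and dirty value 0 are simply the common HotSpot defaults.

    #include <cstdint>

    // After "x.a = y", all that remains is to dirty the card covering the field.
    inline void post_write_barrier(volatile uint8_t* card_table_base, void* field_addr) {
      volatile uint8_t* card = card_table_base + (reinterpret_cast<uintptr_t>(field_addr) >> 9);
      *card = 0;  // dirty
    }

This is roughly the three-or-four instruction shape the description attributes to Parallel and Serial, versus the much larger current G1 sequence shown in the pseudo code that follows.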
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - ... 
and 27 more: https://git.openjdk.org/jdk/compare/aff5aa72...51fb6e63 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=28 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From iwalulya at openjdk.org Tue Apr 1 10:55:27 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 1 Apr 2025 10:55:27 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:44:55 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Revise test summary With the changes to `young_collection_expansion_amount()`, once we reach the `SoftMaxHeapSize`, we cannot expand the heap except during GC where expansion can happen without regard for `SoftMaxHeapSize`. Thus, after exceeding `SoftMaxHeapSize` we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the `SoftMaxHeapSize` as implemented by this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2768966455 From stefan.johansson at oracle.com Tue Apr 1 12:49:10 2025 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 1 Apr 2025 14:49:10 +0200 Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) In-Reply-To: References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: <792ad340-5160-413b-b766-c49b4ff6d4c5@oracle.com> Thanks for sharing these results Monica, As Thomas mentioned we have done some testing comparing Serial to G1 in small environments as well. Our conclusions are similar to yours, G1 nowdays handles the small environments pretty good. I used SPECjbb2005, and my focus was to compare throughput given a fixed memory usage. The reason for this is that the low native memory overhead of Serial (no marking bitmap etc) is often used as an argument to use it in small environments. On the other hand, the region based heap layout of G1 can in many cases offer a better out of the box heap utilization compared to Serial. To test this and to make a fair comparison I configure Serial to have a slightly larger heap to get an overall equal memory consumption (using the peak PSS usage in Linux as the measure). SpecJBB2005 by default runs 1 to 8 warehouses, where warehouses corresponds to worker threads. I did run this in a cgroup environment with 1CPU and 1G memory. 
By default this will give G1 a 256m max heap, which I fixed using Xmx and Xms. To let Serial use as much memory in total as G1 I configured it with a 288MB heap. With this setup Serial and G1 get a very similar score with a recent JDK 25 build. The calculated score only takes warehouse 1 and 2 into account and looking at the result/score for 8 warehouses G1 is ~10% better. So it looks like G1 is able to handle high pressure better compared to Serial. These results are without the new improved barriers for G1, when using a build with the new barrier the G1 results are improved by roughly 3%. This is a use-case not at all caring about latency and the fact the G1 is still performing this good, also points towards it being a suitable default even for small environments. I've also played around a bit with restricting the amount of concurrent work done with G1, to see how a G1 STW-only mode would perform, and on a single CPU system this looks beneficial when we start to run with more worker threads. But I don't suspect it's that common to run small cloud services at 100% load, so having a default that can do concurrent work seems reasonable. Thanks, Stefan On 2025-03-18 00:59, Monica Beckwith wrote: > Hi Thomas, Erik, and all, > > This is an important and timely discussion, and I appreciate the > insights on how the gap between SerialGC and G1GC has diminished over > time. Based on recent comparative tests of out-of-the-box GC > configurations (-Xmx only), I wanted to share some data-backed > observations that might help validate this shift. > > I tested G1GC and SerialGC under 1-core/2GB and 2-core/2GB > containerized environments (512MB < -Xmx <1.5GB), running SPECJBB2015 > with and without stress tests. The key findings: > > *Throughput (max_jOPS & critical_jOPS):* > > * > G1GC consistently outperforms SerialGC. > * > 1 core: G1GC shows a 1.78? increase in max_jOPS. > * > 2 cores: G1GC shows a 2.84? improvement over SerialGC. > > > *Latency and Stop-the-World (STW) Impact:* > > * > SerialGC struggles under stress, with frequent full GCs leading to > long pauses. > * > G1GC?s incremental?collections keep pause times lower, especially > under stress load. > * > critical_jOPS, a key SLA metric, is 4.5? higher for G1GC on 2 cores. > > > *Memory Behavior & Stability:* > > * > In 512MB heap configurations, SerialGC encountered OOM failures > due to heap exhaustion. > > > Given these results, it seems reasonable to reconsider why SerialGC > remains the default in small environments when G1GC offers clear > performance and stability advantages. > > Looking forward to thoughts on this. > > Best, > Monica > > P.S.: I haven?t tested for <512MB heaps yet, as that requires a > different test config I?m still working on. I?d also love to hear from > anyone running single-threaded, CPU-bound workloads if they have > observations to share. > > > ------------------------------------------------------------------------ > *From:*?hotspot-gc-dev on behalf of > Thomas Schatzl > *Sent:*?Monday, February 24, 2025 2:33 AM > *To:*?Erik Osterlund > *Cc:* hotspot-gc-dev at openjdk.org > *Subject:*?[EXTERNAL] Re: RFC: G1 as default collector (for real this > time) > Hi, > > On 21.02.25 15:02, Erik Osterlund wrote: > > Hi Thomas, > > > [...]> There is however a flip side for that argument on the other side > of the scaling spectrum, where ZGC is probably a better fit on the even > larger scale. 
So while it?s true that the effect of a Serial -> G1 > default change is a static default GC, I just think we should mind the > fact that there is more uncertainty on the larger end of the scale. I?m > not proposing any changes, just saying that maybe we should be careful > about stressing the importance of having a static default GC, if we > don?t know if that is the better strategy on the larger end of the scale > or not, going forward. > > +1 > > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tschatzl at openjdk.org Tue Apr 1 16:09:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 16:09:20 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:40:09 GMT, Thomas Schatzl wrote: >> Man Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise test summary > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2066: > >> 2064: size_t G1CollectedHeap::soft_max_capacity() const { >> 2065: return clamp(align_up(SoftMaxHeapSize, HeapAlignment), MinHeapSize, >> 2066: max_capacity()); > > Maybe this clamping of `SoftMaxHeapSize` should be part of argument processing. Ignore this - `SoftMaxHeapsize` is managable after all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023162750 From wkemper at openjdk.org Tue Apr 1 18:21:12 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 18:21:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:17:51 GMT, Kelvin Nilsen wrote: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > 76: _live_data(0), > 77: _critical_pins(0), > 78: _mixed_candidate_garbage_words(0), Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `TAMS + top` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returned marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`. This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? 
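A rough sketch of the alternative suggested here, assuming the region exposes top() and a top_at_mark_start() (TAMS) accessor; in the real Shenandoah code TAMS lives in the marking context, so the actual shape would differ:

    // Live data = words marked live during the last mark cycle plus words
    // allocated above TAMS since then, so the value does not go stale after
    // final mark and no extra mixed-candidate field is needed.
    size_t ShenandoahHeapRegion::get_live_data_words() const {
      const size_t marked_words     = Atomic::load(&_live_data);                  // from marking
      const size_t words_since_mark = pointer_delta(top(), top_at_mark_start());  // top - TAMS
      return marked_words + words_since_mark;
    }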
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 159: > 157: > 158: inline size_t ShenandoahHeapRegion::get_mixed_candidate_live_data_bytes() const { > 159: assert(SafepointSynchronize::is_at_safepoint(), "Should be at Shenandoah safepoint"); Could we use `shenandoah_assert_safepoint` here (and other places) instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/24319#pullrequestreview-2733584314 PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2023461623 PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2023396124 From manc at openjdk.org Tue Apr 1 20:54:36 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 20:54:36 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v6] In-Reply-To: References: Message-ID: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Address comments and try fixing test failure on macos-aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/68f03cad..0bc55654 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=04-05 Stats: 12 lines in 3 files changed: 2 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From manc at openjdk.org Tue Apr 1 20:54:37 2025 From: manc at openjdk.org (Man Cao) Date: Tue, 1 Apr 2025 20:54:37 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:48:53 GMT, Thomas Schatzl wrote: >> Man Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Revise test summary > > src/hotspot/share/gc/g1/g1IHOPControl.cpp line 119: > >> 117: return (size_t)MIN2( >> 118: G1CollectedHeap::heap()->soft_max_capacity() * (100.0 - safe_total_heap_percentage) / 100.0, >> 119: _target_occupancy * (100.0 - _heap_waste_percent) / 100.0 > > This looks wrong. G1ReservePercent is in some way similar to soft max heap size, intended to keep the target below the real maximum capacity. > I.e. it is not intended that G1 keeps another reserve of G1ReservePercent size below soft max capacity (which is below maximum capacity). > > There has been some internal discussion about whether the functionality of G1ReservePercent and SoftMaxHeapSize is too similar to warrant the former, but removing it is another issue. 
> > Imo, SoftMaxHeapSize should be an separate, actual target for this calculation. (`default_conc_mark_start_threshold()` also does not subtract `G1ReservePercent` from `SoftMaxHeapSize`). Thanks. Yes, that makes sense. Now it uses `MIN3` to take `soft_max_capacity()` as a separate constraint. > test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java line 46: > >> 44: private static final long ALLOCATED_BYTES = 20_000_000; // About 20M >> 45: private static final long MAX_HEAP_SIZE = >> 46: 200 * 1024 * 1024; // 200MiB, must match -Xmx on command line. > > Is it possible to get that value from the `MemoryMXBean` instead of relying on manual update? I.e. `getMax()`? Yes, it is a good idea. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023659889 PR Review Comment: https://git.openjdk.org/jdk/pull/24211#discussion_r2023660401 From wkemper at openjdk.org Tue Apr 1 22:27:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:27:07 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: References: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> Message-ID: On Sat, 29 Mar 2025 00:08:06 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't let old have the entire heap > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.cpp line 120: > >> 118: if (old_capacity > old_usage) { >> 119: size_t excess_old_regions = (old_capacity - old_usage) / ShenandoahHeapRegion::region_size_bytes(); >> 120: gen_heap->transfer_to_young(excess_old_regions); > > should we assert result is successful? Or replace with force_transfer? (just seems bad practice to ignore a status result) Yes, will try an assert here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2023754542 From wkemper at openjdk.org Tue Apr 1 22:44:35 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:44:35 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v3] In-Reply-To: References: Message-ID: > * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. > * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Simplify confusing (and confused) comment - Assert that region transfers succeed when expected - Merge remote-tracking branch 'jdk/master' into stop-enforcing-gen-size-limits - Don't let old have the entire heap - Stop enforcing young/old generation sizes. Move what's left of generation sizing logic into shGenerationalHeap. 
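As a sketch of how the combined ShenandoahInitYoungPercentage option could translate into an initial young size (the flag name comes from the PR description; the helper and its region alignment are assumptions, not the patch):

    // Illustrative only: derive the initial young-generation size from the
    // percentage flag and round it up to whole regions.
    size_t initial_young_size_bytes(size_t max_heap_bytes) {
      const size_t young = max_heap_bytes / 100 * ShenandoahInitYoungPercentage;
      return align_up(young, ShenandoahHeapRegion::region_size_bytes());
    }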
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24268/files - new: https://git.openjdk.org/jdk/pull/24268/files/bc171089..33a2f19d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=01-02 Stats: 18299 lines in 378 files changed: 10486 ins; 6499 del; 1314 mod Patch: https://git.openjdk.org/jdk/pull/24268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24268/head:pull/24268 PR: https://git.openjdk.org/jdk/pull/24268 From wkemper at openjdk.org Tue Apr 1 22:44:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 1 Apr 2025 22:44:36 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: References: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> Message-ID: On Sat, 29 Mar 2025 00:10:28 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't let old have the entire heap > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 134: > >> 132: ShenandoahHeap::initialize_heuristics(); >> 133: >> 134: // Max capacity is the maximum _allowed_ capacity. This means the sum of the maximum > > I don't understand the relevance of this comment. Is there still a mximum allowed for old and a maximum allowed for young? This comment stemmed from own confusion over fields and variables called _max_ `capacity` . I would like to rename the `_max_capacity` field to just `_capacity`. In my mind, the _max_ should be immutable, but that isn't how Shenandoah uses this field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2023766431 From jsikstro at openjdk.org Wed Apr 2 06:57:22 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 2 Apr 2025 06:57:22 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration Message-ID: The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. ------------- Commit messages: - 8353471: ZGC: Redundant generation id in ZGeneration Changes: https://git.openjdk.org/jdk/pull/24374/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24374&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353471 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24374.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24374/head:pull/24374 PR: https://git.openjdk.org/jdk/pull/24374 From stefank at openjdk.org Wed Apr 2 07:06:13 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 07:06:13 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Marked as reviewed by stefank (Reviewer). 
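A sketch of the shape of this cleanup; the signature is illustrative rather than copied from the patch:

    class ZGeneration {
    private:
      ZGenerationId _id;  // the generation already knows which one it is
    public:
      // Before: void select_relocation_set(ZGenerationId id, bool promote_all);
      void select_relocation_set(bool promote_all) {
        // ... internal calls now use the member _id instead of a passed-in id ...
      }
    };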
------------- PR Review: https://git.openjdk.org/jdk/pull/24374#pullrequestreview-2734854851 From eosterlund at openjdk.org Wed Apr 2 10:01:34 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 2 Apr 2025 10:01:34 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24374#pullrequestreview-2735717264 From ayang at openjdk.org Wed Apr 2 10:15:48 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 10:15:48 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2735758122 From stefank at openjdk.org Wed Apr 2 11:15:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:15:01 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. 
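A minimal sketch of the ordering described above. The two VMError queries are the existing functions named in the description; the wrapper function is a made-up name for illustration:

    #include "utilities/vmError.hpp"

    // Only fall through to the per-thread check (which ends up calling
    // os::current_thread_id()) when an error report is actually in progress.
    static bool is_reporting_error_in_this_thread() {
      return VMError::is_error_reported() &&
             VMError::is_error_reported_in_current_thread();
    }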
> > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Remove test from ProblemList ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24349/files - new: https://git.openjdk.org/jdk/pull/24349/files/8db3f6d0..fe07a340 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From stefank at openjdk.org Wed Apr 2 11:15:02 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:15:02 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline I've removed the test and will run tier1-tier3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24349#issuecomment-2772225278 From stefank at openjdk.org Wed Apr 2 11:47:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:47:53 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: <4Uyw00r7p9C-1BSfQRNEQ0p5td8RylD7YVLOHj6HODM=.47100abf-8467-4b47-9edb-c30877152c56@github.com> On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. I moved this PR from hotspot to hotspot-gc. 
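A minimal Windows-API illustration of the contract quoted above (plain VirtualFree usage, not ZGC code; error handling omitted):

    #include <windows.h>

    // MEM_RELEASE requires dwSize == 0 and frees the whole reservation that
    // VirtualAlloc returned; passing the reserved size makes the call fail.
    static void release_reservation(void* addr) {
      VirtualFree(addr, 0, MEM_RELEASE);
    }

    // Decommitting a sub-range is the case that takes an explicit size.
    static void decommit_range(void* addr, SIZE_T size) {
      VirtualFree(addr, size, MEM_DECOMMIT);
    }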
------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2772303530 From eosterlund at openjdk.org Wed Apr 2 11:58:57 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 2 Apr 2025 11:58:57 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2736002080 From tschatzl at openjdk.org Wed Apr 2 13:04:08 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 2 Apr 2025 13:04:08 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v5] In-Reply-To: References: Message-ID: <0xr7VMlEH9EAc8XB9HQKPdxOHUcLfwtZkNAkGrTPu_k=.72d5e5be-373f-4db2-bbfb-9026c82e3c94@github.com> On Tue, 1 Apr 2025 20:57:36 GMT, Man Cao wrote: > > With the changes to `young_collection_expansion_amount()`, once we reach the `SoftMaxHeapSize`, we cannot expand the heap except during GC where expansion can happen without regard for `SoftMaxHeapSize`. Thus, after exceeding `SoftMaxHeapSize` we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the `SoftMaxHeapSize` as implemented by this patch? > > Yes. This is the expected behavior if user sets `SoftMaxHeapSize` too small. G1 will try its best to respect `SoftMaxHeapSize`, which could cause GC thrashing. However, it won't cause `OutOfMemoryError`. This problem is due to user's misconfiguration of `SoftMaxHeapSize`, which is similar to user misconfiguring `Xmx` to be too small. The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? (Iirc, in tests long time ago, with that original patch, and also adapting `Min/MaxHeapFreeRatio`, did result the desired effect of G1/`SoftMaxHeapSize` decreasing the heap appropriately. Without it, the heap will almost never change, but that is expected how `Mindoes not work). So similar to @walulyai I would strongly prefer for `SoftMaxHeapSize` not interfere that much with the application's performance. To me, this behavior is not "soft", and there seems to be general consensus internally about allowing unbounded cpu usage for GC. 
Afaiu in ZGC, if heap grows beyond `SoftMaxHeapSize`, GC activity can grow up to 25% of cpu usage (basically maxing out concurrent threads). That could be a reasonable guidance as well here. GC thrashing will also prevent progress with marking, and actually cause more marking because of objects not having enough time to die. This just makes the situation worse until the heap gets scaled back to `SoftMaxHeapSize`. However at the moment, changing the GC activity threshold internally will not automatically shrink the heap as you would expect, since currently shrinking is controlled by marking using the `Min/MaxHeapFreeRatio` flags. That gets us back to (JDK-8238687)[https://bugs.openjdk.org/browse/JDK-8238687] and (JDK-8248324)[https://bugs.openjdk.org/browse/JDK-8248324]... @walulyai is currently working on the former issue again, testing it, maybe you two could together on that to see whether basing this work on what @walulyai is cooking up is a better way forward, if needed modifying `gctimeratio` if we are above `SoftMaxHeapSize`? Otherwise, if there really is need to get this functionality asap, even only making it a guide for the marking should at least give some effect (but I think without changing `Min/MaxHeapFreeRatio` at the same time there is not much effect anyway). But that is a fairly coarse and indirect way of getting the necessary effect to shrink the heap. We should not limit ourselves to what mainline provides at the moment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2772493942 From tschatzl at openjdk.org Wed Apr 2 13:04:11 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 2 Apr 2025 13:04:11 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v6] In-Reply-To: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> References: <3tPGLO7tcSAMgLFlLTlQCXWZ1Dvlk4xInkqdxoYTxwM=.5b8740c2-8ed3-4387-8a50-325007ed027e@github.com> Message-ID: On Tue, 1 Apr 2025 20:54:36 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Address comments and try fixing test failure on macos-aarch64 There also seems to be a concurrency issue with reading the `SoftMaxHeapSize` variable: Since the flag is manageable, at least outside of safepoints (afaict `jcmd` is blocked by safepoints, but I'll ask), the variable can be written to it at any time. So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` re-read with multiple different values in that method. 
Probably an `Atomic::load(&SoftMaxHeapSize)` in the getter is sufficient for that. The other multiple re-readings of the `soft_max_capacity()` in the safepoint seem okay - I do not think there is a way to update the value within a safepoint externally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2772496003 From zgu at openjdk.org Wed Apr 2 13:24:56 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 2 Apr 2025 13:24:56 GMT Subject: RFR: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24322#pullrequestreview-2736263356 From stuefe at openjdk.org Wed Apr 2 13:30:51 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 13:30:51 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Okay. Curious, was this a day zero problem? Incidentally, I remember that we had a problem with NUMA on windows where we only released the first NUMA stripe, leaving the other stripes around for future commits to trip over. But ZGC is probably not affected by that, since it does not use os::reserve/release_memory, right? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2736284463 From stefank at openjdk.org Wed Apr 2 14:06:06 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 14:06:06 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 13:28:37 GMT, Thomas Stuefe wrote: > Okay. > > Curious, was this a day zero problem? I think it was. 
For completeness, this is the unreserve paths you need to hit to hit this bug:

bool XVirtualMemoryManager::reserve_contiguous(uintptr_t start, size_t size) {
  assert(is_aligned(size, XGranuleSize), "Must be granule aligned");

  // Reserve address views
  const uintptr_t marked0  = XAddress::marked0(start);
  const uintptr_t marked1  = XAddress::marked1(start);
  const uintptr_t remapped = XAddress::remapped(start);

  // Reserve address space
  if (!pd_reserve(marked0, size)) {
    return false;
  }

  if (!pd_reserve(marked1, size)) {
    pd_unreserve(marked0, size);
    return false;
  }

  if (!pd_reserve(remapped, size)) {
    pd_unreserve(marked0, size);
    pd_unreserve(marked1, size);
    return false;
  }

  // Register address views with native memory tracker
  nmt_reserve(marked0, size);
  nmt_reserve(marked1, size);
  nmt_reserve(remapped, size);

  // Make the address range free
  _manager.free(start, size);

  return true;
}

> > Incidentally, I remember that we had a problem with NUMA on windows where we only released the first NUMA stripe, leaving the other stripes around for future commits to trip over. But ZGC is probably not affected by that, since it does not use os::reserve/release_memory, right?

It doesn't sound like ZGC would be affected by that. At least not via those APIs. FWIW, I've identified another corner-case bug on Windows that only happens if we end up allocating discontiguous heaps, which only ever happens if all our attempts to allocate a contiguous heap fail. I'm in the process of trying to write a test showing this issue.

------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2772671014 From ayang at openjdk.org Wed Apr 2 14:22:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 14:22:55 GMT Subject: RFR: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24322#issuecomment-2772714981 From ayang at openjdk.org Wed Apr 2 14:22:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 2 Apr 2025 14:22:56 GMT Subject: Integrated: 8353263: Parallel: Remove locking in PSOldGen::resize In-Reply-To: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> References: <4QpvbYEywkzocWXFBkda0ymp3cdpp6PNNTylVqUFXig=.7ee05cda-222a-421c-b09c-1519dfea7bf1@github.com> Message-ID: On Mon, 31 Mar 2025 09:45:23 GMT, Albert Mingkun Yang wrote: > Simple removing the use of `PSOldGenExpand_lock` in resizing logic after full-gc, because the calling context is inside a safepoint. > > Test: tier1-5 This pull request has now been integrated.
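A sketch of the pattern behind the PSOldGen change quoted just above: the lock is replaced by the guarantee that resizing only runs inside a safepoint. The body and exact signature are illustrative, not the actual patch:

    // No PSOldGenExpand_lock needed: resize() is only called after a full GC,
    // i.e. inside a safepoint, so no other thread can race with it.
    void PSOldGen::resize(size_t desired_free_space) {
      assert(SafepointSynchronize::is_at_safepoint(), "only called at a safepoint");
      // ... expand or shrink the generation towards desired_free_space ...
    }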
Changeset: a0677d94 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a0677d94d8c83a75cee054700e098faa97edca3c Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod 8353263: Parallel: Remove locking in PSOldGen::resize Reviewed-by: tschatzl, zgu ------------- PR: https://git.openjdk.org/jdk/pull/24322 From iwalulya at openjdk.org Wed Apr 2 15:12:02 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 2 Apr 2025 15:12:02 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2736649318 From manc at openjdk.org Wed Apr 2 16:00:33 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 2 Apr 2025 16:00:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Fix test failure on macos-aarch64 by using power-of-two sizes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/0bc55654..4435e89f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From stuefe at openjdk.org Wed Apr 2 16:16:06 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 16:16:06 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> On Wed, 2 Apr 2025 14:03:36 GMT, Stefan Karlsson wrote: >> Okay. >> Curious, was this a day zero problem? > I think it was. 
For completeness, this is the unreserve paths you need to hit to hit this bug: Ah okay, this is probably rare. I wondered whether it affects the unmapper path. Because AFAIU, that would have led to out-of-address space at some point with high probability. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773087416 From kdnilsen at openjdk.org Wed Apr 2 17:49:49 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 2 Apr 2025 17:49:49 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24090#pullrequestreview-2737095478 From kdnilsen at openjdk.org Wed Apr 2 17:55:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 2 Apr 2025 17:55:48 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. Maybe the "best" tradeoff is "adaptive behavior". If allocatable memory is in "short supply", we should evacuate thread roots early. Otherwise, we should preserve existing behavior. Defining "short supply" might be a bit tricky. There's a related PR that is still in development, to surge GC worker threads when we are at risk of experiencing allocation failures. A lot of heuristic predictions feed into the decision of when and whether to surge. We could use that same feedback mechanism here. If we are under "worker surge" conditions, that suggests memory is in short supply, an this is the ideal time to shift some of the GC work onto the mutators, so this is when we should evacuate thread roots early. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24090#issuecomment-2773302505 From jsikstro at openjdk.org Wed Apr 2 18:20:00 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 2 Apr 2025 18:20:00 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing Message-ID: Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. 
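As a rough illustration of the new shape (a sketch only, not the actual webrev; the SerialHeap body below is invented just to show the pattern):

// Each collector now provides its own error printer instead of bouncing
// through a common CollectedHeap::print_on_error() and back again.
class CollectedHeap {
public:
  virtual void print_on(outputStream* st) const = 0;
  virtual void print_on_error(outputStream* st) const = 0;  // now pure virtual
};

// Typical implementation: reuse the collector's own print_on() and append
// whatever extra error-time details that collector wants to expose.
void SerialHeap::print_on_error(outputStream* st) const {
  print_on(st);
  // collector-specific details (barrier set, policy, ...) go here, if any
}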
In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it.

Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah.

To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing.

The old and new printing orders are shown below for ZGC:

# Old

# New

Testing:
* GHA
* Tiers 1 & 2
* Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output.

../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt
../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt

-------------

Commit messages:
 - Copyright years
 - 8353559: Restructure CollectedHeap error printing

Changes: https://git.openjdk.org/jdk/pull/24387/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24387&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8353559
Stats: 141 lines in 16 files changed: 75 ins; 52 del; 14 mod
Patch: https://git.openjdk.org/jdk/pull/24387.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24387/head:pull/24387

PR: https://git.openjdk.org/jdk/pull/24387

From jsikstro at openjdk.org Wed Apr 2 18:40:57 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Wed, 2 Apr 2025 18:40:57 GMT
Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken
In-Reply-To: 
References: 
Message-ID: 

On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote:

> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument:
> 
> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero)
> 
> 
> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 are going to change that and we will start to release memory in certain corner-cases.
> 
> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas where we want to set up our "views" of the heap. We should probably backport this fix to the affected releases.
> 
> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441.

Should `_has_unreserved` and `test_unreserve` be static like the other member variables and test methods?

-------------

PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2737227639

From stefank at openjdk.org Wed Apr 2 20:16:59 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 2 Apr 2025 20:16:59 GMT
Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken
In-Reply-To: 
References: 
Message-ID: 

On Wed, 2 Apr 2025 18:38:34 GMT, Joel Sikström wrote:

> Should `_has_unreserved` and `test_unreserve` be static like the other member variables and test methods?

I'll look into that tomorrow.
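To make the quoted VirtualFree contract concrete, a corrected release call has to look roughly like this (a sketch only, not the actual patch; `VirtualFree` and `MEM_RELEASE` are the real Win32 API, while the wrapper follows the `pd_unreserve` naming used earlier in the thread and is otherwise illustrative):

#include <windows.h>
#include <cassert>
#include <cstdint>

// With MEM_RELEASE, dwSize must be 0; the entire reservation starting at
// 'addr' is released in a single call.
static void pd_unreserve(uintptr_t addr, size_t size) {
  (void)size;  // deliberately not passed to VirtualFree
  const BOOL res = VirtualFree(reinterpret_cast<void*>(addr), 0 /* dwSize */, MEM_RELEASE);
  assert(res && "VirtualFree(MEM_RELEASE) failed");
}

Passing the reservation size as dwSize, as the current implementation does, makes the call fail instead of releasing the memory.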
------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773620954 From stefank at openjdk.org Wed Apr 2 20:16:58 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 20:16:58 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> References: <639NoIyfKt-nwS-Pn2ia-83bQUjAykMzL0YKd8rSO7I=.8973dd8d-686c-42a5-95b5-443ca005ad4f@github.com> Message-ID: On Wed, 2 Apr 2025 16:13:30 GMT, Thomas Stuefe wrote: > > > Okay. > > > > Curious, was this a day zero problem? > > > I think it was. For completeness, this is the unreserve paths you need to hit to hit this bug: > > Ah okay, this is probably rare. I wondered whether it affects the unmapper path. The unmapper converts the mapped memory (virtual to the physical memory) to be just reserved memory (but using Window's placeholder mechanism). So, the memory is not unreserved by the unmapper. I hope this makes sense. > Because AFAIU, that would have led to out-of-address space at some point with high probability. If you try to call this faulty unreserve implementation then the JVM will immediately shut down. So, I don't think this bug will cause and address-space leak. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2773620289 From stefank at openjdk.org Wed Apr 2 20:53:48 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 20:53:48 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2737551377 From manc at openjdk.org Thu Apr 3 06:29:49 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 06:29:49 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 16:00:33 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure on macos-aarch64 by using power-of-two sizes. Re [Thomas' comment](#issuecomment-2772493942): > The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? Because without changing heap sizing directly, setting `SoftMaxHeapSize` alone is ineffective to shrink the heap in most cases. E.g., the included test `test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java` will fail. For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect `SoftMaxHeapSize` over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`? My preference is yes, that `SoftMaxHeapSize` should have higher precedence, for the following reasons: 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". 1. Having a single flag that makes G1 shrink heap more aggressively, is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if `SoftMaxHeapSize` only guides marking, user has to also tune `MinHeapFreeRatio`/`MaxHeapFreeRatio` to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning `MinHeapFreeRatio`/`MaxHeapFreeRatio`. 1. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. 
However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) > So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance. If user sets a too small `SoftMaxHeapSize` and causes performance regression or GC thrashing, it is really user's misconfiguration, and they should take measures to adjust `SoftMaxHeapSize` based on workload. Also misconfiguring `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio` could cause similar regressions (think of `-XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1`). However, I can see that `SoftMaxHeapSize` may be easier to misconfigure than the other 3 flags, because it does not adapt to changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggests with ZGC's example of growing up to 25% of cpu usage?): - `SoftMaxHeapSize` still takes higher precedence over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`. - G1 could have an internal mechanism to detect GC thrashing, and expands heap above `SoftMaxHeapSize` if thrashing happens. > That gets us back to [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687) and [JDK-8248324](https://bugs.openjdk.org/browse/JDK-8248324)... Yes, fixing these two issues would be great regardless of `SoftMaxHeapSize`. However, they do not address the 3 issues above about flag precedence. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774619383 From manc at openjdk.org Thu Apr 3 07:08:19 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 07:08:19 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. 
Man Cao has updated the pull request incrementally with one additional commit since the last revision: Use Atomic::load for flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/4435e89f..c60ade41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From manc at openjdk.org Thu Apr 3 07:30:51 2025 From: manc at openjdk.org (Man Cao) Date: Thu, 3 Apr 2025 07:30:51 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Re: concurrency issue with reading `SoftMaxHeapSize` I updated to `Atomic::load()`, but not sure if I understand the concern correctly. > So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` re-read with multiple different values in that method. I don't see where the re-read is. I think in any code path from `G1IHOPControl::get_conc_mark_start_threshold`, `G1CollectedHeap::heap()->soft_max_capacity()` is called only once. `G1CollectedHeap::attempt_allocation_humongous` also appears to call `G1Policy::need_to_start_conc_mark` only once, which calls `G1IHOPControl::get_conc_mark_start_threshold` only once. I agree it is a data race if `soft_max_capacity()` runs outside of a safepoint, so `Atomic::load()` makes sense regardless. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774731515 From iwalulya at openjdk.org Thu Apr 3 08:11:00 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 3 Apr 2025 08:11:00 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:27:22 GMT, Man Cao wrote: > 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." 
We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". In the current approach, it is not that we are respecting the user's request, we are violating the request just that we do this only during GCs. So eventually you have back to back GCs that will expand the heap to whatever heapsize the application requires. My interpretation of `SoftMaxHeapSize` is that we can meet this limit where possible, but also exceed the limit if required. So I propose we take the same approach as used in other GCs where `SoftMaxHeapSize` is used as a parameter for setting GC pressure but not as a limit to allocations. > > 3. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) Agreed, these ratios are problematic, and we should find a solution that removes them. We also need to agree on the purpose of `SoftMaxHeapSize`, my understanding is that `SoftMaxHeapSize` is meant for the application to be handle spikes in allocations and and quickly release the memory if no longer required. If `SoftMaxHeapSize` has precedence over`GCTimeRatio`, then G1 is changing the objective from balancing latency and throughput to optimizing for memory usage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774824745 From tschatzl at openjdk.org Thu Apr 3 08:34:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 08:34:00 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:28:13 GMT, Man Cao wrote: > Re: concurrency issue with reading `SoftMaxHeapSize` > > I updated to `Atomic::load()`, but not sure if I understand the concern correctly. > > > So e.g. the assignment of `G1IHOPControl::get_conc_mark_start_threshold` to `marking_initiating_used_threshold` in that call can be inlined in `G1Policy::need_to_start_conc_mark` (called by the mutator in `G1CollectedHeap::attempt_allocation_humongous`) in multiple places, and so `SoftMaxHeapSize` re-read with multiple different values in that method. > > I don't see where the re-read is. I think in any code path from `G1IHOPControl::get_conc_mark_start_threshold`, `G1CollectedHeap::heap()->soft_max_capacity()` is called only once. `G1CollectedHeap::attempt_allocation_humongous` also appears to call `G1Policy::need_to_start_conc_mark` only once, which calls `G1IHOPControl::get_conc_mark_start_threshold` only once. > > I agree it is a data race if `soft_max_capacity()` runs outside of a safepoint, so `Atomic::load()` makes sense regardless. 
The compiler could be(*) free to call `get_conc_mark_start_threshold()` again in any of the uses of the local variable without telling it that one of its components may change between re-reads. (*) Probably not after looking again, given that it's not marked as `const` (not sure why), and a virtual method, and fairly large. The situation would be much worse if somehow `SoftMaxHeapsize` could be changed within a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2774885501 From stefank at openjdk.org Thu Apr 3 09:32:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 09:32:12 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. > > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve - Make addtions static - 8353264: ZGC: Windows heap unreserving is broken ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24377/files - new: https://git.openjdk.org/jdk/pull/24377/files/7e2861b2..bbf83831 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24377&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24377&range=00-01 Stats: 11266 lines in 447 files changed: 7600 ins; 2558 del; 1108 mod Patch: https://git.openjdk.org/jdk/pull/24377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24377/head:pull/24377 PR: https://git.openjdk.org/jdk/pull/24377 From jsikstro at openjdk.org Thu Apr 3 09:32:12 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 3 Apr 2025 09:32:12 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: <-jYFzlEXm9kiqtULRVQFRP1UcAfb_Yscb8s7AelLI98=.b68fb9ed-1a28-4437-8658-40087c134800@github.com> On Thu, 3 Apr 2025 09:29:08 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. 
The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2739122546 From eosterlund at openjdk.org Thu Apr 3 09:53:53 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Apr 2025 09:53:53 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. 
> > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2739190388 From tschatzl at openjdk.org Thu Apr 3 10:01:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 10:01:54 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag > Re [Thomas' comment](#issuecomment-2772493942): > > > The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change? > > Because without changing heap sizing directly, setting `SoftMaxHeapSize` alone is ineffective to shrink the heap in most cases. E.g., the included test `test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java` will fail. > > For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect `SoftMaxHeapSize` over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`? My preference is yes, that `SoftMaxHeapSize` should have higher precedence, for the following reasons: > > 1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap". > > 2. Having a single flag that makes G1 shrink heap more aggressively, is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if `SoftMaxHeapSize` only guides marking, user has to also tune `MinHeapFreeRatio`/`MaxHeapFreeRatio` to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning `MinHeapFreeRatio`/`MaxHeapFreeRatio`. > > 3. 
Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). ) > > > > So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance. > > If user sets a too small `SoftMaxHeapSize` and causes performance regression or GC thrashing, it is really user's misconfiguration, and they should take measures to adjust `SoftMaxHeapSize` based on workload. Also misconfiguring `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio` could cause similar regressions (think of `-XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1`). > > However, I can see that `SoftMaxHeapSize` may be easier to misconfigure than the other 3 flags, because it does not adapt to changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggests with ZGC's example of growing up to 25% of cpu usage?): Exactly. > > * `SoftMaxHeapSize` still takes higher precedence over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`. > > * G1 could have an internal mechanism to detect GC thrashing, and expands heap above `SoftMaxHeapSize` if thrashing happens. > > > > That gets us back to [JDK-8238687](https://bugs.openjdk.org/browse/JDK-8238687) and [JDK-8248324](https://bugs.openjdk.org/browse/JDK-8248324)... > > Yes, fixing these two issues would be great regardless of `SoftMaxHeapSize`. However, they do not address the 3 issues above about flag precedence. * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From that guiding value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). So it seems fairly straightforward to have any outside "memory pressure" effect this intermediate control value instead of everyone overriding each other in multiple places in the code. Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. 
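To illustrate the "difference to desired" calculation described above, here is a purely hypothetical sketch; none of these names or constants exist in the code, they only show how a single guiding value could yield both direction and degree of a heap size change:

#include <cstdint>

// Hypothetical: turn the deviation from the desired GC CPU usage into one
// resize signal whose sign gives the direction and whose magnitude the degree.
static int64_t heap_resize_signal_bytes(double actual_gc_cpu_usage,   // e.g. 0.12 == 12% of CPU spent in GC
                                        double target_gc_cpu_usage,   // derived from GCTimeRatio and/or memory pressure
                                        double& smoothed_error) {     // carried between GCs for smoothing
  const double kAlpha = 0.3;                               // smoothing weight (made up)
  const double kBytesPerPercentPoint = 64.0 * 1024 * 1024; // "magic factor" (made up)
  const double error = actual_gc_cpu_usage - target_gc_cpu_usage;  // > 0: GC working too hard -> grow
  smoothed_error = kAlpha * error + (1.0 - kAlpha) * smoothed_error;
  return static_cast<int64_t>(smoothed_error * 100.0 * kBytesPerPercentPoint);
}

In such a scheme an outside "memory pressure" input like SoftMaxHeapSize would nudge target_gc_cpu_usage (e.g. capped at the 25% gc cpu usage discussed in this thread) rather than overriding the computed heap size directly.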
Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. (One can see `Min/MaxHeapFreeRatio` as an old attempt to limit heap size growth without affecting performance too much, changing memory pressure. However they are hard to use. And they are completely dis-associated with the rest of the heap sizing mechanism. `SoftMaxHeapSize` is easier to handle) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2775155378 From stefank at openjdk.org Thu Apr 3 10:38:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 10:38:59 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:32:12 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24377#issuecomment-2775290032 From stefank at openjdk.org Thu Apr 3 10:48:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 10:48:01 GMT Subject: Integrated: 8353264: ZGC: Windows heap unreserving is broken In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:35:36 GMT, Stefan Karlsson wrote: > During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: > > If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) > > > Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. 
> > In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. > > I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. This pull request has now been integrated. Changeset: ffca4f2d Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/ffca4f2da84cb8711794d8e692d176a7e785e7b1 Stats: 27 lines in 2 files changed: 24 ins; 0 del; 3 mod 8353264: ZGC: Windows heap unreserving is broken Reviewed-by: jsikstro, aboldtch, eosterlund, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24377 From aboldtch at openjdk.org Thu Apr 3 10:48:01 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Apr 2025 10:48:01 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:32:12 GMT, Stefan Karlsson wrote: >> During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be done with `0` as the dwSize argument: >> >> If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero) >> >> >> Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 is going to change that and we will start to release memory in certain corner-cases. >> >> In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas were we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. >> >> I've added a unit test that provokes the problem and I have run this fix together with the changes for JDK-8350441. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8353264_zgc_unreserve > - Make addtions static > - 8353264: ZGC: Windows heap unreserving is broken lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24377#pullrequestreview-2739381258 From aboldtch at openjdk.org Thu Apr 3 11:15:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Apr 2025 11:15:53 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:15:01 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. 
>> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove test from ProblemList A good local fix. But I also think `VMError::is_error_reported_in_current_thread()` should do `return is_error_reported() && _first_error_tid == os::current_thread_id();` Given that `current_thread_id` has a non trivial cost. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2739468102 From ayang at openjdk.org Thu Apr 3 11:32:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 3 Apr 2025 11:32:55 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24387#pullrequestreview-2739525769 From tschatzl at openjdk.org Thu Apr 3 11:33:45 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 11:33:45 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * some additional assert to make sure the scanner is initialized correctly. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24222/files - new: https://git.openjdk.org/jdk/pull/24222/files/e5ce3984..21cc754a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24222&range=00-01 Stats: 7 lines in 2 files changed: 6 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24222/head:pull/24222 PR: https://git.openjdk.org/jdk/pull/24222 From iwalulya at openjdk.org Thu Apr 3 13:31:55 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 3 Apr 2025 13:31:55 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Thu, 3 Apr 2025 11:33:45 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). >> >> This has been made possible with the refactoring of object array task queues. >> >> At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). >> >> Testing: tier1-5, some perf testing with no differences >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * some additional assert to make sure the scanner is initialized correctly. LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24222#pullrequestreview-2739853788 From tschatzl at openjdk.org Thu Apr 3 15:09:18 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 15:09:18 GMT Subject: RFR: 8271870: G1: Add objArray splitting when scanning object with evacuation failure [v2] In-Reply-To: References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: On Thu, 3 Apr 2025 13:29:18 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * some additional assert to make sure the scanner is initialized correctly. > > LGTM! 
Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24222#issuecomment-2776099031 From tschatzl at openjdk.org Thu Apr 3 15:09:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 3 Apr 2025 15:09:19 GMT Subject: Integrated: 8271870: G1: Add objArray splitting when scanning object with evacuation failure In-Reply-To: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> References: <7hH3ohZ65_msEVaZ0qAI1D3pNI1iyZbKM9sYgfEMbwg=.1d21c70e-788b-43a0-8720-ca0231a70a45@github.com> Message-ID: <3pkPiCQ3xl43uo_Y6hbpUa8qCjgvId2B6tcL23TZTbI=.69ecc66d-a462-41cc-8914-85dc38308b64@github.com> On Tue, 25 Mar 2025 10:35:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes the object iteration path for evacuation failed objects the same as the one for regular objects (and indeed make both use the same code). > > This has been made possible with the refactoring of object array task queues. > > At the same time this also covers [JDK-8271871](https://bugs.openjdk.org/browse/JDK-8271871). > > Testing: tier1-5, some perf testing with no differences > > Thanks, > Thomas This pull request has now been integrated. Changeset: 64b691ab Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/64b691ab619d2d99a9c6492341074d2794563c16 Stats: 106 lines in 4 files changed: 51 ins; 32 del; 23 mod 8271870: G1: Add objArray splitting when scanning object with evacuation failure 8271871: G1 does not try to deduplicate objects that failed evacuation Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24222 From ysr at openjdk.org Thu Apr 3 21:45:52 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 21:45:52 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> Message-ID: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> On Mon, 31 Mar 2025 23:09:53 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. 
>> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Can't verify marked object with complete marking after full GC I looked at the files that changed since the last review only, but can look over all of it once again if necessary (just let me know). This looks good; just a few small comments, and in particular a somewhat formalistic and pedantic distinction between the use of `gc_generation()` and `active_generation()` to fetch the marking context (and the use of `global_generation()`). Otherwise looks good to me. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 352: > 350: assert(_from_region != nullptr, "must set before work"); > 351: assert(_heap->active_generation()->complete_marking_context()->is_marked(p), "must be marked"); > 352: assert(!_heap->active_generation()->complete_marking_context()->allocated_after_mark_start(p), "must be truly marked"); I am probably being a bit pedantic here... I would use `gc_generation()` in all code that is executed strictly by GC threads, and `active_generation()` in all code that may possibly be executed by a mutator thread. It seems as if today this code is only executed by GC threads. In general, there is no real distinction between these field at times like these (STW pauses) when heap verification is taking place, but from a syntactic hygiene perspective. We can otherwise file a ticket to separately clean up any confusion in the use of these fields (and add a dynamic check to prevent creeping confusion). The names aren't super well-chosen, but generally think of `_gc_generation` as the generation that is being GC'd, `_active_generation` as one that mutator threads are aware is being the subject of GC. Any assertions by mutator threads should use the latter and by GC threads the former. The fields are reconciled at STW pauses. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 778: > 776: ShenandoahAdjustPointersClosure() : > 777: _heap(ShenandoahHeap::heap()), > 778: _ctx(ShenandoahHeap::heap()->global_generation()->complete_marking_context()) {} I liked the changes in this file that everywhere use the heap's `_gc_generation` (see comment about the distinction between `gc_generation()` and `active_generation()` above) field to fetch the marking context. While I understand that it might be the case that whenever we are here, the `_gc_generation` must necessarily be the `global_generation()`, I am wondering about: 1. using `_gc_generation` here as well to fetch the context, and 2. secondly, asserting also that the `_gc_generation` is in fact the `global_generation()`. I assume (2) must be the case here? If not, it would be good to see if this can be fixed. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1094: > 1092: ShenandoahHeapRegion* region = _regions.next(); > 1093: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 1094: ShenandoahMarkingContext* const ctx = heap->global_generation()->complete_marking_context(); Same comment as at line 778. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: > 1189: _verify_remembered_after_full_gc, // verify read-write remembered set > 1190: _verify_forwarded_none, // all objects are non-forwarded > 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? 
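Concretely, (1) and (2) from the line 778 comment above would look roughly like this (illustrative sketch only, reusing just the accessors already mentioned in this review):

// Sketch for the full-GC closures: fetch the context via the generation being
// collected, and assert that it really is the global generation.
ShenandoahHeap* const heap = ShenandoahHeap::heap();
assert(heap->gc_generation() == heap->global_generation(), "full GC is expected to collect the global generation");
ShenandoahMarkingContext* const ctx = heap->gc_generation()->complete_marking_context();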
------------- PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2741111545 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027772698 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027710108 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027713065 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027777968 From ysr at openjdk.org Thu Apr 3 21:55:07 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 21:55:07 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <7yfWKXewUM1XqWtlnyuPV3nu9bGr5VNJXuXi1aNQGvQ=.4c53d85b-13f3-4bfc-87c3-634d547bb440@github.com> Message-ID: On Thu, 6 Mar 2025 23:09:47 GMT, Xiaolong Peng wrote: >> OK, yes, that makes sense. Why not then use both `ShenandoahHeap::[complete_]marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()`. See other related comments in this review round. > > I feel using `henandoahHeap::complete_marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()` may cause more confusion, just read from the name it seems that it indicates the marking is complete for the whole heap, not just the active generation. ok, that makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027790148 From ysr at openjdk.org Thu Apr 3 22:10:50 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 22:10:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> Message-ID: On Fri, 7 Mar 2025 19:25:33 GMT, William Kemper wrote: >> You proposal will make the impl of the set_mark_complete/is_mark_complete of ShenandoahGeneration cleaner, but the thing is it will change current design and behavior, we may have to update the code where there methods is called, e.g. when we call `set_mark_complete` of gc_generation/active_generation, if it is global generation, we may have to explicitly call the same methods of ShenandoahYoungGeneration and ShenandoahOldGeneration to fan out the status. >> >> How about I follow up it in a separate task and update the implementation if necessary? I want to limit the changes involved in this PR, and only fix the bug. > > The young and old generations are only instantiated in the generational mode, so using them without checking the mode will result in SEGV in non-generational modes. > > Global collections have a lot of overlap with old collections. I think what Ramki is saying, is that if we change all the code that makes assertions about the completion status of young/old marking to use the `active_generation` field instead, then we wouldn't need to update the completion status of young/old during a global collection. The difficulty here is that we need assurances that the old generation mark bitmap is valid in collections subsequent to a global collection. So, I don't think we can rely on completion status of `active_generation` when it was global, in following collections where it may now refer to young or old. I see. Yes, that makes sense to me, thanks William. 
It would then be the case for the global generation that if is_mark_complete() then in the generational case that's also the case for both of its constituent generations. Maybe we can assert that when we fetch that at line 204 (and find it's true)? Maybe I am being paranoid, but the assert would make me feel confident that the state maintenance isn't going awry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027812176 From xpeng at openjdk.org Thu Apr 3 22:33:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 3 Apr 2025 22:33:56 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: On Thu, 3 Apr 2025 21:39:33 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Can't verify marked object with complete marking after full GC > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: > >> 1189: _verify_remembered_after_full_gc, // verify read-write remembered set >> 1190: _verify_forwarded_none, // all objects are non-forwarded >> 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap > > Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? Yes, the marking bitmaps have been reset after full GC, except for the regions with pinned objects. _verify_marked_complete requires a complete marking context, so it might make more sense to change it to _verify_marked_disable after full GC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027835236 From xpeng at openjdk.org Thu Apr 3 22:37:50 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 3 Apr 2025 22:37:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: On Thu, 3 Apr 2025 21:34:06 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Can't verify marked object with complete marking after full GC > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 352: > >> 350: assert(_from_region != nullptr, "must set before work"); >> 351: assert(_heap->active_generation()->complete_marking_context()->is_marked(p), "must be marked"); >> 352: assert(!_heap->active_generation()->complete_marking_context()->allocated_after_mark_start(p), "must be truly marked"); > > I am probably being a bit pedantic here... > > I would use `gc_generation()` in all code that is executed strictly by GC threads, and `active_generation()` in all code that may possibly be executed by a mutator thread. It seems as if today this code is only executed by GC threads.
> > In general, there is no real distinction between these field at times like these (STW pauses) when heap verification is taking place, but from a syntactic hygiene perspective. > > We can otherwise file a ticket to separately clean up any confusion in the use of these fields (and add a dynamic check to prevent creeping confusion). The names aren't super well-chosen, but generally think of `_gc_generation` as the generation that is being GC'd, `_active_generation` as one that mutator threads are aware is being the subject of GC. Any assertions by mutator threads should use the latter and by GC threads the former. The fields are reconciled at STW pauses. Make sense, I did notice that there is assert `assert(!Thread::current()->is_Java_thread(), "Not allowed");` in `gc_generation()` suggesting that non-Java thread should call `gc_generation()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027837825 From ysr at openjdk.org Thu Apr 3 22:57:50 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 3 Apr 2025 22:57:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> Message-ID: <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> On Thu, 3 Apr 2025 22:31:27 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1191: >> >>> 1189: _verify_remembered_after_full_gc, // verify read-write remembered set >>> 1190: _verify_forwarded_none, // all objects are non-forwarded >>> 1191: _verify_marked_incomplete, // all objects are marked in incomplete bitmap >> >> Is the marking bitmap updated as objects are moved to their new locations? Is that done just to satisfy the verifier? > > Yes, making bitmaps has been reset after full GC, except for the for regions with pined objects. > _verify_marked_complete requires complete marking context, it might make more sense to change it to _verify_marked_disable after full GC. Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? May be I am missing some corner case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2027852832 From manc at openjdk.org Fri Apr 4 07:26:54 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 4 Apr 2025 07:26:54 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Thank you both for the quick and detailed responses! > * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). > * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. > > With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. > > As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. > > Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From some actual value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). I was unaware that G1 plans to stop using `Min/MaxHeapFreeRatio` until now. Looks like [JDK-8238686](https://bugs.openjdk.org/browse/JDK-8238686) has more relevant description. It sounds good to solve all above-mentioned issues and converge on a single flag such as `GCTimeRatio`, and ensure both incremental and full GCs respect this flag. (We should also fix [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) for converging on `GCTimeRatio`. ) It would be nicer if we have a doc or a master bug that describes the overall plan. In comparison, this PR's approach for a high-precedence, "harder" `SoftMaxHeapSize` is an easier and more expedient approach to improve heap resizing, without solving all other issues. However, it requires users to carefully maintain and dynamically adjust `SoftMaxHeapSize` to prevent GC thrashing. I think if all other issues are resolved, our existing internal use cases that use a separate algorithm to dynamically calculate and set the high-precedence `SoftMaxHeapSize` (or `ProposedHeapSize`) could probably migrate to the `GCTimeRatio` approach, and stop using `SoftMaxHeapSize`. I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how `SoftMaxHeapSize` should work? > > Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. 
Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. Somewhat related to above, our experience with our internal algorithm that adjusts `SoftMaxHeapSize` based on GC CPU overhead, encountered cases that it behaves poorly. The problem is that some workload have large variance in mutator's CPU usage (e.g. peak hours vs off-peak hours), but smaller variance in GC CPU usage. Then it does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating `SoftMaxHeapSize`, which is similar to how `Min/MaxHeapFreeRatio` works. I'm not sure if `GCTimeRatio` using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2777769994 From tschatzl at openjdk.org Fri Apr 4 08:10:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Apr 2025 08:10:34 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
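For comparison with the pseudo code above, the card-mark-only barrier of Parallel/Serial GC, which this change moves G1 towards, can be modelled in a handful of lines. This is only an illustration of the shape of such a barrier; the card size, names and heap layout below are invented for the example and this is not the code the PR emits:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Toy card table: one byte per 512-byte "card" of heap.
    constexpr int     kCardShift = 9;
    constexpr uint8_t kDirty     = 0;
    constexpr uint8_t kClean     = 1;

    struct ToyHeap {
      uint8_t* base;        // start of the simulated heap
      uint8_t* card_table;  // one entry per card

      // Post-write barrier in the Parallel/Serial style: dirty the card that
      // covers the written field. No filtering, no StoreLoad, no queueing.
      void post_write_barrier(void* field_addr) {
        size_t card = static_cast<size_t>(
            reinterpret_cast<uint8_t*>(field_addr) - base) >> kCardShift;
        card_table[card] = kDirty;
      }
    };

    int main() {
      static uint8_t heap_mem[1 << 16] = {};
      static uint8_t cards[(1 << 16) >> kCardShift];
      std::memset(cards, kClean, sizeof(cards));

      ToyHeap heap{heap_mem, cards};
      heap.post_write_barrier(&heap_mem[5000]);   // "x.a = y" dirties this card
      return cards[5000 >> kCardShift] == kDirty ? 0 : 1;
    }

Finding the dirty cards again is then the job of refinement scanning a card table snapshot, which is what the atomic card table switch described next enables, instead of mutators feeding card addresses through per-thread queues.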
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=29 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Apr 4 09:03:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Apr 2025 09:03:50 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 07:23:45 GMT, Man Cao wrote: > Thank you both for the quick and detailed responses! > > > * JDK-8248324 effectively removes the use of `Min/MaxHeapFreeRatio` (apart of full gc, which obviously they also need to be handled in some way that fits into the system). > > * JDK-8238687 makes `GCTimeRatio` shrink the heap too, obviating the need for `Min/MaxHeapFreeRatio`, which are currently the knobs that limit excessive memory usage. > > > > With no flag to interfere (no `Min/MaxHeapFreeRatio`) with each other, there is no need for considering their precedence. > > As you mention, there is need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it to resize the heap in which direction in which degree. 
> > Incidentally, the way `GCTimeRatio` (or actually the internal gc cpu usage target as an intermediate) is already in use fits these requirements. From some actual value you can calculate a difference to desired, with some smoothing applied, which gives you both direction and degree of the change in heap size (applying some magic factors/constants). > > I was unaware that G1 plans to stop using `Min/MaxHeapFreeRatio` until now. Looks like [JDK-8238686](https://bugs.openjdk.org/browse/JDK-8238686) has more relevant description. It sounds good to solve all above-mentioned issues and converge on a single flag such as `GCTimeRatio`, and ensure both incremental and full GCs respect this flag. (We should also fix [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) for converging on `GCTimeRatio`. ) It would be nicer if we have a doc or a master bug that describes the overall plan. Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. I will look through the existing bugs and see if I there is a need for a(nother) master bug. > > In comparison, this PR's approach for a high-precedence, "harder" `SoftMaxHeapSize` is an easier and more expedient approach to improve heap resizing, without solving all other issues. However, it requires users to carefully maintain and dynamically adjust `SoftMaxHeapSize` to prevent GC thrashing. I think if all other issues are resolved, our existing internal use cases that use a separate algorithm to dynamically calculate and set the high-precedence `SoftMaxHeapSize` (or `ProposedHeapSize`) could probably migrate to the `GCTimeRatio` approach, and stop using `SoftMaxHeapSize`. > > I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how `SoftMaxHeapSize` should work? > > > Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected. > > Somewhat related to above, our experience with our internal algorithm that adjusts `SoftMaxHeapSize` based on GC CPU overhead, encountered cases that it behaves poorly. The problem is that some workload have large variance in mutator's CPU usage (e.g. peak hours vs off-peak hours), but smaller variance in GC CPU usage. Then it does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating `SoftMaxHeapSize`, which is similar to how `Min/MaxHeapFreeRatio` works. > > I'm not sure if `GCTimeRatio` using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978. Obviously there are issues to sort out. 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2778005801 From ayang at openjdk.org Fri Apr 4 09:12:23 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Message-ID: Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. ------------- Commit messages: - tmp - gclocker-nested Changes: https://git.openjdk.org/jdk/pull/24407/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24407&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352116 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24407.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24407/head:pull/24407 PR: https://git.openjdk.org/jdk/pull/24407 From eosterlund at openjdk.org Fri Apr 4 09:12:23 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. Would be nice to refactor the if (UseSerialGC || UseParallelGC) code to something that explains why it's there (those are the GCs that use the new improved GC locker). But that's pre existing so I don't mind if it's split to a separate RFE. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2739864515 From jsikstro at openjdk.org Fri Apr 4 11:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 4 Apr 2025 11:56:07 GMT Subject: RFR: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. Thank you for the reviews! 
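The shape of the 8353471 cleanup mentioned above is easy to show in isolation. The sketch below uses a made-up Generation class, not the ZGC sources, purely to contrast passing the generation id explicitly with relying on the `_id` member:

    #include <cassert>

    enum class GenerationId { young, old_gen };

    class Generation {
      GenerationId _id;
    public:
      explicit Generation(GenerationId id) : _id(id) {}

      // Before: callers pass the id back in even though the object knows it.
      bool is_young_explicit(GenerationId id) const {
        assert(id == _id && "caller must not disagree with the object");
        return id == GenerationId::young;
      }

      // After: the member variable is the single source of truth.
      bool is_young() const { return _id == GenerationId::young; }
    };

    int main() {
      Generation young(GenerationId::young);
      assert(young.is_young_explicit(GenerationId::young) == young.is_young());
      return 0;
    }

Dropping the redundant parameter removes one way for caller and callee to silently disagree.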
------------- PR Comment: https://git.openjdk.org/jdk/pull/24374#issuecomment-2778471557 From jsikstro at openjdk.org Fri Apr 4 11:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 4 Apr 2025 11:56:07 GMT Subject: Integrated: 8353471: ZGC: Redundant generation id in ZGeneration In-Reply-To: References: Message-ID: <8QZgCh8R7ZycqowtfLbPwmbJz59ni6HckX2dwRW-U7w=.1db6ca63-5edd-4086-be8a-2d55ae6ac0de@github.com> On Wed, 2 Apr 2025 06:52:49 GMT, Joel Sikstr?m wrote: > The ZGeneration class (and in turn ZGenerationOld and ZGenerationYoung) keeps track of its own ZGenerationId, which means that the generation id does not need to be passed along as an argument when calling internal functions. > > I've removed the id parameter from `ZGeneration::select_relocation_set` in favor of using the member variable `_id`. This pull request has now been integrated. Changeset: b92a4436 Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/b92a44364d3a2267f5bc9aef3077805bebdf9fba Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod 8353471: ZGC: Redundant generation id in ZGeneration Reviewed-by: stefank, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24374 From xpeng at openjdk.org Fri Apr 4 18:11:50 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 4 Apr 2025 18:11:50 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: On Thu, 3 Apr 2025 22:55:18 GMT, Y. Srinivas Ramakrishna wrote: >> Yes, making bitmaps has been reset after full GC, except for the for regions with pined objects. >> _verify_marked_complete requires complete marking context, it might make more sense to change it to _verify_marked_disable after full GC. > > Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? > > May be I am missing some corner case? It does, one of the changes https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2029244689 From xpeng at openjdk.org Fri Apr 4 18:18:30 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 4 Apr 2025 18:18:30 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. 
This may cause unexpected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw an assert error if the global marking context completeness flag is false, but now it always returns the marking context even if marking is not complete, which may hide bugs where we expect the global/generational marking to be completed. > > This PR fixes the bug in the global marking context completeness flag, and updates all the places using `ShenandoahHeap::complete_marking_context()` to use the proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address PR comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/7c73e121..d4af962a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=06-07 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From sangheki at openjdk.org Fri Apr 4 21:21:22 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Fri, 4 Apr 2025 21:21:22 GMT Subject: RFR: 8346568: G1: Other time can be negative Message-ID: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. 1. Different scope of measurement - 3 variables are out of scope of total_measured_time. They are used for wait-root-region-scan and verify-before/after. (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) - Changed not to be included in sum_of_sub_phases. - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. 2. Duplicated measurement - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation times are used internally. 3. Concurrent task execution time - Sometimes just triggering concurrent work takes two-digit milliseconds. Changed to add only the initiating time to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. Testing: tier 1 ~ 5 ------------- Commit messages: - Separate measurement for cleanup Changes: https://git.openjdk.org/jdk/pull/24454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346568 Stats: 61 lines in 4 files changed: 35 ins; 17 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From kbarrett at openjdk.org Sat Apr 5 06:29:47 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 5 Apr 2025 06:29:47 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue.
The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2744728350 From duke at openjdk.org Mon Apr 7 05:45:54 2025 From: duke at openjdk.org (Saint Wesonga) Date: Mon, 7 Apr 2025 05:45:54 GMT Subject: RFR: 8350722: Serial GC: Remove duplicate logic for detecting pointers in young gen In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 06:54:19 GMT, Saint Wesonga wrote: > Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method. @tschatzl , I'm closing this PR now that I have an updated approach in https://github.com/openjdk/jdk/pull/23853 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23792#issuecomment-2782077611 From duke at openjdk.org Mon Apr 7 05:45:54 2025 From: duke at openjdk.org (Saint Wesonga) Date: Mon, 7 Apr 2025 05:45:54 GMT Subject: Withdrawn: 8350722: Serial GC: Remove duplicate logic for detecting pointers in young gen In-Reply-To: References: Message-ID: <_hkx74X6j9YnTj9Z_dUXjLPXSMY4IeRk3W4Vo5Ti_KI=.0b979267-53cc-4cc4-8f03-c33d726bedc7@github.com> On Wed, 26 Feb 2025 06:54:19 GMT, Saint Wesonga wrote: > Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23792 From tschatzl at openjdk.org Mon Apr 7 07:55:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 07:55:52 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Marked as reviewed by tschatzl (Reviewer). 
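The locking structure described in the quoted text can be modelled in a few lines with standard-library primitives. This is a heavily simplified standalone sketch, not the HotSpot implementation; ToyGCLocker and its members are invented, and the real synchronization differs in detail:

    #include <condition_variable>
    #include <mutex>

    // Simplified model: _jni_critical plays the role of JNICritical_lock and is
    // held from block() until unblock(); the heap lock would only be taken after
    // block() has returned, so no thread waits on critical regions while holding it.
    class ToyGCLocker {
      std::mutex              _jni_critical;   // stand-in for JNICritical_lock
      std::mutex              _state_lock;
      std::condition_variable _cv;
      int  _in_critical = 0;                   // threads inside JNI critical regions
      bool _blocked     = false;

    public:
      void enter_critical() {                  // mutator enters a JNI critical region
        std::unique_lock<std::mutex> l(_state_lock);
        _cv.wait(l, [&] { return !_blocked; });
        ++_in_critical;
      }

      void exit_critical() {
        std::lock_guard<std::mutex> l(_state_lock);
        if (--_in_critical == 0) _cv.notify_all();
      }

      void block() {                           // GC side: stop new entries, wait for old ones
        _jni_critical.lock();                  // held until unblock()
        std::unique_lock<std::mutex> l(_state_lock);
        _blocked = true;
        _cv.wait(l, [&] { return _in_critical == 0; });
      }

      void unblock() {
        {
          std::lock_guard<std::mutex> l(_state_lock);
          _blocked = false;
        }
        _cv.notify_all();
        _jni_critical.unlock();
      }
    };

    int main() {
      ToyGCLocker locker;
      locker.enter_critical();
      locker.exit_critical();
      locker.block();     // would wait here if a critical region were still active
      locker.unblock();
      return 0;
    }

The point of the dedicated lock is that the thread initiating a GC waits for in-flight JNI critical regions before it ever takes the heap lock, so the heap lock is never held across that wait.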
------------- PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2745840662 From tschatzl at openjdk.org Mon Apr 7 07:57:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 07:57:51 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: <9nwg79xCItPNaMsHRK6VQFl-dkWPP385vHqhvTYK_k0=.a830743a-5fd6-46a3-87c3-fd2a164ddf6a@github.com> On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristics to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Filed [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2782349959 From thomas.schatzl at oracle.com Mon Apr 7 09:07:08 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 7 Apr 2025 11:07:08 +0200 Subject: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Hi all, On 26.03.25 03:33, Monica Beckwith wrote: > Hi Ivan, > Thanks for the note - and nice to meet you! > > The refinements you're working on around `GCTimeRatio` and memory uncommit are valuable contributions to the broader AHS direction we've been shaping. They align closely with the multi-input heap sizing model Thomas and I outlined - especially the emphasis on GC cost (via `GCTimeRatio`) and memory responsiveness as primary drivers. > > These kinds of enhancements are central to making G1's heap sizing more adaptive and responsive, particularly in environments with shifting workload patterns. I'm especially interested in your work around improving the GC time-base - it seems like a crucial piece for coordinating GC-triggered adjustments more precisely. > > Given the growing collaboration across contributors, I've been thinking of opening an umbrella issue to track these efforts and possibly drafting a JEP to help clarify and unify the overall scope. With Oracle, Google, and others actively contributing, it's exciting to see a shared vision taking shape - and your work is clearly part of it. I created an umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716 which is supposed to contain the latest info on the effort. Feel free to add to it. If possible, I would like to keep the more free-form discussion here in the mailing list though. My bad for not following up on this much, much earlier. > I'm genuinely excited to see this come together. Looking forward to continuing the discussion and shaping the future of G1 ergonomics together.
> Hth, Thomas From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24407#issuecomment-2782605636 From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: Integrated: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. This pull request has now been integrated. Changeset: 39549f89 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/39549f89905019fa90dd20ff8b6822c1351cbaa6 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Reviewed-by: kbarrett, tschatzl, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24407 From tschatzl at openjdk.org Mon Apr 7 09:22:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 09:22:50 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Also collected thoughts and existing documents with some additional rough explanations. 
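As a very rough illustration of the control loop sketched in this thread - compare the measured GC cpu share against the target implied by GCTimeRatio and turn the difference into a resize direction and magnitude - here is a standalone toy. The constants, clamping bounds and names are invented; this is not the G1 heuristic, and a real implementation would smooth the measured share over several GCs:

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>

    // Toy heap-sizing step driven by GC cpu usage, in the spirit of GCTimeRatio.
    // The target GC time share implied by the flag is 1 / (1 + GCTimeRatio),
    // e.g. GCTimeRatio=24 corresponds to roughly 4% of time spent in GC.
    struct ToySizer {
      double gc_time_ratio;

      size_t next_heap_size(size_t current, double measured_gc_share) const {
        const double target = 1.0 / (1.0 + gc_time_ratio);
        // Positive error: GC uses more cpu than desired -> grow the heap.
        // Negative error: GC is cheaper than required -> shrink, give memory back.
        const double error  = (measured_gc_share - target) / target;
        const double factor = std::clamp(1.0 + 0.5 * error, 0.8, 1.5);
        return static_cast<size_t>(current * factor);
      }
    };

    int main() {
      ToySizer sizer{/*gc_time_ratio=*/24.0};
      size_t heap = 512 * 1024 * 1024;
      for (double share : {0.10, 0.08, 0.03, 0.02}) {   // measured GC cpu shares
        heap = sizer.next_heap_size(heap, share);
        std::printf("heap -> %zu MB\n", heap / (1024 * 1024));
      }
      return 0;
    }

How such a single input is reconciled with minimum/maximum bounds and with SoftMaxHeapSize is exactly the open design question discussed above.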
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2782661911 From shade at openjdk.org Mon Apr 7 10:33:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 10:33:35 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v3] In-Reply-To: References: Message-ID: > See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. > > I think cutting to 0.2% of RAM size gets us into good sweet spot: > - On huge 1024G machine, this yields 2G initial heap > - On reasonably sized 128G machine, this gives 256M initial heap > - On smaller 1G container, this gives 2M initial heap > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8348278-trim-iramp - Also man page - Merge branch 'master' into JDK-8348278-trim-iramp - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23262/files - new: https://git.openjdk.org/jdk/pull/23262/files/d3a327ae..6a6c3ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=01-02 Stats: 152480 lines in 3423 files changed: 68119 ins; 65042 del; 19319 mod Patch: https://git.openjdk.org/jdk/pull/23262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23262/head:pull/23262 PR: https://git.openjdk.org/jdk/pull/23262 From shade at openjdk.org Mon Apr 7 10:48:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 10:48:51 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v3] In-Reply-To: References: Message-ID: <_J82bhnQOjixO9UDu2Mm0CsGVNe9gXXBxayIyv2TFz8=.2deea0ff-c51b-499d-a8fd-1ebc253a9e2d@github.com> On Mon, 7 Apr 2025 10:33:35 GMT, Aleksey Shipilev wrote: >> See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. >> >> I think cutting to 0.2% of RAM size gets us into good sweet spot: >> - On huge 1024G machine, this yields 2G initial heap >> - On reasonably sized 128G machine, this gives 256M initial heap >> - On smaller 1G container, this gives 2M initial heap >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8348278-trim-iramp > - Also man page > - Merge branch 'master' into JDK-8348278-trim-iramp > - Fix CSR filed: [JDK-8353837](https://bugs.openjdk.org/browse/JDK-8353837) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23262#issuecomment-2782893464 From jsikstro at openjdk.org Mon Apr 7 11:33:57 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 7 Apr 2025 11:33:57 GMT Subject: RFR: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: <9tbw7_56t4aDDTVE-KI9b84ccG_Iky2LRhsMmL0gXF0=.f03a1ac0-099f-465d-977d-751f7b5cf7ff@github.com> On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt Since this is a relatively small change, I'm hoping that the Shenandoah devs are on board. I am going to integrate this now so that we can continue working in this area in ZGC. I am happy to follow up on this if there are any more opinions in the future. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24387#issuecomment-2783006637 From jsikstro at openjdk.org Mon Apr 7 11:33:58 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 7 Apr 2025 11:33:58 GMT Subject: Integrated: 8353559: Restructure CollectedHeap error printing In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:09:12 GMT, Joel Sikstr?m wrote: > Calling Universe::heap()->print_on_error() gets dispatched to the most specific implementation, which for some GCs is their own implementation instead of the default in CollectedHeap. 
Each GC-specific implementation calls back to CollectedHeap::print_on_error(), which then dispatches back into the specific implementation of print_on(). This is kind of awkward and creates a call-chain that's not straightforward to wrap your head around, jumping back and forth via CollectedHeap and the specific implementation. > > To make the call-chain cleaner, I have made print_on_error() a pure virtual method in CollectedHeap, and implemented print_on_error() in each GC's implementation of CollectedHeap. In addition, I have removed print_extended_on() from CollectedHeap and implemented that for the GCs that actually need/use it. > > Removing the usage of the common print_on_error() also means that GCs that do not print anything interesting for their barrier set can omit this. So, I've removed it from ZGC and Shenandoah. > > To make print_on_error() consistent with print_on(), I have moved the printing of "Heap:" to the caller(s) of print_on_error() (only inside vmError.cpp). This is a trivial change for all GCs except ZGC, which requires some restructuring in its error printing. > > The old and new printing orders are shown below for ZGC: > > # Old > > > > > > > > > > # New > > > > > > > > Testing: > * GHA > * Tiers 1 & 2 > * Manually verified that printing still works and outputs the intended information via running the following commands and comparing the output. > > ../fastdebug-old/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_old.txt > ../fastdebug-new/jdk/bin/java -XX:ErrorHandlerTest=14 -XX:+ErrorFileToStdout -XX:+Use${gc}GC --version > ${gc}_new.txt This pull request has now been integrated. Changeset: c494a00a Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/c494a00a66d21d2e403fd9ce253eb132c34e455d Stats: 141 lines in 16 files changed: 75 ins; 52 del; 14 mod 8353559: Restructure CollectedHeap error printing Reviewed-by: stefank, eosterlund, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24387 From ysr at openjdk.org Tue Apr 8 01:29:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 01:29:09 GMT Subject: RFR: 8353218: Shenandoah: Out of date comment references Brooks pointers In-Reply-To: References: Message-ID: <-zSlCWIHyxeR9-mjP1si49UGzRl9qMSSWscVELQxYAQ=.8f6e1108-bd47-49f8-918b-c2f6c9eb640b@github.com> On Fri, 28 Mar 2025 23:33:48 GMT, William Kemper wrote: > Trivial change, comment only. Thanks for fixing this! ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24304#pullrequestreview-2748445404 From tschatzl at openjdk.org Tue Apr 8 11:57:21 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 8 Apr 2025 11:57:21 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v2] In-Reply-To: References: Message-ID: <_1K7Q1L9cPr-wd5jefhS6rBjR0sJvbBWsjf71YbR6k4=.0c0c89ae-15a8-4e9b-a3fb-7c028740b15c@github.com> On Wed, 2 Apr 2025 11:15:01 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. 
>> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove test from ProblemList Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24349#pullrequestreview-2749704531 From stefank at openjdk.org Tue Apr 8 15:22:49 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Apr 2025 15:22:49 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v3] In-Reply-To: References: Message-ID: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8352994_is_error_reported - Remove test from ProblemList - 8352994: ZGC: Fix regression introduced in JDK-8350572 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24349/files - new: https://git.openjdk.org/jdk/pull/24349/files/fe07a340..4720444d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24349&range=01-02 Stats: 26029 lines in 781 files changed: 18783 ins; 5145 del; 2101 mod Patch: https://git.openjdk.org/jdk/pull/24349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24349/head:pull/24349 PR: https://git.openjdk.org/jdk/pull/24349 From ysr at openjdk.org Tue Apr 8 21:54:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 21:54:25 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:33:58 GMT, Xiaolong Peng wrote: >> Right, active_generation should be used instead of global_generation to get the complete marking context, with the context of full GC, even we know it active_generation is the global gen, but it's better not to use global_generation directly for better maintainable code. > > Updated it to use active_generation. Thanks for the fixes; this looks good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034084571 From wkemper at openjdk.org Tue Apr 8 22:04:34 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 8 Apr 2025 22:04:34 GMT Subject: Integrated: 8353218: Shenandoah: Out of date comment references Brooks pointers In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 23:33:48 GMT, William Kemper wrote: > Trivial change, comment only. This pull request has now been integrated. 
Changeset: b4ab964b Author: William Kemper URL: https://git.openjdk.org/jdk/commit/b4ab964b72c631632511e6f01cdd5a47fb2e31fa Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8353218: Shenandoah: Out of date comment references Brooks pointers Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/24304 From ysr at openjdk.org Tue Apr 8 23:30:27 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 23:30:27 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments LGTM! Thanks for your patience! ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2751608049 From ysr at openjdk.org Tue Apr 8 23:30:28 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 8 Apr 2025 23:30:28 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: On Fri, 4 Apr 2025 18:09:36 GMT, Xiaolong Peng wrote: >> Curious; in that case should it not have failed in your testing because the objects not pinned may not have been marked as the verifier would have insisted they were? Why do we leave the regions with pinned objects marked? I am guessing once we have filled in the dead objects, the marks do not serve any purpose? >> >> May be I am missing some corner case? > > It does, one of the changes in https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. Thanks; I looked through the code and see where I had confused myself above. This looks good to me. 
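For readers skimming the thread, the invariant being restored by this change can be pictured with a small sketch; the member and accessor names below are assumptions based on the discussion, not a copy of the patch:

    // Sketch only: a "complete" marking context accessor should assert completeness
    // instead of silently handing out a context whose bitmaps may be stale, e.g.
    // after the bitmaps have been reset by a Full GC.
    ShenandoahMarkingContext* ShenandoahHeap::complete_marking_context() const {
      assert(_marking_context->is_complete(), "marking context should be complete");
      return _marking_context;
    }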
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034163269 From xpeng at openjdk.org Tue Apr 8 23:45:33 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 8 Apr 2025 23:45:33 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments thanks all for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2787875051 From duke at openjdk.org Tue Apr 8 23:45:33 2025 From: duke at openjdk.org (duke) Date: Tue, 8 Apr 2025 23:45:33 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v8] In-Reply-To: References: Message-ID: <3CxQWRmVEeYX_O3D2Lh5-1GiTRLSZRkaNKDc3ztM2ZE=.68ecc2fe-b5e1-4e62-a58e-0de858d9dc5f@github.com> On Fri, 4 Apr 2025 18:18:30 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address PR comments @pengxiaolong Your change (at version d4af962adb11c03281af80ecfc12344dac01b11a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2787877229 From xpeng at openjdk.org Tue Apr 8 23:45:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 8 Apr 2025 23:45:34 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v7] In-Reply-To: References: <5Yxk8oBN69i5Ty_jRCtXoLeNjyet6DEySoFqnzxrblk=.9a1ad401-9da2-4d06-8e22-c51d810dd2f8@github.com> <6sjBSQODcXKXzjvshAJiHq96N4Ler-TEBaSuN4nNr6w=.a6ee8ec7-9a3e-49ae-9718-8d1a027e6420@github.com> <6dN8IY3rHlVn2aiHJwWdB-OKbbx8GABuvau9-Bdw6vU=.a74101a0-845d-4174-a87a-b41674e90579@github.com> Message-ID: <6D387djX5BAacoBeaJCLj1HGYsNoRm3lTWVipWp6vn0=.ed5303a1-0107-405f-a0d0-e1360315fc46@github.com> On Tue, 8 Apr 2025 23:27:33 GMT, Y. Srinivas Ramakrishna wrote: >> It does, one of the changes in https://github.com/openjdk/jdk/pull/24092 is to set the marking completeness flag to false after Full GC because the bitmaps have been reset, `_verify_marked_complete` requires complete marking marking context so there is assert error. > > Thanks; I looked through the code and see where I had confused myself above. This looks good to me. thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r2034171337 From kdnilsen at openjdk.org Wed Apr 9 00:20:29 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 00:20:29 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 17:49:38 GMT, William Kemper wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 159: > >> 157: >> 158: inline size_t ShenandoahHeapRegion::get_mixed_candidate_live_data_bytes() const { >> 159: assert(SafepointSynchronize::is_at_safepoint(), "Should be at Shenandoah safepoint"); > > Could we use `shenandoah_assert_safepoint` here (and other places) instead? Good call. I'll make this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2034198164 From kdnilsen at openjdk.org Wed Apr 9 00:29:25 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 00:29:25 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 18:16:43 GMT, William Kemper wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. 
In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: > >> 76: _live_data(0), >> 77: _critical_pins(0), >> 78: _mixed_candidate_garbage_words(0), > > Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `TAMS + top` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returned marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`. This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? This is a good idea. Let me experiment with this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2034208988 From xpeng at openjdk.org Wed Apr 9 01:02:41 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 9 Apr 2025 01:02:41 GMT Subject: Integrated: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:16 GMT, Xiaolong Peng wrote: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 This pull request has now been integrated. Changeset: aec1fe0a Author: Xiaolong Peng Committer: Y. 
Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/aec1fe0a17fa6801e26a517d4d21656353409f7c Stats: 71 lines in 17 files changed: 7 ins; 34 del; 30 mod 8351091: Shenandoah: global marking context completeness is not accurately maintained Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23886 From kdnilsen at openjdk.org Wed Apr 9 01:55:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 01:55:48 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: <8UF5sC8lbb-hBUpkbzDarvFxOlbQU0nDPbTqWhAedM0=.e078bb2a-2331-47f7-aa67-807d09c4ca11@github.com> > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Experiment with reviewer suggestion Redefine the way ShenandoahHeapRegion::get_live_data_ works to simplify changes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/70613882..3c1f788a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=00-01 Stats: 28 lines in 5 files changed: 15 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From stefank at openjdk.org Wed Apr 9 06:22:35 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Apr 2025 06:22:35 GMT Subject: RFR: 8352994: ZGC: Fix regression introduced in JDK-8350572 [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 15:22:49 GMT, Stefan Karlsson wrote: >> We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). >> >> The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. >> >> Thanks to @plummercj for digging into this and proposing the same workaround. >> >> Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8352994_is_error_reported > - Remove test from ProblemList > - 8352994: ZGC: Fix regression introduced in JDK-8350572 Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24349#issuecomment-2788390540 From stefank at openjdk.org Wed Apr 9 06:22:35 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Apr 2025 06:22:35 GMT Subject: Integrated: 8352994: ZGC: Fix regression introduced in JDK-8350572 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:58:56 GMT, Stefan Karlsson wrote: > We have seen a bunch of timeouts that all points towards the introduction of a check against VMError::is_error_reported_in_current_thread() in the ZGC verification code. I propose this workaround to first check if there's really an error reporting event that is going on by checking VMError::is_error_reported(). > > The underlying performance issue (or hang(?)) when calling os::current_thread_id() is being investigated as a separate bug. This fix just tries to clean up issues we see when running ZGC testing. > > Thanks to @plummercj for digging into this and proposing the same workaround. > > Testing: GHA is clean, I'll run this through a few tiers of our CI pipeline This pull request has now been integrated. Changeset: 3340e13f Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/3340e13fd0a8d25212003e8371a135471b2f44b3 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod 8352994: ZGC: Fix regression introduced in JDK-8350572 Reviewed-by: aboldtch, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24349 From manc at openjdk.org Wed Apr 9 07:27:33 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 9 Apr 2025 07:27:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:08:19 GMT, Man Cao wrote: >> Hi all, >> >> I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: >> >> - does not respect `MinHeapSize`; >> - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; >> - does not affect heuristcs to trigger a concurrent cycle; >> >> [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use Atomic::load for flag Thank you for creating [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716)! > Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. Apology for overlooking previous mentions about `Min/MaxHeapFreeRatio`. Previous mentions were mostly inside responses to complicated issues, and I have hardly got the time to follow hotspot-gc-dev closely. 
To be honest, we didn't pay much attention to `Min/MaxHeapFreeRatio` before I started working on this PR. I guess this is a good example that a one-pager doc/umbrella bug provides cleaner communication and additional values over email discussion, especially when one party already has a pretty detailed plan for how it should be done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2788609820 From manc at google.com Wed Apr 9 07:44:08 2025 From: manc at google.com (Man Cao) Date: Wed, 9 Apr 2025 00:44:08 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: Hi all, Thank you Thomas for creating the umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716. While waiting a bit on SoftMaxHeapSize PR ( https://github.com/openjdk/jdk/pull/24211) to see if others have feedback, I could start working on CurrentMaxHeapSize ( https://bugs.openjdk.org/browse/JDK-8204088). I also agree that CurrentMaxHeapSize may not need a JEP due to its small size and low complexity. Should it proceed similarly to how SoftMaxHeapSize was introduced? I.e, https://bugs.openjdk.org/browse/JDK-8222145, and creating a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. Separately, for removing support for Min/MaxHeapFreeRatio for G1 (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/browse/JDK-8238686), how do we handle existing users that set these two flags? (We have very few internal users setting these two flags. But yesterday I ran into a use case that sets -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 for G1...) Best, Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From tschatzl at openjdk.org Wed Apr 9 07:56:37 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 07:56:37 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 07:24:43 GMT, Man Cao wrote: > > Last time this has been mentioned in the hotspot-gc-dev list has been [here](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html). I remember giving multiple outlines to everyone involved earlier, each mentioning that `Min/MaxHeapFreeRatio` need to go away because it's in the way, so I was/am a bit surprised on this response. > > Apology for overlooking previous mentions about `Min/MaxHeapFreeRatio`. Previous mentions were mostly inside responses to complicated issues, and I have hardly got the time to follow hotspot-gc-dev closely. To be honest, we didn't pay much attention to `Min/MaxHeapFreeRatio` before I started working on this PR. > > I guess this is a good example that a one-pager doc/umbrella bug provides cleaner communication and additional values over email discussion, especially when one party already has a pretty detailed plan for how it should be done. Don't worry, I should have been better with following up with that summary about thoughts/plans communicated so far somewhere publicly. Let's go forward with that CR summarizing the respective (current) general direction. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2788687511 From ayang at openjdk.org Wed Apr 9 10:36:44 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Apr 2025 10:36:44 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 170: > 168: } > 169: return result; > 170: } I see in `G1ConcurrentRefineThread::do_refinement`: // The yielding may have completed the task, check. if (!state.is_in_progress()) { I wonder if it's simpler to use `is_in_progress` consistently to detect whether we should restart sweep, instead of `_sweep_start_epoch`. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > 347: } > 348: > 349: bool has_sweep_rt_work = is_in_progress() && _state == State::SweepRT; Why `is_in_progress()`? src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 79: > 77: > 78: void inc_cards_scanned(size_t increment = 1) { _cards_scanned += increment; } > 79: void inc_cards_clean(size_t increment = 1) { _cards_clean += increment; } The sole caller always passes in arg, so no need for default-arg-value. src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 87: > 85: void add_atomic(G1ConcurrentRefineStats* other); > 86: > 87: G1ConcurrentRefineStats& operator+=(const G1ConcurrentRefineStats& other); Seems that these operators are not used after this PR. src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > 81: break; > 82: } > 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. Why doesn't call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to a separate case from `NoInteresting`.) src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 156: > 154: > 155: _refine_stats.inc_cards_scanned(claim.size()); > 156: _refine_stats.inc_cards_clean(claim.size() - scanned); I feel these two "scanned" mean sth diff; the local var should probably be sth like `num_dirty_cards`. 
src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 207: > 205: > 206: if (!interrupted_by_gc) { > 207: state.add_yield_duration(G1CollectedHeap::heap()->safepoint_duration() - synchronize_duration_at_sweep_start); I think this is recorded to later calculate actual refine-time, i.e. sweep-time - yield-time. However, why can't yield-duration be recorded in this refine-control-thread directly -- accumulation of `jlong yield_duration = os::elapsed_counter() - yield_start`. I feel that is easier to reason than going through g1heap. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 75: > 73: { > 74: MutexLocker x(G1ReviseYoungLength_lock, Mutex::_no_safepoint_check_flag); > 75: G1Policy* p = g1h->policy(); Can probably use the existing `policy`. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 88: > 86: } > 87: > 88: G1ReviseYoungLengthTargetLengthTask::G1ReviseYoungLengthTargetLengthTask(const char* name) : I wonder if the class name can be shortened a bit, sth like `G1ReviseYoungLengthTask`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033251162 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033222407 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033929489 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033975054 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033934399 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033910496 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2032008908 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855278 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855435 From duke at openjdk.org Wed Apr 9 10:48:48 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 9 Apr 2025 10:48:48 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Message-ID: After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
before this patch: ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops bool UseCompressedOops = false {product lp64_product} {default} openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) after this patch: ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops bool UseCompressedOops = true {product lp64_product} {ergonomic} openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) ------------- Commit messages: - 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Changes: https://git.openjdk.org/jdk/pull/24541/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354145 Stats: 8 lines in 3 files changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From tschatzl at openjdk.org Wed Apr 9 11:26:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 11:26:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 10:37:24 GMT, Tongbao Zhang wrote: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. > > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Would it be possible to add a regression test that checks the value of the `UseCompressedOops` flag after running a VM with these settings? 
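One possible shape for such a test, using the common jdk.test.lib ProcessTools/OutputAnalyzer pattern; the class name, jtreg tags and exact flag set are assumptions here, not the test that eventually lands in the PR:

    /*
     * @test
     * @requires vm.gc.G1 & vm.bits == "64"
     * @library /test/lib
     * @run driver TestG1HeapRegionSizeCompressedOops
     */
    import jdk.test.lib.process.OutputAnalyzer;
    import jdk.test.lib.process.ProcessTools;

    public class TestG1HeapRegionSizeCompressedOops {
        public static void main(String[] args) throws Exception {
            // With no explicit G1HeapRegionSize, a heap just below 32 GB should
            // still select compressed oops ergonomically.
            ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(
                "-XX:+UseG1GC", "-Xmx32736m", "-XX:+PrintFlagsFinal", "-version");
            OutputAnalyzer out = new OutputAnalyzer(pb.start());
            out.shouldHaveExitValue(0);
            out.shouldMatch("bool UseCompressedOops\\s+= true");
        }
    }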
------------- PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2753132517 From duke at openjdk.org Wed Apr 9 11:37:39 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 9 Apr 2025 11:37:39 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 11:23:56 GMT, Thomas Schatzl wrote: > Would it be possible to add a regression test that checks the value of the `UseCompressedOops` flag after running a VM with these settings? Thanks for your suggestion, I will add a test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2789382638 From rcastanedalo at openjdk.org Wed Apr 9 12:03:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 9 Apr 2025 12:03:49 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f Hi Thomas, great simplification and encouraging results! I reviewed the compiler-related parts of the changeset, including x64 and aarch64 changes. src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 246: > 244: __ cbz(new_val, done); > 245: } > 246: // Storing region crossing non-null, is card young? Suggestion: // Storing region crossing non-null. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > 99: } > 100: > 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > 143: > 144: __ bind(is_clean_card); > 145: // Card was clean. Dirty card and go to next.. This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 319: > 317: const Register thread, > 318: const Register tmp1, > 319: const Register tmp2, Since `tmp2` is not needed in the x64 post-barrier, I suggest not passing it around for this platform, for simplicity and also to make optimization opportunities more visible in the future. Here is my suggestion: https://github.com/robcasloz/jdk/commit/855ec8df4a641f8c491c5c09acea3ee434b7e230, feel free to merge if you agree. 
src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 38: > 36: #include "c1/c1_LIRAssembler.hpp" > 37: #include "c1/c1_MacroAssembler.hpp" > 38: #endif // COMPILER1 I suggest removing the conditional compilation directives and grouping these includes together with the above `c1` ones. src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 147: > 145: state->do_input(_thread); > 146: > 147: // Use temp registers to ensure these they use different registers. Suggestion: // Use temps to enforce different registers. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 307: > 305: + 6 // same region check: Uncompress (new_val) oop, xor, shr, (cmp), jmp > 306: + 4 // new_val is null check > 307: + 4; // card not clean check. It probably does not affect the unrolling heuristics too much, but you may want to make the last cost component conditional on `UseCondCardMark`. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 396: > 394: bool needs_liveness_data(const MachNode* mach) const { > 395: return G1BarrierStubC2::needs_pre_barrier(mach) || > 396: G1BarrierStubC2::needs_post_barrier(mach); Suggestion: // Liveness data is only required to compute registers that must be // preserved across the runtime call in the pre-barrier stub. return G1BarrierStubC2::needs_pre_barrier(mach); src/hotspot/share/gc/g1/g1BarrierSet.hpp line 56: > 54: // > 55: // The refinement threads mark cards in the current collection set specially on the > 56: // card table - this is fine wrt to synchronization with the mutator, because at Suggestion: // card table - this is fine wrt synchronization with the mutator, because at test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java line 521: > 519: phase = CompilePhase.FINAL_CODE) > 520: @IR(counts = {IRNode.COUNTED_LOOP, "2"}, > 521: phase = CompilePhase.FINAL_CODE) I suggest to remove this extra IR check to avoid over-specifying the expected loop shape. For example, running this test with loop unrolling disabled (`-XX:LoopUnrollLimit=0`) would now fail because only one counted loop would be found. ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2753154117 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035174209 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035175921 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035177738 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035183250 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035186980 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035192666 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035210464 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035196251 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035198219 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035201056 From tschatzl at openjdk.org Wed Apr 9 12:41:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:41:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:35:26 GMT, Roberto Casta?eda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > >> 143: >> 144: __ bind(is_clean_card); >> 145: // Card was clean. Dirty card and go to next.. > > This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? Great find! 
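For context on the conditional card marking discussed above, here is a small model in plain C++ of the intended behaviour; this is not the PR's assembler, and the card values and names are assumptions:

    #include <cstdint>

    static const uint8_t kCleanCard = 0xff;  // assumed card values, for illustration only
    static const uint8_t kDirtyCard = 0x00;

    // With -XX:+UseCondCardMark the card is tested first and only written when it
    // still looks clean; with -XX:-UseCondCardMark the dirtying store must still
    // happen, just unconditionally - it cannot be skipped on that path.
    static void post_barrier_card_update(volatile uint8_t* card, bool use_cond_card_mark) {
      if (use_cond_card_mark) {
        if (*card == kCleanCard) {
          *card = kDirtyCard;
        }
      } else {
        *card = kDirtyCard;
      }
    }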
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035280909 From tschatzl at openjdk.org Wed Apr 9 12:50:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:50:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:34:09 GMT, Roberto Casta?eda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > >> 99: } >> 100: >> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, > > Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. I will try to redo numbers. 
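As a rough illustration of why inlining can pay off for the array case: conceptually the post barrier only has to dirty the cards spanned by the updated range, which boils down to a little address arithmetic plus a small memset. The sketch below uses assumed constants and names and is not the generated assembly:

    #include <cstdint>
    #include <cstring>

    static const int     kCardShift = 9;     // 512-byte cards, as in HotSpot
    static const uint8_t kDirtyCard = 0x00;  // assumed dirty value

    // Dirty every card covering the oop-array slice [start, end).
    static void array_post_barrier(uint8_t* card_table_base, const void* start, const void* end) {
      uintptr_t first = reinterpret_cast<uintptr_t>(start) >> kCardShift;
      uintptr_t last  = (reinterpret_cast<uintptr_t>(end) - 1) >> kCardShift;
      std::memset(card_table_base + first, kDirtyCard, last - first + 1);
    }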
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035298557 From thomas.schatzl at oracle.com Wed Apr 9 14:05:56 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 9 Apr 2025 16:05:56 +0200 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: <13c7d913-e61f-47af-a299-6c6b6e2d45f6@oracle.com> Hi Man, On 09.04.25 09:44, Man Cao wrote: > Hi all, > > Thank you Thomas for creating the umbrella CR at https:// > bugs.openjdk.org/browse/JDK-8353716 JDK-8353716>. > While waiting a bit on SoftMaxHeapSize PR (https://github.com/openjdk/ > jdk/pull/24211) to see if others have feedback, I could start working on > CurrentMaxHeapSize (https://bugs.openjdk.org/browse/JDK-8204088). > I also agree that?CurrentMaxHeapSize may not need a JEP due to its small > size and low complexity. Should it proceed similarly to how > SoftMaxHeapSize was introduced? I.e, https://bugs.openjdk.org/browse/ > JDK-8222145, and creating > a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. I think this is the best way forward. There is no need for a JEP from me either. Exact behavior in various situations needs to be defined in the CSR. > > Separately, for removing support for?Min/MaxHeapFreeRatio for G1 > (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/ > browse/JDK-8238686), how > do we handle existing users that set these two flags? After searching the web a little, it seems that these flags are actually in use, and recommended to be used (e.g. in default settings). So we need some transition strategy to get off them, and can't just remove. One option is to translate these options into other values impacting heap size "similarly". E.g. have Min/MaxHeapFreeRatio translate to internal pressure at the time the changes are noticed, but that is just a potential solution that hand-waves away the effort for that. Then start deprecating and remove; depends a little on how useful (or how much in the way) they are for Serial and Parallel GC (other collectors don't support them). It is unlikely that ZGC and Shenandoah will adopt these. Even already in JDK-8238687 Min/MaxHeapFreeRatio happily work to counter the cpu based sizing, so some solution needs to be found there already. That change will already be quite disruptive in terms of impact on heap sizing, so another option is to remove support in G1. > (We have very few internal users setting these two flags. But yesterday > I ran into a use case that sets -XX:MinHeapFreeRatio=0 - > XX:MaxHeapFreeRatio=0 for G1...) What would be the use case for setting it to these values? There seem to be little upside and lots of downside for this choice, because it likely causes a lot of GC activity since the VM will need GC to expand the heap little by little all the time, and full gc/Remark will immediately reset these expansion efforts. Thomas From tschatzl at openjdk.org Wed Apr 9 14:38:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 14:38:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 19:59:09 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > >> 81: break; >> 82: } >> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. > > Why doesn't call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to a separate case from `NoInteresting`.) "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card. The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a card to the collection, C not having any interesting card any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). I could add a separate counter for these type of cards too - they can be inferred from the total number of scanned minus the others though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035512686 From erik.osterlund at oracle.com Wed Apr 9 15:22:12 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 9 Apr 2025 15:22:12 +0000 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> Message-ID: <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> Hi Man, Sorry to butt in. A high level question about the AHS plan for G1? are we interested in the intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is it AHS that we are interested in? The reason I ask is that each incremental feature comes with some baggage due to being a (somewhat) static and manually set limit, which the AHS solution won?t need to deal with. 
For example, it's unclear how a *static* SoftMaxHeapSize should behave when the live set is larger than the limit. While that can maybe be solved in some reasonable way, it's worth noting that AHS won't need the solution, because there it's a dynamic limit that the GC simply won't set lower than the memory usage after GC. It will however get in the way because the user can now also set a SoftMaxHeapSize that conflicts with the AHS soft heap size that the JVM wants to use, and then we gotta deal with that. Similarly, the CurrentMaxHeapSize adds another way for users to control (read: mess up) the JVM behaviour that we need to respect. In the end, AHS will compute this dynamically instead depending on environment circumstances. I suspect the fact that it can also be manually set in a way that conflicts with what the JVM wants to do, will end up being a pain. I'm not against the plan of building these incremental features, especially if we want them in isolation. But if it's AHS we want, then I wonder if it would be easier to go straight for what we need for AHS without the intermediate user exposed steps, because they might introduce unnecessary problems along the way. My 50c, no strong opinion though. /Erik On 9 Apr 2025, at 09:44, Man Cao wrote: Hi all, Thank you Thomas for creating the umbrella CR at https://bugs.openjdk.org/browse/JDK-8353716. While waiting a bit on SoftMaxHeapSize PR (https://github.com/openjdk/jdk/pull/24211) to see if others have feedback, I could start working on CurrentMaxHeapSize (https://bugs.openjdk.org/browse/JDK-8204088). I also agree that CurrentMaxHeapSize may not need a JEP due to its small size and low complexity. Should it proceed similarly to how SoftMaxHeapSize was introduced? I.e., https://bugs.openjdk.org/browse/JDK-8222145, and creating a CSR (https://bugs.openjdk.org/browse/JDK-8222181) for it. Separately, for removing support for Min/MaxHeapFreeRatio for G1 (mentioned in https://bugs.openjdk.org/browse/JDK-8353716 and https://bugs.openjdk.org/browse/JDK-8238686), how do we handle existing users that set these two flags? (We have very few internal users setting these two flags. But yesterday I ran into a use case that sets -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 for G1...) Best, Man From kirk at kodewerk.com Wed Apr 9 16:14:18 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Wed, 9 Apr 2025 09:14:18 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> Message-ID: <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> > On Apr 9, 2025, at 8:22 AM, Erik Osterlund wrote: > > Hi Man, > > Sorry to butt in. A high level question about the AHS plan for G1 - are we interested in the > intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is it AHS that > we are interested in? > > The reason I ask is that each incremental feature comes with some baggage due to being > a (somewhat) static and manually set limit, which the AHS solution won't need to deal with. > > For example, it's unclear how a *static* SoftMaxHeapSize should behave when the live set > is larger than the limit. While that can maybe be solved in some reasonable way, it's worth > noting that AHS won't need the solution, because there it's a dynamic limit that the GC simply > won't set lower than the memory usage after GC.
It will however get in the way because the > user can now also set a SoftMaxHeapSize that conflicts with the AHS soft heap size that > the JVM wants to use, and then we gotta deal with that. > > Similarly, the CurrentMaxHeapSize adds another way for users to control (read: mess up) > the JVM behaviour that we need to respect. In the end, AHS will compute this dynamically > instead depending on environment circumstances. I suspect the fact that it can also be > manually set in a way that conflicts with what the JVM wants to do, will end up being a pain. I would agree and to this point, I've rarely found ratios to be useful. In general, eden, survivor, and old each play a different role in object life cycle and as such each should be tuned separately from each other. Min/Max heap is the sum of the needs of the parts. Being able to meet the needs of eden, survivor and old by simply setting a max heap and relying on ratios is wishful thinking that sometimes comes true. Might I suggest that an entirely new (experimental?) adaptive size policy be introduced that makes use of current flags in a manner that is appropriate to the new policy. That policy would calculate a size of Eden to control GC frequency, a size of survivor to limit promotion of transients, and a tenured large enough to accommodate the live set as well as manage the expected number of humongous allocations. If global heap pressure won't support the ensuing max heap size, then the cost could be smaller eden implying higher GC overhead due to increased frequency. Metrics to support eden sizing would be allocation rate. The age table with premature promotion rates would be used to estimate the size of survivor. Live set size with a recent history of humongous allocations would be used for tenured. There will need to be a dampening strategy in play. My current (dumb) idea for Serial is to set an overhead threshold delta that needs to be exceeded to trigger a resize. > > I'm not against the plan of building these incremental features, especially if we want them > in isolation. But if it's AHS we want, then I wonder if it would be easier to go straight for what > we need for AHS without the intermediate user exposed steps, because they might introduce > unnecessary problems along the way. I would agree with this. And I would suggest that the way to achieve it is to introduce a new experimental ASP. > > My 50c, no strong opinion though. From kdnilsen at openjdk.org Wed Apr 9 17:05:38 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 17:05:38 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 00:27:17 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 78: >> >>> 76: _live_data(0), >>> 77: _critical_pins(0), >>> 78: _mixed_candidate_garbage_words(0), >> >> Do we need a new field to track this? During `final_mark`, we call `increase_live_data_alloc_words` to add `TAMS + top` to `_live_data` to account for objects allocated during mark. Could we "fix" `get_live_data` so that it always returned marked objects (counted by `increase_live_data_gc_words`) _plus_ `top - TAMS`. This way, the live data would not become stale after `final_mark` and we wouldn't have another field to manage. What do you think? > > This is a good idea. Let me experiment with this. My experiment with an initial attempt at this failed with over 60 failures.
The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035784267 From ayang at openjdk.org Wed Apr 9 17:38:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Apr 2025 17:38:54 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio Message-ID: Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. Test: tier1-7 ------------- Commit messages: - pgc-min-initial-fix Changes: https://git.openjdk.org/jdk/pull/24556/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354228 Stats: 15 lines in 3 files changed: 12 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24556/head:pull/24556 PR: https://git.openjdk.org/jdk/pull/24556 From wkemper at openjdk.org Wed Apr 9 17:53:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 9 Apr 2025 17:53:31 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:02:40 GMT, Kelvin Nilsen wrote: >> This is a good idea. Let me experiment with this. > > My experiment with an initial attempt at this failed with over 60 failures. The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. > > I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. > > See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. > > Thanks. It does look simpler. Do you have an example of one of the failing asserts? One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035852703 From kdnilsen at openjdk.org Wed Apr 9 18:03:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:03:47 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:51:06 GMT, William Kemper wrote: >> My experiment with an initial attempt at this failed with over 60 failures. The "problem" is that we often consult get_live_data() in contexts from which it is "not appropriate" to add (top- TAMS) to the atomic volatile ShenandoahHeapRegion::_live_data() . I think most of these are asserts. I have so far confirmed that there are at least two different places that need to be fixed. Not sure how many total scenarios. >> >> I'm willing to move forward with changes to the failing asserts to make this change work. I think the code would be cleaner with your suggested refactor. It just makes this PR a little more far-reaching than the original. >> >> See the most recent commit on this PR to see the direction this would move us. Let me know if you think I should move forward with more refactoring, or revert this most recent change. >> >> Thanks. > > It does look simpler. Do you have an example of one of the failing asserts? > > One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. Examples: FullGC worker: void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { ShenandoahParallelWorkerSession worker_session(worker_id); ShenandoahHeapRegion* region = _regions.next(); ShenandoahHeap* heap = ShenandoahHeap::heap(); ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); while (region != nullptr) { if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() // used to do... ctx->clear_bitmap(region); } region = _regions.next(); } } ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { - assert(!r->has_live(), "Region %zu should have no live data", r->index()); + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035869108 From kdnilsen at openjdk.org Wed Apr 9 18:18:27 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:18:27 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:01:03 GMT, Kelvin Nilsen wrote: >> It does look simpler. Do you have an example of one of the failing asserts? >> >> One thing I hadn't considered is how "hot" `ShenandoahHeapRegion::get_live_data_words` is. Is there going to be a significant performance hit if we make this method do more work? It does look like this method is called frequently. 
> > Examples: > FullGC worker: > void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { > ShenandoahParallelWorkerSession worker_session(worker_id); > ShenandoahHeapRegion* region = _regions.next(); > ShenandoahHeap* heap = ShenandoahHeap::heap(); > ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); > while (region != nullptr) { > if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { > // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() > // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() > // used to do... > ctx->clear_bitmap(region); > } > region = _regions.next(); > } > } > > ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { > - assert(!r->has_live(), "Region %zu should have no live data", r->index()); > + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); Not sure about performance impact, other than implementing and testing... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035888970 From kdnilsen at openjdk.org Wed Apr 9 18:24:36 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 18:24:36 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:15:38 GMT, Kelvin Nilsen wrote: >> Examples: >> FullGC worker: >> void ShenandoahMCResestCompleteBitmapTask::work(uint worker_id) { >> ShenandoahParallelWorkerSession worker_session(worker_id); >> ShenandoahHeapRegion* region = _regions.next(); >> ShenandoahHeap* heap = ShenandoahHeap::heap(); >> ShenandoahMarkingContext* const ctx = heap->complete_marking_context(); >> while (region != nullptr) { >> if (heap->is_bitmap_slice_committed(region) && !region->is_pinned() && region->has_marked()) { >> // kelvin replacing has_live() with new method has_marked() because has_live() calls get_live_data_words() >> // and pointer_delta() asserts out because TAMS is not less than top(). has_marked() does what has_live() >> // used to do... >> ctx->clear_bitmap(region); >> } >> region = _regions.next(); >> } >> } >> >> ShenandoahInitMarkUpdateRegionStateClosure::heap_region_do() { >> - assert(!r->has_live(), "Region %zu should have no live data", r->index()); >> + assert(!r->has_marked(), "Region %zu should have no marked data", r->index()); > > Not sure about performance impact, other than implementing and testing... i suspect performance impact is minimal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2035896982 From mdoerr at openjdk.org Wed Apr 9 22:26:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 22:26:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. 
> > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791114662 From kdnilsen at openjdk.org Wed Apr 9 22:32:46 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 9 Apr 2025 22:32:46 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v3] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Experiment 2: refinements to reduce regressions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/3c1f788a..8ff388d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=01-02 Stats: 30 lines in 4 files changed: 23 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From kdnilsen at openjdk.org Thu Apr 10 04:36:38 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 04:36:38 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v4] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix garbage_before_padded_for_promote() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/8ff388d1..8e820f29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From tschatzl at openjdk.org Thu Apr 10 07:26:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:26:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v31] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... and 35 more: https://git.openjdk.org/jdk/compare/45b7c748...39aa903f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=30 Stats: 7118 lines in 110 files changed: 2586 ins; 3598 del; 934 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Apr 10 07:28:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:28:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 22:24:10 GMT, Martin Doerr wrote: > This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791807489 From shade at openjdk.org Thu Apr 10 08:36:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:36:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> On Thu, 10 Apr 2025 07:25:47 GMT, Thomas Schatzl wrote: > I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791985351 From manc at google.com Thu Apr 10 08:45:58 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 01:45:58 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: Re Thomas's comments: I think this is the best way forward. There is no need for a JEP from me > either. > Exact behavior in various situations needs to be defined in the CSR. Thanks. Should I edit https://bugs.openjdk.org/browse/JDK-8204088 in place to change it to a CSR, or do you prefer creating a separate issue? One option is to translate these options into other values impacting > heap size "similarly". E.g. 
have Min/MaxHeapFreeRatio translate to > internal pressure at the time the changes are noticed, but that is just > a potential solution that hand-waves away the effort for that. > Then start deprecating and remove; depends a little on how useful (or > how much in the way) they are for Serial and Parallel GC (other > collectors don't support them). It is unlikely that ZGC and Shenandoah > will adopt these. I feel like both approaches have additional problems: For the first approach, even with a translation mechanism, it still has the problem of GCTimeRatio and Min/MaxHeapFreeRatio overriding each other. I think the only solution is to translate Min/MaxHeapFreeRatio directly to a value for GCTimeRatio, as well as making GCTimeRatio a manageable flag. Agree that the effort to implement this approach is nontrivial. For the second approach, Min/MaxHeapFreeRatio are pretty popular flags for Parallel GC, so it could be difficult to remove them for Parallel GC. Even already in JDK-8238687 Min/MaxHeapFreeRatio happily work to counter > the cpu based sizing, so some solution needs to be found there already. That change will already be quite disruptive in terms of impact on heap > sizing, so another option is to remove support in G1. I think removing support for Min/MaxHeapFreeRatio only for G1 is feasible, as long as we provide a replacement approach. Some high-level guidance like "if you set Min/MaxHeapFreeRatio to small values such as XX, try lowering GCTimeRatio to YY" may be acceptable. The downside is that it requires users of Min/MaxHeapFreeRatio to re-tune JVM parameters. One unresolved use case is dynamically changing Min/MaxHeapFreeRatio due to them being manageable. Perhaps we could make GCTimeRatio manageable? But Parallel GC and Shenandoah also use GCTimeRatio, so it could be difficult. Or if we reconsider the high-precedence SoftMaxHeapSize implementation (https://github.com/openjdk/jdk/pull/24211), perhaps users who dynamically set Min/MaxHeapFreeRatio could move to set SoftMaxHeapSize instead. > (We have very few internal users setting these two flags. But yesterday > > I ran into a use case that sets -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 for G1...) > What would be the use case for setting it to these values? There seem to be little upside and lots of downside for this choice, > because it likely causes a lot of GC activity since the VM will need GC > to expand the heap little by little all the time, and full gc/Remark > will immediately reset these expansion efforts. The use case is to create a process snapshot image via CRIU (checkpoint/restore), like what https://openjdk.org/projects/crac does. The application wants G1 to shrink the heap as much as possible, to reduce the size of the snapshot. It sets both flags to zero, performs several System.gc(), then sets both flags back to previous values, then creates the snapshot. -Man From tschatzl at openjdk.org Thu Apr 10 09:07:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 09:07:39 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v32] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... 
and 36 more: https://git.openjdk.org/jdk/compare/f94a4f7a...fcf96a2a ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31 Stats: 7112 lines in 110 files changed: 2592 ins; 3594 del; 926 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From ayang at openjdk.org Thu Apr 10 09:12:32 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 10 Apr 2025 09:12:32 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:32:43 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: >> >>> 81: break; >>> 82: } >>> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. >> >> Why doesn't call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to a separate case from `NoInteresting`.) > > "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card. > > The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a card to the collection, C not having any interesting card any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). > > I could add a separate counter for these type of cards too - they can be inferred from the total number of scanned minus the others though. I see; "clean again" means the existing interesting pointer was overwritten by mutator. I misinterpret the comment as cards transitioned from dirty to clean. ` size_t _cards_clean_again; // Dirtied cards that were cleaned.` To prevent misunderstanding, what do you think of renaming "NoInteresting" to "NoCrossRegion" and "_cards_clean_again" to "_cards_no_cross_region", or sth alike so that the 1:1 mapping is clearer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2036885633 From manc at google.com Thu Apr 10 09:30:54 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 02:30:54 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: Re Eric's comments: Sorry to butt in. A high level question about the AHS plan for G1? are we > interested in the > intermediate functionality (SoftMaxHeapSize and CurrentMaxHeapSize), or is > it AHS that > we are interested in? No worries, and I appreciate the comment. The high-level rationale is that JVM should provide at least one of SoftMaxHeapSize or CurrentMaxHeapSize as a high-precedence, manageable flag, so that the JVM could take customized input signal for heap sizing decisions. Even with fully-developed AHS algorithm, it cannot satisfy all deployment environments. E.g. custom container system or custom OS, in which the JVM cannot detect system memory pressure via standard approaches. So these flags are not necessarily intermediate solutions, and they could allow more deployment environments to use AHS. 
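A minimal sketch of what such a "customized input signal" could look like on the application side, assuming the sizing flag is registered as manageable and that the collector in use honors SoftMaxHeapSize (ZGC and Shenandoah today; for G1 that is exactly what the PR above would add). readExternalTargetBytes() is only a placeholder for whatever container- or deployment-specific source exists; this is illustrative, not code from any of the PRs under discussion:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class SoftMaxUpdater {
    private static final HotSpotDiagnosticMXBean DIAG =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    // Placeholder for a deployment-specific signal (cgroup files, a custom
    // container library, an internal metrics service, ...).
    static long readExternalTargetBytes() {
        return Long.getLong("external.heap.target.bytes", 0L);
    }

    public static void updateSoftMax() {
        long target = readExternalTargetBytes();
        if (target <= 0) {
            return;                         // no signal, leave the JVM alone
        }
        long used = ManagementFactory.getMemoryMXBean()
                                     .getHeapMemoryUsage().getUsed();
        long max  = Runtime.getRuntime().maxMemory();
        // Never ask for less than what is currently in use, never more than -Xmx.
        long clamped = Math.max(used, Math.min(target, max));
        // Only possible because SoftMaxHeapSize is a manageable flag; the
        // collector decides how (and whether) to act on the new value.
        DIAG.setVMOption("SoftMaxHeapSize", Long.toString(clamped));
    }
}

A controller like this would run periodically, so a stale target never pins the heap size for long.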
For SoftMaxHeapSize for G1, based on discussion in https://github.com/openjdk/jdk/pull/24211, it will likely become just a hint to trigger concurrent marks, which will be unlikely to interfere with other parts of G1 AHS. For my original proposal of high-precedence SoftMaxHeapSize (as currently implemented in the PR), the guidance for users is that they should either provide a mechanism to adjust SoftMaxHeapSize dynamically to prevent GC thrashing, or only set it temporarily and accept the risk of GC thrashing. It is not intended as a static value that the user "sets and forgets". For CurrentMaxHeapSize, it has similar issues as high-precedence SoftMaxHeapSize, that it is not "sets and forgets". However, I can see that clearly-specified OutOfMemoryError behavior from CurrentMaxHeapSize could be more favorable than the hard-to-define potential GC thrashing condition that a high-precedence SoftMaxHeapSize could cause. Re Kirk's comments: > Might I suggest that an entirely new (experimental?) adaptive size policy > be introduced that makes use of current flags in a manner that is > appropriate to the new policy. That policy would calculate a size of Eden > to control GC frequency, a size of survivor to limit promotion of > transients, and a tenured large enough to accommodate the live set as well > as manage the expected number of humongous allocations. If global heap > pressure won't support the ensuing max heap size, then the cost could be > smaller eden implying higher GC overhead due to increased frequency. > Metrics to support eden sizing would be allocation rate. The age table > with premature promotion rates would be used to estimate the size of > survivor. Live set size with a recent history of humongous allocations > would be used for tenured. > There will need to be a dampening strategy in play. My current (dumb) idea > for Serial is to set an overhead threshold delta that needs to be exceeded > to trigger a resize. I don't quite understand how this adaptive size policy (ASP) solves the problems AHS tries to solve. AHS tries to solve the problem of reaching an appropriate target *total* heap size, based on multiple inputs (JVM flags, environment circumstances). Once a total heap size is determined, G1 uses existing algorithms to determine young-gen and old-gen sizes. However, the ASP seems to focus on determining young-gen and old-gen sizes using a new algorithm. -Man From tschatzl at openjdk.org Thu Apr 10 10:02:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: References: Message-ID: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier.
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/fcf96a2a..068d2a37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31-32 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Apr 10 10:02:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:41 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> References: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> Message-ID: On Thu, 10 Apr 2025 08:34:00 GMT, Aleksey Shipilev wrote: > > I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. > > I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) Thanks. 
:) @TheRealMDoerr: should be fixed now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2792213039 From tschatzl at openjdk.org Thu Apr 10 11:01:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 11:01:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 12:48:10 GMT, Thomas Schatzl wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: >> >>> 99: } >>> 100: >>> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, >> >> Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? > > I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. > I will try to redo numbers. >From our microbenchmarks (higher numbers are better): Current code: Benchmark (size) Mode Cnt Score Error Units ArrayCopyObject.conjoint_micro 31 thrpt 15 166136.959 ? 5517.157 ops/ms ArrayCopyObject.conjoint_micro 63 thrpt 15 108880.108 ? 4331.112 ops/ms ArrayCopyObject.conjoint_micro 127 thrpt 15 93159.977 ? 5025.458 ops/ms ArrayCopyObject.conjoint_micro 2047 thrpt 15 17234.842 ? 831.344 ops/ms ArrayCopyObject.conjoint_micro 4095 thrpt 15 9202.216 ? 292.612 ops/ms ArrayCopyObject.conjoint_micro 8191 thrpt 15 3565.705 ? 121.116 ops/ms ArrayCopyObject.disjoint_micro 31 thrpt 15 159106.245 ? 5965.576 ops/ms ArrayCopyObject.disjoint_micro 63 thrpt 15 95475.658 ? 5415.267 ops/ms ArrayCopyObject.disjoint_micro 127 thrpt 15 84249.979 ? 6313.007 ops/ms ArrayCopyObject.disjoint_micro 2047 thrpt 15 10682.650 ? 381.832 ops/ms ArrayCopyObject.disjoint_micro 4095 thrpt 15 4471.940 ? 216.439 ops/ms ArrayCopyObject.disjoint_micro 8191 thrpt 15 1378.296 ? 33.421 ops/ms ArrayCopy.arrayCopyObject N/A avgt 15 13.880 ? 0.517 ns/op ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 14.844 ? 0.751 ns/op ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 11.080 ? 0.703 ns/op ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 11.003 ? 0.135 ns/op Runtime call: Benchmark (size) Mode Cnt Score Error Units ArrayCopyObject.conjoint_micro 31 thrpt 15 73100.230 ? 11079.381 ops/ms ArrayCopyObject.conjoint_micro 63 thrpt 15 65039.431 ? 1996.832 ops/ms ArrayCopyObject.conjoint_micro 127 thrpt 15 58336.711 ? 2260.660 ops/ms ArrayCopyObject.conjoint_micro 2047 thrpt 15 17035.419 ? 524.445 ops/ms ArrayCopyObject.conjoint_micro 4095 thrpt 15 9207.661 ? 286.526 ops/ms ArrayCopyObject.conjoint_micro 8191 thrpt 15 3264.491 ? 73.848 ops/ms ArrayCopyObject.disjoint_micro 31 thrpt 15 84587.219 ? 3007.310 ops/ms ArrayCopyObject.disjoint_micro 63 thrpt 15 62815.254 ? 1214.310 ops/ms ArrayCopyObject.disjoint_micro 127 thrpt 15 58423.470 ? 285.670 ops/ms ArrayCopyObject.disjoint_micro 2047 thrpt 15 10720.462 ? 617.173 ops/ms ArrayCopyObject.disjoint_micro 4095 thrpt 15 4178.195 ? 178.942 ops/ms ArrayCopyObject.disjoint_micro 8191 thrpt 15 1374.268 ? 44.290 ops/ms ArrayCopy.arrayCopyObject N/A avgt 15 19.667 ? 0.740 ns/op ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 21.243 ? 
1.891 ns/op ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 16.645 ? 0.504 ns/op ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 17.409 ? 0.705 ns/op Obviously with larger arrays, the impact diminishes, but it's always there. I think the inlined code is worth the effort in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037086410 From rcastanedalo at openjdk.org Thu Apr 10 11:22:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 10 Apr 2025 11:22:36 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Thu, 10 Apr 2025 10:58:24 GMT, Thomas Schatzl wrote: >> I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt to maintenance - these things never really change, and the method is small and contained. >> I will try to redo numbers. > > From our microbenchmarks (higher numbers are better): > > Current code: > > Benchmark (size) Mode Cnt Score Error Units > ArrayCopyObject.conjoint_micro 31 thrpt 15 166136.959 ? 5517.157 ops/ms > ArrayCopyObject.conjoint_micro 63 thrpt 15 108880.108 ? 4331.112 ops/ms > ArrayCopyObject.conjoint_micro 127 thrpt 15 93159.977 ? 5025.458 ops/ms > ArrayCopyObject.conjoint_micro 2047 thrpt 15 17234.842 ? 831.344 ops/ms > ArrayCopyObject.conjoint_micro 4095 thrpt 15 9202.216 ? 292.612 ops/ms > ArrayCopyObject.conjoint_micro 8191 thrpt 15 3565.705 ? 121.116 ops/ms > ArrayCopyObject.disjoint_micro 31 thrpt 15 159106.245 ? 5965.576 ops/ms > ArrayCopyObject.disjoint_micro 63 thrpt 15 95475.658 ? 5415.267 ops/ms > ArrayCopyObject.disjoint_micro 127 thrpt 15 84249.979 ? 6313.007 ops/ms > ArrayCopyObject.disjoint_micro 2047 thrpt 15 10682.650 ? 381.832 ops/ms > ArrayCopyObject.disjoint_micro 4095 thrpt 15 4471.940 ? 216.439 ops/ms > ArrayCopyObject.disjoint_micro 8191 thrpt 15 1378.296 ? 33.421 ops/ms > ArrayCopy.arrayCopyObject N/A avgt 15 13.880 ? 0.517 ns/op > ArrayCopy.arrayCopyObjectNonConst N/A avgt 15 14.844 ? 0.751 ns/op > ArrayCopy.arrayCopyObjectSameArraysBackward N/A avgt 15 11.080 ? 0.703 ns/op > ArrayCopy.arrayCopyObjectSameArraysForward N/A avgt 15 11.003 ? 0.135 ns/op > > Runtime call: > > Benchmark (size) Mode Cnt Score Error Units > ArrayCopyObject.conjoint_micro 31 thrpt 15 73100.230 ? 11079.381 ops/ms > ArrayCopyObject.conjoint_micro 63 thrpt 15 65039.431 ? 1996.832 ops/ms > ArrayCopyObject.conjoint_micro 127 thrpt 15 58336.711 ? 2260.660 ops/ms > ArrayCopyObject.conjoint_micro 2047 thrpt 15 17035.419 ? 524.445 ops/ms > ArrayCopyObject.conjoint_micro 4095 thrpt 15 9207.661 ? 286.526 ops/ms > ArrayCopyObject.conjoint_micro 8191 thrpt 15 3264.491 ? 73.848 ops/ms > ArrayCopyObject.disjoint_micro 31 thrpt 15 84587.219 ? 3007.310 ops/ms > ArrayCopyObject.disjoint_micro ... Fair enough, thanks for the measurements! 
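For readers who want to reproduce this kind of comparison: the scores above come from small object-array copies, where the post-write barrier dominates, and a JMH micro of the same shape is only a few lines. This is a sketch in the spirit of the ArrayCopyObject micro quoted above, not the actual benchmark source in the JDK tree:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ObjectArrayCopyBench {

    @Param({"31", "63", "127", "2047", "4095", "8191"})
    int size;

    Object[] src;
    Object[] dst;

    @Setup
    public void setup() {
        src = new Object[size];
        dst = new Object[size];
        for (int i = 0; i < size; i++) {
            src[i] = new Object();
        }
    }

    // Every copied reference passes through the GC's arraycopy post-barrier,
    // so for small sizes the score is dominated by barrier cost.
    @Benchmark
    public void disjoint() {
        System.arraycopy(src, 0, dst, 0, size);
    }
}

Running it against a baseline build and the patched build (or against Parallel GC) exposes the barrier cost directly for the small sizes, which is where the inlined stub pays off.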
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037121277 From tschatzl at openjdk.org Thu Apr 10 11:41:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 11:41:33 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio In-Reply-To: References: Message-ID: <89h5aK0Oop82whqONpjyoqsYaLnShKKDmPSpxhMpVJQ=.b29ac864-000f-4987-bf6c-27c9299c7730@github.com> On Wed, 9 Apr 2025 17:33:07 GMT, Albert Mingkun Yang wrote: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/gc_globals.hpp line 415: > 413: product(uintx, InitialSurvivorRatio, 8, \ > 414: "Initial ratio of young generation/survivor space size") \ > 415: range(0, max_uintx) \ There is code somewhere which sets InitialSurvivorRatio to 3 if it is smaller than that. It should be removed. Somewhere around `parallelArguments.cpp:108). There is similar code next to it for `MinSurvivorRatio` which is dead code too (`MinSurvivorRatio` is already bounded with 3 at minimum). Also, previously this value has been overridden silently, bailing out is a behavioral change that requires a CSR. ------------- PR Review: https://git.openjdk.org/jdk/pull/24556#pullrequestreview-2756365732 PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2037149128 From ayang at openjdk.org Thu Apr 10 11:59:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 10 Apr 2025 11:59:52 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: Message-ID: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24556/files - new: https://git.openjdk.org/jdk/pull/24556/files/6dfd92bf..1cd03d17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24556/head:pull/24556 PR: https://git.openjdk.org/jdk/pull/24556 From kdnilsen at openjdk.org Thu Apr 10 16:28:21 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 16:28:21 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v5] In-Reply-To: References: Message-ID: <5jhoXMiuinw50NFwWr_kQdOudqZTx-3rfX8-4eCr4OY=.565602e3-8dc6-47eb-aa36-ddc5b9f27a08@github.com> > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. 
> > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Refactor for better abstraction - Fix set_live() after full gc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/8e820f29..eb2679aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=03-04 Stats: 13 lines in 3 files changed: 3 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From erik.osterlund at oracle.com Thu Apr 10 17:30:09 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 10 Apr 2025 17:30:09 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> Message-ID: <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> > On 10 Apr 2025, at 11:31, Man Cao wrote: > > Even with fully-developed AHS algorithm, it cannot satisfy all deployment environments. E.g. custom container system or custom OS, in which the JVM cannot detect system memory pressure via standard approaches. So these flags are not necessarily intermediate solutions, and they could allow more deployment environments to use AHS. Could you elaborate the concrete scenario you have in mind? What use case do you have in mind where AHS is not enough, while external heap control is? /Erik From manc at google.com Thu Apr 10 20:18:07 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 13:18:07 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Re Erik: > Could you elaborate the concrete scenario you have in mind? What use case do you have in mind where AHS is not enough, while external heap control is? One example is a customized container environment that requires non-standard approaches to read container memory usage and container memory limit, i.e., the application cannot use standard cgroup's memory.memsw.usage_in_bytes, memory.memsw.max_usage_in_bytes control files. Instead, the customized container could provide its own library for the application to get container usage and limit. Without CurrentMaxHeapSize or a high-precedence SoftMaxHeapSize, the JVM has no way to use the container-provided library to get signals for memory pressure. With such JVM flags, the application could use the container-provided library to calculate a value for those JVM flags based on memory pressure, and pass that information to the JVM. 
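For concreteness, a minimal sketch of the calculation such an application could perform before handing the result to the JVM. The container-library inputs are hypothetical and CurrentMaxHeapSize is the flag proposed in this thread, not an existing one; the comments use the 310/305/3 GiB figures that come up later in the discussion:

#include <cstddef>

// Sketch only: all sizes are in bytes; the inputs come from the custom
// container library described above, not from cgroup control files.
size_t compute_current_max_heap(size_t container_limit,   // e.g. 310 GiB
                                size_t container_usage,   // e.g. 305 GiB
                                size_t current_heap,      // e.g. 3 GiB
                                size_t native_headroom) { // e.g. 1 GiB
  size_t slack  = container_limit > container_usage ? container_limit - container_usage : 0;
  size_t target = current_heap + slack;                   // 3 + (310 - 305) = 8 GiB
  return target > native_headroom ? target - native_headroom : current_heap; // 7 GiB with headroom
}

If the flag were manageable, the application would re-run this whenever the container library reports a change and apply the new value at runtime, the way existing manageable flags can be set with jcmd VM.set_flag.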
-Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.osterlund at oracle.com Thu Apr 10 21:02:08 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 10 Apr 2025 21:02:08 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: On 10 Apr 2025, at 22:18, Man Cao wrote: One example is a customized container environment that requires non-standard approaches to read container memory usage and container memory limit, i.e., the application cannot use standard cgroup's memory.memsw.usage_in_bytes, memory.memsw.max_usage_in_bytes control files. Instead, the customized container could provide its own library for the application to get container usage and limit. If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? /Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdnilsen at openjdk.org Thu Apr 10 21:55:45 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 10 Apr 2025 21:55:45 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Remove deprecation conditional compiles - Adjust candidate live memory for each mixed evac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/eb2679aa..ef783d48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=04-05 Stats: 85 lines in 6 files changed: 24 ins; 61 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From manc at google.com Thu Apr 10 22:15:03 2025 From: manc at google.com (Man Cao) Date: Thu, 10 Apr 2025 15:15:03 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? I suppose the more accurate way to put it is "if an app inside the custom container environment allocates 300 GB native memory ...". The custom container environment itself is not a Java app. If the container memory limit is 310GiB, container usage is 305GiB, and the app's current Java heap size is 3GiB, and Xmx is 20GiB, then the app could set CurrentMaxHeapSize=8G (310 - 305 + 3), or CurrentMaxHeapSize=7G (to give 1GiB head room for growth from other non-heap memory: code cache, thread stack, metaspace, etc.), to prevent running out of container memory limit. Note that the app should actively monitor container usage to adjust CurrentMaxHeapSize, e.g. increasing CurrentMaxHeapSize when container usage drops. If the app keeps allocating more native memory, CurrentMaxHeapSize will further drop, and it will eventually die with Java OutOfMemoryError. In the above case, the JVM is unaware of the 310G container limit or the 305G container usage. -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From ysr at openjdk.org Thu Apr 10 22:36:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 10 Apr 2025 22:36:25 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 21:55:45 GMT, Kelvin Nilsen wrote: >> The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. >> >> However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. >> >> This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. 
> > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Remove deprecation conditional compiles > - Adjust candidate live memory for each mixed evac Haven't started looking at these changes, but I do wonder if it might be worthwhile to also consider (and implement under a tunable flag) the alternative policy of never adding to the collection set any regions that are still "active" at the point when the collection set for a marking cycle is first assembled at the end of the final marking. That way we don't have to do any re-computing, and the criterion for evacuation is garbage-first (or liveness-least) both of which remain invariant (and complements of each other) throughout the duration of evacuation and obviating entirely the need for recomputing the goodness/choice metric afresh. The downside is that we may leave some garbage on the table in the active regions, but this is probably a minor price for most workloads and heap configurations, and doesn't unnecessarily complicate or overengineer the solution. One question to consider is how G1 does this. May be regions placed in the collection set are retired (i.e. made inactive?) -- I prefer not to forcibly retire active regions as this wastes space that may have been usable. Thoughts? (Can add this comment and discuss on the ticket if that is logistically preferable.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-2795315167 From mbeckwit at openjdk.org Fri Apr 11 02:19:33 2025 From: mbeckwit at openjdk.org (Monica Beckwith) Date: Fri, 11 Apr 2025 02:19:33 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v8] In-Reply-To: References: Message-ID: <3x_0x1y1pPb4CI4eSx1FUDNoqPCbWhv-Se1FwbC5mlE=.a0ccd4e9-8ea1-4540-8e55-4b992c58b8b1@github.com> On Fri, 4 Apr 2025 09:01:30 GMT, Thomas Schatzl wrote: > Meanwhile, @mo-beck do you guys have preference on how SoftMaxHeapSize should work? Thanks for the thoughtful work here ? this PR is a solid step toward strengthening G1?s memory footprint management, and I support it. This patch adds support for `SoftMaxHeapSize` in both expansion and shrinkage paths, as well as IHOP calculation, ensuring it's part of the regular heap policy logic. As I outlined in my [original note](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050191.html) and follow-up on [AHS integration](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html), my intent has been to use `SoftMaxHeapSize` as a guiding input ? a soft signal ? within a broader dynamic heap sizing controller that considers GC overhead, mutator behavior, and memory availability. This patch lays the groundwork for that direction. The behavior when the live set exceeds the soft target has come up in the discussion. My view remains that the heap should be influenced by the value, not strictly bound to it. That?s the balance I?ve been aiming for in describing how it integrates into the control loop ? SoftMax helps inform decisions, but doesn?t unconditionally restrict them. I agree that we?ll want to follow up with logic that can respond to GC pressure and workload needs, to avoid any unintended performance issues. I?ll update [JDK-8353716](https://bugs.openjdk.org/browse/JDK-8353716) to reflect this, and I?ll continue the thread on the mailing list to coordinate the next phase. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2795676870 From erik.osterlund at oracle.com Fri Apr 11 05:52:27 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 11 Apr 2025 05:52:27 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Okay it seems to me that the use case you are describing is wanting a container with an enforced memory limit. It should quack like a cgroup and walk like a cgroup but must not actually use cgroups for some reason. Cgroups were seemingly built for this use case and has a complete view of the memory usage in the container due to being an OS feature. Conversely, if the custom ad-hoc container environment does not have OS support for the memory limit, then the app can temporarily exceed the memory limit, and hence won?t be as effective of a limit. But if you want to actually enforce a memory limit such that the app dies if it exceeds the limit I can?t help but wonder? why not use a cgroup to declare that limit though? Regardless, I wonder if what you actually want for your use case is a way to tell AHS what the max memory of the entire JVM should be, similar to the -XX:RssLimit Thomas Stuefe proposed: https://bugs.openjdk.org/browse/JDK-8321266 In other words, letting the JVM know that it has a bound on memory, and have AHS know about and try to adapt the heap such that the JVM memory usage is below the limit when native memory goes up and down. In other words, let the heap heuristics live in the JVM. Perhaps then the limit would also be static, or do the containers themselves actually grow and shrink at runtime, or was the dynamic nature of CurrentMaxHeapSize mostly an artifact of out sourcing the heap heuristics of an otherwise static custom container limit? /Erik On 11 Apr 2025, at 00:15, Man Cao wrote: ? > If the custom container app allocates 300 GB native memory with, for example, panama APIs or JNI, what will happen? Is it allowed, or limited? I suppose the more accurate way to put it is "if an app inside the custom container environment allocates 300 GB native memory ...". The custom container environment itself is not a Java app. If the container memory limit is 310GiB, container usage is 305GiB, and the app's current Java heap size is 3GiB, and Xmx is 20GiB, then the app could set CurrentMaxHeapSize=8G (310 - 305 + 3), or CurrentMaxHeapSize=7G (to give 1GiB head room for growth from other non-heap memory: code cache, thread stack, metaspace, etc.), to prevent running out of container memory limit. Note that the app should actively monitor container usage to adjust CurrentMaxHeapSize, e.g. increasing CurrentMaxHeapSize when container usage drops. If the app keeps allocating more native memory, CurrentMaxHeapSize will further drop, and it will eventually die with Java OutOfMemoryError. In the above case, the JVM is unaware of the 310G container limit or the 305G container usage. -Man -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aboldtch at openjdk.org Fri Apr 11 06:20:11 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 11 Apr 2025 06:20:11 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly Message-ID: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` Currently running this through testing. ------------- Commit messages: - 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly Changes: https://git.openjdk.org/jdk/pull/24589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354358 Stats: 31 lines in 2 files changed: 7 ins; 2 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From stefank at openjdk.org Fri Apr 11 07:02:39 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Apr 2025 07:02:39 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. Looks good. As a follow-up, we might want to move the pre-touching so that we don't start and stop threads multiple times. ------------- Marked as reviewed by stefank (Reviewer). 
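For readers without the patch open, the shape of the fix is roughly "keep claiming from the discontiguous reservations until the requested size is covered". A conceptual sketch, not the actual ZPartition/ZVirtualMemory code, with placeholder types and a commented-out mapping step:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Reservation {
  size_t size; // bytes of reserved address space in this piece
};

// Prime by walking the reservations until 'requested' bytes are covered,
// instead of assuming a single reservation is large enough on its own.
size_t prime(std::vector<Reservation>& reservations, size_t requested) {
  size_t primed = 0;
  for (Reservation& r : reservations) {
    if (primed == requested) {
      break;
    }
    size_t take = std::min(r.size, requested - primed);
    // commit_and_map(r, take); // placeholder: back 'take' bytes and insert
    //                          // the mapping into the mapped cache
    primed += take;
  }
  return primed; // may be less than requested if the reservations are too small
}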
PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759342921 From jsikstro at openjdk.org Fri Apr 11 07:05:40 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 07:05:40 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. src/hotspot/share/gc/z/zPageAllocator.cpp line 1011: > 1009: const size_t claimed_size = claim_virtual(size, &vmems); > 1010: > 1011: // Each partition must have at least size total vmems available when priming. Maybe something like "The partition must have size available in virtual memory when priming"? I'm reading this as the number of vmems, not the size of them combined. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24589#discussion_r2038925034 From aboldtch at openjdk.org Fri Apr 11 07:45:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 11 Apr 2025 07:45:03 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update Comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24589/files - new: https://git.openjdk.org/jdk/pull/24589/files/0abce51a..70b0e923 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From jsikstro at openjdk.org Fri Apr 11 07:47:31 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 07:47:31 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: <-P89Vbi7uncmcA5LSlyADETTuDB5EJWG3NaarpyAouk=.7364df7e-5e0d-484a-b53e-44614f2eabe6@github.com> On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Looks good. As you say, this is nicely implemented with features from the Mapped Cache. ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759442646 From stefank at openjdk.org Fri Apr 11 07:52:25 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Apr 2025 07:52:25 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759456953 From eosterlund at openjdk.org Fri Apr 11 10:37:41 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 11 Apr 2025 10:37:41 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v2] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 07:45:03 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update Comment Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2759908182 From jsikstro at openjdk.org Fri Apr 11 11:38:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 11:38:08 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing Message-ID: Hello, > This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. 
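A sketch of the pattern in rules 1 and 2 above. This is not the actual patch: it assumes HotSpot's outputStream and StreamAutoIndentor from ostream.hpp, MyHeap and MySpace are placeholders, and the constructor taking an indentation amount is the extension proposed in this RFE:

// Rule 1: a printer neither prepends spaces nor assumes its caller's
// indentation level.
void MySpace::print_on(outputStream* st) const {
  st->print_cr("space used %zuK, capacity %zuK", used() / K, capacity() / K);
}

// Rule 2: a printer enforces indentation only for its own sub-calls.
void MyHeap::print_on(outputStream* st) const {
  st->print_cr("my heap");
  StreamAutoIndentor indentor(st, 1); // indent everything printed below
  _space->print_on(st);               // callee stays indentation-agnostic
}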
The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth making, considering that memory for buffers of strings usually outweigh this extra memory cost. Additionally, when factoring in the improved code understandability and maintainability, I feel like it's a change worth making. Some new changes in the way the printing looks are: * Epsilon has received indentation in its print_on, which was not there before, in an effort to look similar to other GCs and also improve readability. * Shenandoah has also received indentation to behave similar to other GCs. * "the space" in Serial's output was indented by two spaces, now it's one. * With the removal of print_on from print_on_error, I've also removed Epsilon's barrier set printing, making it's print_on_error empty. Before this, Serial printed two spaces between the sections in the hs_err file. Code re-structure: * PSOldGen::print_on had an inlined version of virtual_space()->print_space_boundaries_on(st), which is now called instead. * PSYoungGen::print_on had its name inlined. Now, name() is called instead, which is how PSOldGen::print_on does it. * I've added a common print_space_boundaries_on for the virtual space used in Serial's DefNewGeneration and TenuredGeneration, like how Parallel does it. * I've opted to use fill_to() in Metaspace printing so that it works well with ZGC printing. This does not really affect other GCs since only ZGC aligns with the same column as Metaspace. Testing: * GHA, Oracle's tier 1-3 * Manual inspection of printed content * Exit printing `-Xlog:gc+heap+exit=info` * Periodic printing `-Xlog:gc+heap=debug` * jcmd `jcmd GC.heap_info` * jcmd `jcmd VM.info` * hs_err file, both "Heap:" and "Heap before/after invocations=" printing, `-XX:ErrorHandlerTest=14` ------------- Commit messages: - 8354362: Use automatic indentation in CollectedHeap printing Changes: https://git.openjdk.org/jdk/pull/24593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354362 Stats: 239 lines in 26 files changed: 88 ins; 88 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Fri Apr 11 11:38:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 11 Apr 2025 11:38:08 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. 
It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Ping @tstuefe regarding changes for `StreamAutoIndentor`. Would be nice to get your opinion since you are the author of it and its uses :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2796653117 From rcastanedalo at openjdk.org Fri Apr 11 13:01:49 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 13:01:49 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade Thank you for addressing my comments, Thomas! The new x64 version of `G1BarrierSetAssembler::gen_write_ref_array_post_barrier` looks correct, but I think it could be significantly simplified, here is my suggestion which is more similar to the aarch64 version: https://github.com/robcasloz/jdk/commit/fbedc0ae1ec5fcfa95b00ad354986885c7a56ce0 (note: did not test it thoroughly). 
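For contrast with the pseudo code quoted above, a sketch of roughly what the reduced post-barrier amounts to on my reading of this description: the filters survive, while the StoreLoad and the dirty-card enqueueing go away. Illustrative C++ with placeholder constants, not the code the JIT emits:

#include <cstdint>

static const int     kRegionShift = 21; // region size shift; the real value varies
static const uint8_t kCleanCard   = 1;
static const uint8_t kDirtyCard   = 0;

inline bool same_region(const void* a, const void* b) {
  return (reinterpret_cast<uintptr_t>(a) >> kRegionShift) ==
         (reinterpret_cast<uintptr_t>(b) >> kRegionShift);
}

// Post-write barrier for "x.a = y", with 'card' pointing at the card for @x.a.
inline void post_write_barrier(void* field_addr, void* new_value, uint8_t* card) {
  if (same_region(field_addr, new_value)) return; // same region check
  if (new_value == nullptr) return;               // null value check
  if (*card != kCleanCard) return;                // young or already dirty card
  *card = kDirtyCard;                             // just dirty the card; no
                                                  // StoreLoad, no enqueueing
}

That is much closer to the three or four instructions cited above for Parallel and Serial.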
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2796850628 From rcastanedalo at openjdk.org Fri Apr 11 13:10:33 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 13:10:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade > G1 sets UseCondCardMark to true by default. 
The conditional card mark corresponds to the third filter in the write barrier now, and since I decided to keep all filters for this change, it makes sense to directly use this mechanism. Do you have performance results for `-UseCondCardMark` vs. `+UseCondCardMark`? The benefit of `+UseCondCardMark` is not obvious from looking at the generated barrier code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2796872496 From rcastanedalo at openjdk.org Fri Apr 11 14:30:32 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 11 Apr 2025 14:30:32 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> References: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> Message-ID: On Thu, 10 Apr 2025 10:02:40 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * indentation fix > - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade The compiler-related parts of this change (including x64 and aarch64 changes) look good! These are the files I reviewed: - `src/hotspot/share/gc/g1/g1BarrierSet*` - `src/hotspot/share/gc/g1/{c1,c2}` - `src/hotspot/cpu/{x86,aarch64}` - `test/hotspot/jtreg/compiler` - `test/hotspot/jtreg/testlibrary_tests` ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2760546283 From wkemper at openjdk.org Fri Apr 11 20:46:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 20:46:01 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times Message-ID: Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. ------------- Commit messages: - Enforce limits on control thread's minimum and maximum sleep times Changes: https://git.openjdk.org/jdk/pull/24602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354452 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24602/head:pull/24602 PR: https://git.openjdk.org/jdk/pull/24602 From ysr at openjdk.org Fri Apr 11 20:59:30 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 20:59:30 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:41:00 GMT, William Kemper wrote: > Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. > > This assertion failure has been observed in Genshen's regulator thread: > > #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 > #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 > #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 > > But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. Left a comment for consideration but changes look fine if this changes doesn't interfere with potential tuning space etc. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: > 1: /* Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24602#pullrequestreview-2761556102 PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040287010 From wkemper at openjdk.org Fri Apr 11 21:06:30 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:06:30 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> On Fri, 11 Apr 2025 20:55:31 GMT, Y. Srinivas Ramakrishna wrote: >> Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. >> >> This assertion failure has been observed in Genshen's regulator thread: >> >> #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 >> #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 >> #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 >> >> But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: > >> 1: /* > > Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040294574 From ysr at openjdk.org Fri Apr 11 21:12:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 21:12:25 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:03:59 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 1: >> >>> 1: /* >> >> Change looks fine, but I wonder about using a `naked_sleep()` and allowing longer durations without triggering asserts in those cases? Not sure where this could be used and whether 1-second is the maximum we might like for these numbers regardless. > > 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. Hmm, curious, I see this: // Convenience wrapper around naked_short_sleep to allow for longer sleep // times. Only for use by non-JavaThreads. 
void os::naked_sleep(jlong millis) { assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); const jlong limit = 999; while (millis > limit) { naked_short_sleep(limit); millis -= limit; } naked_short_sleep(millis); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040297668 From ysr at openjdk.org Fri Apr 11 21:12:25 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 11 Apr 2025 21:12:25 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:08:04 GMT, Y. Srinivas Ramakrishna wrote: >> 1 second is enforced by `naked_sleep` itself, so raising it would impact all callers. Not using `naked_sleep` would be possible here, but the default maximum sleep time is 10ms. Even 1 second (well, 999ms) would make the heuristics dangerously slow to respond. > > Hmm, curious, I see this: > > // Convenience wrapper around naked_short_sleep to allow for longer sleep > // times. Only for use by non-JavaThreads. > void os::naked_sleep(jlong millis) { > assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); > const jlong limit = 999; > while (millis > limit) { > naked_short_sleep(limit); > millis -= limit; > } > naked_short_sleep(millis); > } Still if ppl aren't gonna need longer than 1 sec, and longer is a bad idea, then limiting it is a good idea. Reviewed. ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040298962 From wkemper at openjdk.org Fri Apr 11 21:12:26 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:12:26 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: <3BSSSNcGzbGojKBsi0fMQ9y4CXR3xnGWMlsMVixnbSo=.fcaca705-04f0-45c7-b0c7-ed1355265edb@github.com> Message-ID: On Fri, 11 Apr 2025 21:09:36 GMT, Y. Srinivas Ramakrishna wrote: >> Hmm, curious, I see this: >> >> // Convenience wrapper around naked_short_sleep to allow for longer sleep >> // times. Only for use by non-JavaThreads. >> void os::naked_sleep(jlong millis) { >> assert(!Thread::current()->is_Java_thread(), "not for use by JavaThreads"); >> const jlong limit = 999; >> while (millis > limit) { >> naked_short_sleep(limit); >> millis -= limit; >> } >> naked_short_sleep(millis); >> } > > Still if ppl aren't gonna need longer than 1 sec, and longer is a bad idea, then limiting it is a good idea. > Reviewed. ? Aye - we _could_ use that, but I don't think we _should_. Having the heuristics sleep longer than this between evaluations wouldn't do anyone any good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24602#discussion_r2040299379 From kdnilsen at openjdk.org Fri Apr 11 21:28:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 11 Apr 2025 21:28:12 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v7] In-Reply-To: References: Message-ID: > The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. 
This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. > > However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. > > This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix uninitialized variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24319/files - new: https://git.openjdk.org/jdk/pull/24319/files/ef783d48..e6e44b67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319 From wkemper at openjdk.org Fri Apr 11 21:28:31 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:28:31 GMT Subject: RFR: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:56:26 GMT, Y. Srinivas Ramakrishna wrote: >> Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. >> >> This assertion failure has been observed in Genshen's regulator thread: >> >> #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 >> #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 >> #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 >> >> But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. > > Left a comment for consideration but changes look fine if this changes doesn't interfere with potential tuning space etc. Appreciate the careful review @ysramakrishna ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24602#issuecomment-2798035149 From wkemper at openjdk.org Fri Apr 11 21:28:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 11 Apr 2025 21:28:32 GMT Subject: Integrated: 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 20:41:00 GMT, William Kemper wrote: > Without enforcing limits on `ShenandoahControlIntervalMin` and `ShenandoahControlIntervalMax`, the user may supply values that cause assertions to fail. 
> > This assertion failure has been observed in Genshen's regulator thread: > > #0 0x000028e8062d021a in ShenandoahRegulatorThread::regulator_sleep (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:125 > #1 0x000028e8062d0027 in ShenandoahRegulatorThread::regulate_young_and_old_cycles (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:95 > #2 0x000028e8062cfd06 in ShenandoahRegulatorThread::run_service (this=0x4ef9701893b0) at src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp:51 > > But it could just as easily happen in other modes to the `ShenandoahControlThread` instance. This pull request has now been integrated. Changeset: e8bcedb0 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/e8bcedb09b0e5eeb77bf1dc3a87bb61d7a5e8404 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8354452: Shenandoah: Enforce range checks on parameters controlling heuristic sleep times Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/24602 From kdnilsen at openjdk.org Fri Apr 11 21:30:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 11 Apr 2025 21:30:28 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v6] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:33:28 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove deprecation conditional compiles >> - Adjust candidate live memory for each mixed evac > > Haven't started looking at these changes, but I do wonder if it might be worthwhile to also consider (and implement under a tunable flag) the alternative policy of never adding to the collection set any regions that are still "active" at the point when the collection set for a marking cycle is first assembled at the end of the final marking. That way we don't have to do any re-computing, and the criterion for evacuation is garbage-first (or liveness-least) both of which remain invariant (and complements of each other) throughout the duration of evacuation and obviating entirely the need for recomputing the goodness/choice metric afresh. > > The downside is that we may leave some garbage on the table in the active regions, but this is probably a minor price for most workloads and heap configurations, and doesn't unnecessarily complicate or overengineer the solution. > > One question to consider is how G1 does this. May be regions placed in the collection set are retired (i.e. made inactive?) -- I prefer not to forcibly retire active regions as this wastes space that may have been usable. > > Thoughts? (Can add this comment and discuss on the ticket if that is logistically preferable.) @ysramakrishna : Interesting idea. Definitely worthy of an experiment. On the upside, this can make GC more "efficient" by procrastinating until the GC effort maximizes the returns of allocatable memory. On the downside, this can allow garbage to hide out for arbitrarily long times in regions that are not "fully used". I'd be in favor of proposing these experiments and possible feature enhancements in the context of a separate JBS ticket. 
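A rough sketch of the policy alternative being discussed, with purely hypothetical names (this is not the actual Shenandoah code), may help make the trade-off concrete: regions still taking allocations at final mark are simply skipped, so a candidate's live and garbage numbers stay invariant for the rest of the cycle and never need to be recomputed.

struct RegionInfo {                                       // stand-in for the real region metadata
  bool   active;                                          // still accepting allocations at final mark
  size_t garbage_words;                                   // garbage found by the most recent old mark
};

const bool   OnlyRetiredMixedEvacCandidates = true;       // hypothetical tunable flag
const size_t GarbageThresholdWords          = 32 * 1024;  // made-up garbage-first cutoff

bool is_mixed_evac_candidate(const RegionInfo& r) {
  if (OnlyRetiredMixedEvacCandidates && r.active) {
    return false;      // defer: any garbage it holds is left for a later cycle
  }
  return r.garbage_words > GarbageThresholdWords;
}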
------------- PR Comment: https://git.openjdk.org/jdk/pull/24319#issuecomment-2798040688

From manc at google.com Sat Apr 12 08:07:27 2025
From: manc at google.com (Man Cao)
Date: Sat, 12 Apr 2025 01:07:27 -0700
Subject: Moving Forward with AHS for G1
In-Reply-To: 
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID: 

> Okay it seems to me that the use case you are describing is wanting a container with an enforced memory limit. It should quack like a cgroup and walk like a cgroup but must not actually use cgroups for some reason.
> Cgroups were seemingly built for this use case and has a complete view of the memory usage in the container due to being an OS feature.
> Conversely, if the custom ad-hoc container environment does not have OS support for the memory limit, then the app can temporarily exceed the memory limit, and hence won't be as effective of a limit.
> But if you want to actually enforce a memory limit such that the app dies if it exceeds the limit I can't help but wonder... why not use a cgroup to declare that limit though?

The custom container has additional features that cgroup does not have. Enforcing memory limit is only a basic feature. Other features of the custom container are largely irrelevant to the AHS discussion (and I'm not sure if I could publicly share those features). In fact, the custom container is more of an extension or wrapper on top of cgroup. It is quite likely we have internal patches to the OS kernel to support the custom container.

> Regardless, I wonder if what you actually want for your use case is a way to tell AHS what the max memory of the entire JVM should be, similar to the -XX:RssLimit Thomas Stuefe proposed: https://bugs.openjdk.org/browse/JDK-8321266
> In other words, letting the JVM know that it has a bound on memory, and have AHS know about and try to adapt the heap such that the JVM memory usage is below the limit when native memory goes up and down. In other words, let the heap heuristics live in the JVM. Perhaps then the limit would also be static, or do the containers themselves actually grow and shrink at runtime, or was the dynamic nature of CurrentMaxHeapSize mostly an artifact of out sourcing the heap heuristics of an otherwise static custom container limit?

The custom container's memory limit could dynamically change at runtime, thus -XX:RssLimit or -XX:CurrentMaxHeapSize must be a manageable flag. In fact, cgroup also supports changing memory limit dynamically: https://unix.stackexchange.com/questions/555080/using-cgroup-to-limit-program-memory-as-its-running .

Having a manageable -XX:RssLimit, and making the JVM adjust heap size according to RssLimit, could in theory replace CurrentMaxHeapSize. However, I could think of the following issues with the RssLimit approach:

1. Description of https://bugs.openjdk.org/browse/JDK-8321266 indicates RssLimit is intended for debugging and regression testing, to abort the JVM when it uses more Rss than expected. It does not involve resizing the heap to survive the RssLimit. Adding heap resizing seems a significant change to the original intended use.

2. Calculating an appropriate heap size based on RssLimit seems challenging. Typically only part of the heap memory mapping contributes to Rss.
The JVM probably has to continuously monitor the total Rss, as well as Rss from heap memory mappings, then apply a heuristic to compute a target heap size.

3. Applications still need a mechanism to dynamically adjust values for RssLimit, just as for CurrentMaxHeapSize. Providing a value for RssLimit is not really easier than for CurrentMaxHeapSize, e.g. when a Java process and several non-Java processes run inside the same container (this is the common case in our deployment).

It seems that RssLimit is not necessarily easier to use than CurrentMaxHeapSize, but definitely more complicated to implement (due to 1 and 2).

-Man

From erik.osterlund at oracle.com Sat Apr 12 09:48:01 2025
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Sat, 12 Apr 2025 09:48:01 +0000
Subject: [External] : Re: Moving Forward with AHS for G1
In-Reply-To: 
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID: 

> On 12 Apr 2025, at 10:07, Man Cao wrote:
>
> In fact, the custom container is more of an extension or wrapper on top of cgroup. It is quite likely we have internal patches to the OS kernel to support the custom container.

Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. So why don't you want to use the cgroup memory limits?

> It seems that RssLimit is not necessarily easier to use than CurrentMaxHeapSize, but definitely more complicated to implement (due to 1 and 2).

Okay.

/Erik

From aboldtch at openjdk.org Mon Apr 14 12:53:18 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 14 Apr 2025 12:53:18 GMT
Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3]
In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com>
Message-ID: 

> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the initial heap capacity. Now we crash because `ZPartition::prime` does not take this into account.
>
> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation.
>
> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16
> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version`
>
> Currently running this through testing.
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update outdated TestZNMT.java comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24589/files - new: https://git.openjdk.org/jdk/pull/24589/files/70b0e923..a33c7e39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24589&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24589/head:pull/24589 PR: https://git.openjdk.org/jdk/pull/24589 From stefank at openjdk.org Mon Apr 14 13:14:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Apr 2025 13:14:01 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Mon, 14 Apr 2025 12:53:18 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update outdated TestZNMT.java comment Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24589#pullrequestreview-2764278721 From duke at openjdk.org Mon Apr 14 13:16:24 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Mon, 14 Apr 2025 13:16:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
> > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/6b139085..c31c7340 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=00-01 Stats: 88 lines in 1 file changed: 88 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From aboldtch at openjdk.org Mon Apr 14 13:31:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Apr 2025 13:31:59 GMT Subject: RFR: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly [v3] In-Reply-To: References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: <5cRwpuZ7f9ZUWRVpNawcpP9AEOpiT-Uy-RJGdRu8KlY=.670800f5-f2e6-45fd-ac78-66e89f9c5719@github.com> On Mon, 14 Apr 2025 12:53:18 GMT, Axel Boldt-Christmas wrote: >> Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. >> >> However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. >> >> Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 >> `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` >> >> Currently running this through testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update outdated TestZNMT.java comment Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24589#issuecomment-2801714127 From aboldtch at openjdk.org Mon Apr 14 13:31:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Apr 2025 13:31:59 GMT Subject: Integrated: 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly In-Reply-To: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> References: <6zPy4G14yw81LVO7jiCYpXTur3-JuwPYv4eH8PYzcuI=.970690bf-2542-4ca1-8578-9b1637f56611@github.com> Message-ID: On Fri, 11 Apr 2025 06:14:42 GMT, Axel Boldt-Christmas wrote: > Prior to [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the VM would not have started if we received a discontiguous heap reservation with all reservations smaller than the inital heap capacity. Now we crash because `ZPartition::prime` does not take this into account. > > However in contrast to the page cache, the mapped cache makes it trivial to support this scenario. So I propose fixing `ZPartition::prime` to handle any discontiguous heap reservation. > > Can be provoked in a debug build by using ZForceDiscontiguousHeapReservations > 16 > `java -XX:+UseZGC -XX:ZForceDiscontiguousHeapReservations=17 -Xmx128m -Xms128m --version` > > Currently running this through testing. This pull request has now been integrated. Changeset: 97e10757 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/97e10757392859a46360b4ab379429212fbc34b3 Stats: 34 lines in 3 files changed: 7 ins; 4 del; 23 mod 8354358: ZGC: ZPartition::prime handle discontiguous reservations correctly Reviewed-by: stefank, jsikstro, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24589 From kdnilsen at openjdk.org Mon Apr 14 16:40:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 14 Apr 2025 16:40:47 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v7] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 18:21:43 GMT, Kelvin Nilsen wrote: >> Not sure about performance impact, other than implementing and testing... > > i suspect performance impact is minimal. I've committed changes that endeavor to implement the suggested refactor. Performance impact does appear to be minimal. This broader refactoring does change behavior slightly. In particular: 1. We now have a better understanding of live-memory evacuated during mixed evacuations. This allows the selection of old-candidates for mixed evacuations to be more conservative. We'll have fewer old regions in order to honor the intended budget. 2. Potentially, this will result in more mixed evacuations, but each mixed evacuation should take less time. 3. There should be no impact on behavior of traditional Shenandoah. On one recently completed test run, we observed the following impacts compared to tip: Shenandoah ------------------------------------------------------------------------------------------------------- +80.69% specjbb2015/trigger_failure p=0.00000 Control: 58.250 (+/- 13.48 ) 110 Test: 105.250 (+/- 33.13 ) 30 Genshen ------------------------------------------------------------------------------------------------------- -19.46% jme/context_switch_count p=0.00176 Control: 117.420 (+/- 28.01 ) 108 Test: 98.292 (+/- 32.76 ) 30 Perhaps we need more data to decide whether this is "significant". 
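As a rough illustration of points 1 and 2 above (hypothetical code, not the actual implementation): candidates are taken in order until the live-evacuation budget of a single mixed collection is spent, so a larger, more honest live estimate admits fewer old regions per mixed evacuation and spreads the same total work across more, shorter collections.

#include <cstddef>

size_t select_mixed_evac_candidates(const size_t* live_words, size_t num_candidates,
                                    size_t live_budget_words) {
  size_t selected = 0;
  for (size_t i = 0; i < num_candidates; i++) {
    if (live_words[i] > live_budget_words) {
      break;                                 // next candidate no longer fits this cycle's budget
    }
    live_budget_words -= live_words[i];
    selected++;
  }
  return selected;
}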
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2042510606 From lmesnik at openjdk.org Tue Apr 15 01:35:04 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 15 Apr 2025 01:35:04 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Message-ID: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Just minor clean up of WB API usage. Also changed othervm to driver. ------------- Commit messages: - use driver - 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Changes: https://git.openjdk.org/jdk/pull/24642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354559 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24642/head:pull/24642 PR: https://git.openjdk.org/jdk/pull/24642 From lmesnik at openjdk.org Tue Apr 15 01:57:57 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 15 Apr 2025 01:57:57 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 13:16:24 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly Marked as reviewed by lmesnik (Reviewer). Sorry, I wanted to ask you to change test, not approve it yet. test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 30: > 28: * @test TestG1CompressedOops > 29: * @bug 8354145 > 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null The test ignores external VM flags, so vm.opt.G1HeapRegionSize is not needed. 
But it is needed to add `* @requires vm.flagless` test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 32: > 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null > 31: * @summary Verify that the flag TestG1CompressedOops is updated properly > 32: * @modules java.base/jdk.internal.misc Is any of those 2 modules is used by tests? I don't see it in the test. test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 35: > 33: * @modules java.management/sun.management > 34: * @library /test/lib > 35: * @library / Why this line is needed? I don't see any dependencies on "/" If you use some test code outside directory, better to build them. ------------- PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2766273464 Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2766313637 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043328713 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043315584 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043311695 From duke at openjdk.org Tue Apr 15 02:47:24 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 02:47:24 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v3] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
> > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: remove useless jtreg tags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/c31c7340..f08e4177 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From duke at openjdk.org Tue Apr 15 02:52:42 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 02:52:42 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 01:52:22 GMT, Leonid Mesnik wrote: >> Tongbao Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly > > test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 30: > >> 28: * @test TestG1CompressedOops >> 29: * @bug 8354145 >> 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null > > The test ignores external VM flags, so vm.opt.G1HeapRegionSize is not needed. > But it is needed to add > `* @requires vm.flagless` done > test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 32: > >> 30: * @requires vm.gc.G1 & vm.opt.G1HeapRegionSize == null >> 31: * @summary Verify that the flag TestG1CompressedOops is updated properly >> 32: * @modules java.base/jdk.internal.misc > > Is any of those 2 modules is used by tests? I don't see it in the test. removed these two modules > Why this line is needed? I don't see any dependencies on "/" If you use some test code outside directory, better to build them. 
Yes, the GCArguments depends on the ```@library /``` , many tests in ``` test/hotspot/jtreg/gc/arguments``` use this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043407145 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043406996 PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2043406500 From duke at openjdk.org Tue Apr 15 03:01:40 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 03:01:40 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v2] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 01:54:35 GMT, Leonid Mesnik wrote: > Sorry, I wanted to ask you to change test, not approve it yet. Got it, thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2803626376 From duke at openjdk.org Tue Apr 15 03:31:48 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Tue, 15 Apr 2025 03:31:48 GMT Subject: RFR: 8354145: G1GC: keep the CompressedOops same as before when not setting HeapRegionSize explicitly [v4] In-Reply-To: References: Message-ID: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. > > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24541/files - new: https://git.openjdk.org/jdk/pull/24541/files/f08e4177..17c0a8a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24541&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24541/head:pull/24541 PR: https://git.openjdk.org/jdk/pull/24541 From ayang at openjdk.org Tue Apr 15 07:58:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 15 Apr 2025 07:58:56 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: 
<27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24642#pullrequestreview-2767242168 From kbarrett at openjdk.org Tue Apr 15 09:09:46 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Apr 2025 09:09:46 GMT Subject: RFR: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24642#pullrequestreview-2767475519 From jbhateja at openjdk.org Tue Apr 15 13:57:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Apr 2025 13:57:38 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Message-ID: ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. Please review and share your feedback. Best Regards, Jatin PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. ------------- Commit messages: - 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24664/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354668 Stats: 16 lines in 4 files changed: 5 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From aboldtch at openjdk.org Tue Apr 15 14:52:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 15 Apr 2025 14:52:58 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. 
While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Looks good but need to communicate with JVMCI implementors. Also pre-exisiting but maybe `ZBarrierRelocationFormatLoadGoodAfterShl` should be called `ZBarrierRelocationFormatLoadGoodAfterShX` as we use it for both shr and shl. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.hpp line 52: > 50: #endif // COMPILER2 > 51: > 52: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; Suggestion: const int ZBarrierRelocationFormatLoadGoodAfterShl = 0; src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 223: > 221: return true; > 222: #if INCLUDE_ZGC > 223: case Z_BARRIER_RELOCATION_FORMAT_LOAD_GOOD_BEFORE_SHL: Should probably communicate with the JVMCI / Graal @dougxc so we can both update this exported symbol name to reflect the new behaviour, and give them the opportunity to adapt to the new relocation patching. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2768666320 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044778342 PR Review Comment: https://git.openjdk.org/jdk/pull/24664#discussion_r2044814373 From manc at google.com Tue Apr 15 19:24:58 2025 From: manc at google.com (Man Cao) Date: Tue, 15 Apr 2025 12:24:58 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: > Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. So why don?t you want to use the cgroup memory limits? One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit. -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.osterlund at oracle.com Tue Apr 15 20:38:36 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 15 Apr 2025 20:38:36 +0000 Subject: [External] : Re: Moving Forward with AHS for G1 In-Reply-To: References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com> Message-ID: Hi Man, > On 15 Apr 2025, at 21:25, Man Cao wrote: > > ? > > Okay, that makes sense. So you do use cgroups for your containers. And you do want to limit their memory. 
So why don't you want to use the cgroup memory limits?

One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit.

-Man

From erik.osterlund at oracle.com Tue Apr 15 20:38:36 2025
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Tue, 15 Apr 2025 20:38:36 +0000
Subject: [External] : Re: Moving Forward with AHS for G1
In-Reply-To: 
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID: 

Hi Man,

> On 15 Apr 2025, at 21:25, Man Cao wrote:
>
> One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit.

That's exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really.

/Erik

From manc at google.com Tue Apr 15 21:27:14 2025
From: manc at google.com (Man Cao)
Date: Tue, 15 Apr 2025 14:27:14 -0700
Subject: Moving Forward with AHS for G1
In-Reply-To: 
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID: 

> > One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit.
> That's exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really.

Supporting both memory.high and memory.max for AHS sounds great.
The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit.

Going back to the high level, the point is that it is impractical for organizations such as us to change deployment environments (e.g. migrating from custom container to standard container) in order to use AHS. A flag such as CurrentMaxHeapSize will definitely help these use cases adopt AHS.

-Man

From dlong at openjdk.org Wed Apr 16 02:01:49 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 16 Apr 2025 02:01:49 GMT
Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding
In-Reply-To: 
References: 
Message-ID: 

On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote:

> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction.
>
> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception.
>
> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix.
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs.

This looks OK, but we could do better.
Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2807988702 From stuefe at openjdk.org Wed Apr 16 05:28:47 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Apr 2025 05:28:47 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Hi @jsikstro, good cleanup, some small nits remain. Cheers, Thomas src/hotspot/share/gc/shared/collectedHeap.cpp line 119: > 117: heap->print_on(&st); > 118: MetaspaceUtils::print_on(&st); > 119: } Pre-existing, the other cases of printing in this file have a preceding ResourceMark. It is either needed here or not needed there. 
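For context, the usual shape of this logging pattern in HotSpot looks roughly like the sketch below; the tag set and the heap accessor are chosen for illustration, not copied from the code under review. The ResourceMark is only required if something reached from the printing path allocates from the resource arena, which is exactly the question raised above.

LogTarget(Info, gc, heap, exit) lt;      // illustrative tag set
if (lt.is_enabled()) {
  LogStream ls(lt);
  ResourceMark rm;                       // covers resource allocations made while printing
  Universe::heap()->print_on(&ls);
}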
src/hotspot/share/memory/metaspace.cpp line 221: > 219: MetaspaceCombinedStats stats = get_combined_statistics(); > 220: out->print("Metaspace"); > 221: out->fill_to(17); We rely on absolute position here? Will not work well with different indentation levels. src/hotspot/share/utilities/vmError.cpp line 1399: > 1397: st->cr(); > 1398: } > 1399: Universe::heap()->print_on_error(st); Why is print_on_error called outside the indentation scope? ------------- PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2770781675 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046093409 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046096635 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046084544 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: review comment resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/1a5a73c0..ffd92c37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From jbhateja at openjdk.org Wed Apr 16 07:52:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Apr 2025 07:52:09 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 01:58:53 GMT, Dean Long wrote: > This looks OK, but we could do better. Instead of making the relocation point to the end of the instruction and then looking up the offset with patch_barrier_relocation_offset(), why not make the offset always 0 and have the relocation point to the data offset inside the instruction? 
Hi @dean-long , As of now, barrier relocations are placed either before[1] or after[2] the instructions, offset is then added to compute the effective address of the patch site. I think you are suggesting to extend the barrier structure itself to cache the patch site address. For this bug fix PR I intend to make the patch offset agnostic to REX/REX2 prefix without disturbing the existing implimentation. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L394 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L397 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2808697302 From stuefe at openjdk.org Wed Apr 16 08:30:51 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Apr 2025 08:30:51 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... 
Notes:

- We may want to simplify at some point and merge streamIndentor and streamAutoIndentor. That includes checking which existing call sites use streamIndentor *without* wanting auto indentation. Not sure but I guess there are none. I think the existing cases fall into two categories: where streamIndentor was used on a stream that had already autoindent enabled, and where the code uses "cr_indent()" or "indent" to manually indent.
- It would be nice to have a short comment in collectedHeap.hpp about when print_on resp print_on_error is used. From your explanation, I expect print_on_error to be used for information that should only be printed in case of a fatal error, right?
- To simplify and prevent mistakes, we should consider making set_autoindent in outputStream private and make the indentor RAII classes friends of outputStream.

------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2808804192

From duke at openjdk.org Wed Apr 16 09:10:53 2025
From: duke at openjdk.org (duke)
Date: Wed, 16 Apr 2025 09:10:53 GMT
Subject: Withdrawn: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold
In-Reply-To: 
References: 
Message-ID: 

On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote:

> The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem.
>
> if (used_ratio > threshold) {
>   // After threshold is reached, scale it by free_ratio so that more aggressive
>   // GC is triggered as we approach code cache exhaustion
>   threshold *= free_ratio;
> }
> // If code cache has been allocated without any GC at all, let's make sure
> // it is eventually invoked to avoid trouble.
> if (allocated_since_last_ratio > threshold) {
>   // In case the GC is concurrent, we make sure only one thread requests the GC.
>   if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
>     log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)",
>                         threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0);
>     Universe::heap()->collect(GCCause::_codecache_GC_threshold);
>   }
> }
>
> Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded.
>
> So a simple solution is to delete the scaling logic here. However, I think here lies some problems worth further exploring.
>
> There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection. There's no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now.

This pull request has been closed without being integrated.
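To make the scaling effect described above concrete, a small illustration with invented numbers:

// Invented numbers, purely to illustrate the free_ratio scaling discussed above.
static double effective_threshold(double threshold, double free_ratio) {
  // 0.10 * 0.90 = 0.09 while the code cache is mostly empty, but
  // 0.10 * 0.10 = 0.01 once only 10% is free: a GC is then requested after
  // allocating just 1% since the last unloading, hence the ever more frequent GCs.
  return threshold * free_ratio;
}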
------------- PR: https://git.openjdk.org/jdk/pull/21084

From erik.osterlund at oracle.com Wed Apr 16 09:45:39 2025
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Wed, 16 Apr 2025 09:45:39 +0000
Subject: [External] : Re: Moving Forward with AHS for G1
In-Reply-To: 
References: <5dc9c3e2-fe3e-4c53-b8dc-3d55337187e5@oracle.com> <6088CF86-8F42-4800-86BB-952426FA2564@oracle.com> <5210B365-EB7D-498F-BF21-02B9629B1338@kodewerk.com> <4E901C51-BBD6-431A-9282-5432A8AD8B9B@oracle.com>
Message-ID: 

> On 15 Apr 2025, at 23:27, Man Cao wrote:
>
> > One example is that the custom container has a custom implementation for soft limit, and still uses cgroup memory limits as hard limit. Apps that are "good citizens" should strive to stay below the soft limit.
> > That's exactly what the purpose of memory.high is. With cgroups v2, memory.high is a soft limit while memory.max is a hard limit. AHS should respect both really.
>
> Supporting both memory.high and memory.max for AHS sounds great.
> The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit.

Okay, great. Sounds like AHS + actually using the standardized cgroups memory limits as the way of limiting memory is a viable path forward then?

> Going back to the high level, the point is that it is impractical for organizations such as us to change deployment environments (e.g. migrating from custom container to standard container) in order to use AHS. A flag such as CurrentMaxHeapSize will definitely help these use cases adopt AHS.

So the main point for introducing CurrentMaxHeapSize, as opposed to going directly to AHS, would be to support all the people out there that already built their own adaptive container infrastructure that doesn't use industry standard cgroup technology to limit memory. Instead, this group of users use the very proposed CurrentMaxHeapSize functionality (which obviously does not exist in mainline yet) to limit memory adaptively instead.

I have to be honest... this sounds like a niche feature to me with a ticking clock attached to it. Yet if it gets integrated, we will not be able to get rid of it for decades and it will cost maintenance overheads along the way. So I think it would be good to see a prominent use case that might be interesting for a long time going forward as well, and not just a way to help you guys stop using the proposed feature in the transition to AHS, which seems to be where we are going.

I think what will reach a much broader audience going forward, is AHS. And if that's the feature we really want, I can't help but wonder if exposing this user configurable stuff along the way is helping towards that goal rather than slowing us down by inventing yet another set of manually set handcuffs that the JVM and AHS will have to respect for ages, way past its best before date.

/Erik

From jsikstro at openjdk.org Wed Apr 16 13:25:47 2025
From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=)
Date: Wed, 16 Apr 2025 13:25:47 GMT
Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing
In-Reply-To: 
References: 
Message-ID: 

On Wed, 16 Apr 2025 05:21:41 GMT, Thomas Stuefe wrote:

>> Hello,
>>
>>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions.
>> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > src/hotspot/share/gc/shared/collectedHeap.cpp line 119: > >> 117: heap->print_on(&st); >> 118: MetaspaceUtils::print_on(&st); >> 119: } > > Pre-existing, the other cases of printing in this file have a preceding ResourceMark. It is either needed here or not needed there. The ResourceMarks that are used in other places in this file are not needed anymore. The reason they are placed where they are is because previously (a long time ago, since before [this](https://github.com/openjdk/jdk/commit/d12604111ccd6a5da38602077f4574adc850d9b8#diff-f9496186f2b54da5514e073a08b00afe2e2f8fbae899b13c182c8fbccc7aa7a6) commit), they were next to creating a debug stream. When the debug stream was replaced with a LogStream, the ResourceMark should have followed the LogStream, but it didn't in the changes for print_heap_{before,after}_gc(), see universe.cpp in [this](https://github.com/openjdk/jdk/commit/d12604111ccd6a5da38602077f4574adc850d9b8#diff-f9496186f2b54da5514e073a08b00afe2e2f8fbae899b13c182c8fbccc7aa7a6) commit, where the printing methods were before being moved to collectedHeap.cpp. The ResourceMarks should be removed, like Casper has done in [JDK-8294954](https://github.com/openjdk/jdk/pull/24162). 
I talked with Casper about the ResourceMarks, as he have looked over why the ResourceMarks are there in his patch and he agrees that they should be removed from print_heap_{before,after}_gc(), as they are likely there only for the LogStream. To summarise, no, ResourceMarks are not needed here, and they should be removed in the other places in this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046931370 From jsikstro at openjdk.org Wed Apr 16 13:56:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 13:56:07 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 05:15:40 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > src/hotspot/share/utilities/vmError.cpp line 1399: > >> 1397: st->cr(); >> 1398: } >> 1399: Universe::heap()->print_on_error(st); > > Why is print_on_error called outside the indentation scope? 
This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. Heap: ZHeap used 7740M, capacity 9216M, max capacity 9216M Cache 1476M (2) size classes 128M (1), 1G (1) Metaspace used 18526K, committed 18816K, reserved 1114112K class space used 1603K, committed 1728K, reserved 1048576K ZGC Globals: Young Collection: Mark/51 Old Collection: Mark/18 Offset Max: 144G (0x0000002400000000) Page Size Small: 2M Page Size Medium: 32M ZGC Metadata Bits: LoadGood: 0x000000000000d000 LoadBad: 0x0000000000002000 ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2046992916 From jsikstro at openjdk.org Wed Apr 16 14:08:47 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:08:47 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 05:25:31 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... 
> > src/hotspot/share/memory/metaspace.cpp line 221: > >> 219: MetaspaceCombinedStats stats = get_combined_statistics(); >> 220: out->print("Metaspace"); >> 221: out->fill_to(17); > > We rely on absolute position here? Will not work well with different indentation levels. This was intended to align good with how ZGC does it. After some thought I think a better strategy is to add a space at the end of the string before filling, like: ```c++ out->print("Metaspace "); out->fill_to(17); This still aligns to the 17th column, but will not break printing for deeper indentation levels (currently 6 or more). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2047019290 From jsikstro at openjdk.org Wed Apr 16 14:19:03 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:03 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v2] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... 
Joel Sikstr?m has updated the pull request incrementally with four additional commits since the last revision: - Safety padding for deep indentation - Remove superfluous ResourceMarks - Comment for print_on_error() - Merge 'master' into JDK-8354362_autoindent_collectedheap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/2c0c0b2b..9fea46ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=00-01 Stats: 180592 lines in 408 files changed: 10159 ins; 169115 del; 1318 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Wed Apr 16 14:19:49 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:49 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v3] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. 
A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8354362_autoindent_collectedheap - Safety padding for deep indentation - Remove superfluous ResourceMarks - Comment for print_on_error() - 8354362: Use automatic indentation in CollectedHeap printing ------------- Changes: https://git.openjdk.org/jdk/pull/24593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=02 Stats: 246 lines in 27 files changed: 88 ins; 89 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Wed Apr 16 14:19:49 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:19:49 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. 
My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Sorry for the force-push, made a mistake when merging with master. No comments should have been removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2809736966 From jsikstro at openjdk.org Wed Apr 16 14:28:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 16 Apr 2025 14:28:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 08:28:22 GMT, Thomas Stuefe wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Notes: > > - We may want to simplify at some point and merge streamIndentor and streamAutoIndentor. 
That includes checking which existing call sites use streamIndentor *without* wanting auto indentation. Not sure but I guess there are none. > I think the existing cases fall into two categories: where streamIndentor was used on a stream that had already autoindent enabled, and where the code uses "cr_indent()" or "indent" to manually indent. > > - It would be nice to have a short comment in collectedHeap.hpp about when print_on resp print_on_error is used. From your explanation, I expect print_on_error to be used for information that should only be printed in case of a fatal error, right? > > - To simplify and prevent mistakes, we should consider making set_autoindent in outputStream private and make the indentor RAII classes friends of outputStream. Thank you for looking at this @tstuefe! I've addressed some of your comments with new commits. I agree that we likely want to merge streamIndentor and StreamAutoIndentor in a follow up RFE, where it also would be good to look at making set_autoindent() private. I haven't looked into it, but it feels weird to have an indentation level on an outputStream and use it only explicitly via indent() and not via a StreamAutoIndentor. I think a good solution would be to only allow indentation via the StreamAutoIndentor API like you're proposing, and look into whether there should be some API for temporarily disabling indentation with a RAII object (or just some parameters to StreamAutoIndentor) if there are cases that require it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2809764389 From dlong at openjdk.org Wed Apr 16 21:13:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Apr 2025 21:13:52 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 07:52:09 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > review comment resolutions Yes, I am suggesting doing something like: __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr); which would be a bigger change to the implementation. 
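As a side illustration of why both approaches in this exchange are prefix-agnostic, the following standalone model shows that an offset measured from the end of an instruction does not change when a REX/REX2 prefix byte is emitted, while an offset from the start does. The byte sequences are illustrative only; the REX2 payload byte is a placeholder, not a real encoding of any particular register.

```c++
#include <cstdint>
#include <cstdio>
#include <vector>

// Standalone illustration (not HotSpot code): the immediate of a SHL sits at a
// fixed distance from the *end* of the instruction, while its distance from
// the *start* grows when a prefix byte is emitted.
struct FakeInsn {
  std::vector<uint8_t> bytes;  // [prefix bytes] + opcode/modrm + imm8 (last byte)
};

static size_t imm_offset_from_start(const FakeInsn& insn) {
  return insn.bytes.size() - 1;  // depends on how many prefix bytes were emitted
}

static size_t imm_offset_from_end(const FakeInsn& insn) {
  (void)insn;
  return 1;                      // always one byte back from the end
}

int main() {
  const FakeInsn shl_plain = {{0xC1, 0xE0, 0x10}};             // shl eax, 16 (no prefix)
  const FakeInsn shl_rex2  = {{0xD5, 0x00, 0xC1, 0xE0, 0x10}}; // hypothetical REX2-prefixed form
  std::printf("imm offset from start: %zu vs %zu\n",
              imm_offset_from_start(shl_plain), imm_offset_from_start(shl_rex2));
  std::printf("imm offset from end:   %zu vs %zu\n",
              imm_offset_from_end(shl_plain), imm_offset_from_end(shl_rex2));
  return 0;
}
```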
------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2810802951 From lmesnik at openjdk.org Wed Apr 16 23:07:53 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 16 Apr 2025 23:07:53 GMT Subject: Integrated: 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API In-Reply-To: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> References: <27kfQFBIUrqLa3513GREjhVQp_iNK0pvYu6Wm1yTF7k=.b8c540a4-f3d1-4a17-9dbf-908be0ae6f7c@github.com> Message-ID: <1ZIcsJTnCri0LVBjSYa15TA8IpyrQxmw0K-SAFtBr5E=.e3f1835a-a3d6-4afa-80d1-fecb9751c859@github.com> On Tue, 15 Apr 2025 01:29:50 GMT, Leonid Mesnik wrote: > Just minor clean up of WB API usage. > Also changed othervm to driver. This pull request has now been integrated. Changeset: db2dffb6 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/db2dffb6e5fed3773080581350f7f5c0bcff8f35 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod 8354559: gc/g1/TestAllocationFailure.java doesn't need WB API Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/24642 From jbhateja at openjdk.org Thu Apr 17 02:22:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 02:22:42 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1iR9_nrbk0iFlgy28u4dO4-7OWjEkO__AoZ9zHqtm8I=.ae8b0a68-0f85-472d-a810-e9c8417097d9@github.com> On Wed, 16 Apr 2025 21:10:38 GMT, Dean Long wrote: > Yes, I am suggesting doing something like: > > ``` > __ relocate(__ pc() - 4, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterOr); > ``` > > which would be a bigger change to the implementation. Yes, this is what I mean by address caching in my above comment. we already have an existing interface for it in place; the intent of this bug fix PR is not to improve upon the infrastructure but to align the fix with the current scheme. Do you suggest doing that in a follow up PR ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2811561649 From jbhateja at openjdk.org Thu Apr 17 03:21:08 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Apr 2025 03:21:08 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24664/files - new: https://git.openjdk.org/jdk/pull/24664/files/ffd92c37..dc2b2b16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24664&range=01-02 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24664/head:pull/24664 PR: https://git.openjdk.org/jdk/pull/24664 From stuefe at openjdk.org Thu Apr 17 05:25:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Apr 2025 05:25:54 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 14:06:21 GMT, Joel Sikstr?m wrote: >> src/hotspot/share/memory/metaspace.cpp line 221: >> >>> 219: MetaspaceCombinedStats stats = get_combined_statistics(); >>> 220: out->print("Metaspace"); >>> 221: out->fill_to(17); >> >> We rely on absolute position here? Will not work well with different indentation levels. > > This was intended to align good with how ZGC does it. After some thought I think a better strategy is to add a space at the end of the string before filling, like: > > ```c++ > out->print("Metaspace "); > out->fill_to(17); > > This still aligns to the 17th column, but will not break printing for deeper indentation levels (currently 6 or more). Yes that sounds better >> src/hotspot/share/utilities/vmError.cpp line 1399: >> >>> 1397: st->cr(); >>> 1398: } >>> 1399: Universe::heap()->print_on_error(st); >> >> Why is print_on_error called outside the indentation scope? > > This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. > > > Heap: > ZHeap used 7740M, capacity 9216M, max capacity 9216M > Cache 1476M (2) > size classes 128M (1), 1G (1) > Metaspace used 18526K, committed 18816K, reserved 1114112K > class space used 1603K, committed 1728K, reserved 1048576K > > ZGC Globals: > Young Collection: Mark/51 > Old Collection: Mark/18 > Offset Max: 144G (0x0000002400000000) > Page Size Small: 2M > Page Size Medium: 32M > > ZGC Metadata Bits: > LoadGood: 0x000000000000d000 > LoadBad: 0x0000000000002000 > ... Hmm, that may be an indication that this should be in its own error reporting STEP, then? Probably does not matter much, just aesthetics ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048252477 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048252150 From jsikstro at openjdk.org Thu Apr 17 09:13:34 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:13:34 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v4] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. 
To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with two additional commits since the last revision: - Separate print_heap_on and print_gc_on in VMError printing - Rename print_on and print_on_error to print_heap_on and print_gc_on ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/c1140b86..2979316c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=02-03 Stats: 71 lines in 15 files changed: 19 ins; 6 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Thu Apr 17 09:13:35 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:13:35 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v4] In-Reply-To: References: Message-ID: <4Swh7By1eRJ19p7ULrAryORBm97i0783ErfLDJhdnKw=.1a0e864e-e74f-4491-a153-fc1c049688be@github.com> On Thu, 17 Apr 2025 05:23:10 GMT, Thomas Stuefe wrote: >> This is because print_on() is in its "own" block, inside "Heap:", while print_on_error() prints its own blocks, like "ZGC Globals:" below. Other GCs behave in the same way. 
>> >> >> Heap: >> ZHeap used 7740M, capacity 9216M, max capacity 9216M >> Cache 1476M (2) >> size classes 128M (1), 1G (1) >> Metaspace used 18526K, committed 18816K, reserved 1114112K >> class space used 1603K, committed 1728K, reserved 1048576K >> >> ZGC Globals: >> Young Collection: Mark/51 >> Old Collection: Mark/18 >> Offset Max: 144G (0x0000002400000000) >> Page Size Small: 2M >> Page Size Medium: 32M >> >> ZGC Metadata Bits: >> LoadGood: 0x000000000000d000 >> LoadBad: 0x0000000000002000 >> ... > > Hmm, that may be an indication that this should be in its own error reporting STEP, then? Probably does not matter much, just aesthetics I agree. With some suggestions from @stefank, I've renamed print_on() to print_heap_on() and print_on_error() to print_gc_on() to better reflect their purpose. I've also separated print_heap_on() and print_gc_on() into their own "STEPs" in the printing in vmError.cpp: STEP("printing heap information") ... print_heap_on(); ... STEP("printing GC information") ... print_gc_on() ... With this change it would make better sense to print the precious log in the GC section rather than the heap section. This would change the printing order, which I have not yet done in this patch, so I think it would be better in a follow up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048556654 From jsikstro at openjdk.org Thu Apr 17 09:47:17 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 09:47:17 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v5] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. 
> > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Shenandoah print rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/2979316c..33d20641 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From stuefe at openjdk.org Thu Apr 17 10:44:53 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Apr 2025 10:44:53 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v5] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:47:17 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. 
>> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Shenandoah print rename Looks fine to me, but GC people should look at this too. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2775319824 From jsikstro at openjdk.org Thu Apr 17 10:49:39 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 10:49:39 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. 
My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Rename ZGC printing to match print_heap_on() and print_gc_on() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/33d20641..0824712c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=04-05 Stats: 24 lines in 5 files changed: 1 ins; 1 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From sjohanss at openjdk.org Thu Apr 17 10:53:29 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 17 Apr 2025 10:53:29 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock Message-ID: Please review this change to restructure some code in the mark start pause to do updates while holding the lock. **Summary** We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. **Testing** Mach5 tier1-5 ------------- Commit messages: - Move collection stat update under lock Changes: https://git.openjdk.org/jdk/pull/24719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354929 Stats: 45 lines in 3 files changed: 17 ins; 15 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24719/head:pull/24719 PR: https://git.openjdk.org/jdk/pull/24719 From stefank at openjdk.org Thu Apr 17 11:22:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:22:55 GMT Subject: RFR: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory Message-ID: We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. 
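For readers unfamiliar with the flag, here is a minimal sketch of the reservation pattern the mail describes; it is not the ZGC implementation, just an assumed illustration of how MAP_FIXED_NOREPLACE differs from a plain address hint and from MAP_FIXED.

```c++
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000  // older glibc headers may not define it
#endif

// Minimal sketch (not the ZGC code): reserve address space at a specific
// address without silently replacing an existing mapping. MAP_FIXED_NOREPLACE
// (Linux >= 4.17) fails with EEXIST if the range is taken, unlike a plain hint
// (which may be placed elsewhere) or MAP_FIXED (which clobbers what is there).
static void* reserve_at(void* addr, size_t size) {
  void* const res = mmap(addr, size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED_NOREPLACE,
                         -1, 0);
  if (res == MAP_FAILED) {
    return nullptr;                 // e.g. EEXIST: range already occupied
  }
  if (res != addr) {
    // Kernels older than 4.17 ignore the unknown flag and may pick another
    // address; treat that as a failed placement and clean up.
    munmap(res, size);
    return nullptr;
  }
  return res;
}

int main() {
  const size_t size = 16 * 1024 * 1024;
  void* const hint = (void*)0x700000000000;  // arbitrary illustrative address
  void* const mem = reserve_at(hint, size);
  std::printf("reservation at %p: %s\n", hint, mem != nullptr ? "ok" : "failed");
  if (mem != nullptr) {
    munmap(mem, size);
  }
  return 0;
}
```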
------------- Commit messages: - 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory Changes: https://git.openjdk.org/jdk/pull/24716/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24716&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354922 Stats: 10 lines in 2 files changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24716/head:pull/24716 PR: https://git.openjdk.org/jdk/pull/24716 From stefank at openjdk.org Thu Apr 17 11:26:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:26:51 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 10:48:54 GMT, Stefan Johansson wrote: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. > > **Testing** > Mach5 tier1-5 Looks good. I've add a couple of suggestions for blank lines. src/hotspot/share/gc/z/zPageAllocator.cpp line 1378: > 1376: void ZPageAllocator::update_collection_stats(ZGenerationId id) { > 1377: assert(SafepointSynchronize::is_at_safepoint(), "Should be at safepoint"); > 1378: #ifdef ASSERT Suggestion: #ifdef ASSERT src/hotspot/share/gc/z/zPageAllocator.cpp line 1388: > 1386: assert(total_used == _used, "Must be consistent at safepoint %zu == %zu", total_used, _used); > 1387: #endif > 1388: _collection_stats[(int)id]._used_high = _used; Suggestion: _collection_stats[(int)id]._used_high = _used; ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2775405882 PR Review Comment: https://git.openjdk.org/jdk/pull/24719#discussion_r2048745637 PR Review Comment: https://git.openjdk.org/jdk/pull/24719#discussion_r2048745371 From stefank at openjdk.org Thu Apr 17 11:28:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 11:28:04 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: References: Message-ID: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> On Thu, 17 Apr 2025 10:49:39 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
>> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Rename ZGC printing to match print_heap_on() and print_gc_on() src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 680: > 678: } > 679: st->cr(); > 680: Below this line we still have a print_on_error call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048749718 From stefank at openjdk.org Thu Apr 17 12:05:44 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 12:05:44 GMT Subject: RFR: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used Message-ID: ZFakeNUMA is used to fake a number of NUMA nodes within ZGC. The intention was to make ZFakeNUMA mutually exclusive with UseNUMA, but the current code allows the user to enable UseNUMA and set ZFakeNUMA, which will trigger to the "mutual exclusion" assert in ZNUMA::initialize. Verified on NUMA machine with -XX:+UseNUMA -XX:ZFakeNUMA=. Will run this through our lower tiers. 
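A small standalone model of the ergonomics described in this RFR is shown below; it is not the actual patch, and the flag handling is an assumption based on the mail text, illustrating only that the faked topology wins and UseNUMA is switched off up front instead of tripping an assert later.

```c++
#include <cstdio>

// Standalone model (not the actual patch) of the described ergonomics: if the
// developer setting ZFakeNUMA is in use, UseNUMA is force-disabled so the
// faked topology and the real one can never be active at the same time.
struct NumaFlags {
  bool use_numa  = true;  // models -XX:+UseNUMA
  int  fake_numa = 0;     // models -XX:ZFakeNUMA=<n>, 0 means unset
};

static void apply_numa_ergonomics(NumaFlags& flags) {
  if (flags.fake_numa > 0 && flags.use_numa) {
    std::printf("ZFakeNUMA=%d is set; disabling UseNUMA\n", flags.fake_numa);
    flags.use_numa = false;  // mutual exclusion enforced during argument processing
  }
}

int main() {
  NumaFlags flags;
  flags.fake_numa = 4;          // pretend the user asked for four fake nodes
  apply_numa_ergonomics(flags);
  std::printf("UseNUMA=%d ZFakeNUMA=%d\n", flags.use_numa ? 1 : 0, flags.fake_numa);
  return 0;
}
```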
------------- Commit messages: - 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used Changes: https://git.openjdk.org/jdk/pull/24721/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24721&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354938 Stats: 13 lines in 1 file changed: 10 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24721.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24721/head:pull/24721 PR: https://git.openjdk.org/jdk/pull/24721 From jsikstro at openjdk.org Thu Apr 17 12:15:45 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 12:15:45 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v7] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... 
Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Rename print_on_error() in the remaining call-paths from print_gc_on() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/0824712c..042c0aee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=05-06 Stats: 15 lines in 11 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Thu Apr 17 12:15:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 17 Apr 2025 12:15:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v6] In-Reply-To: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> References: <_rpazMMpjOnpEthySjhhfKE_Lit3eMNkH27qke5-Syc=.c0a68c73-9fdc-4ba8-950a-9fead760abda@github.com> Message-ID: On Thu, 17 Apr 2025 11:24:56 GMT, Stefan Karlsson wrote: >> Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename ZGC printing to match print_heap_on() and print_gc_on() > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 680: > >> 678: } >> 679: st->cr(); >> 680: > > Below this line we still have a print_on_error call. I renamed the remaining instances of print_on_error() in GC code with alternative names, all the way down to BitMap::print_on_error() which I renamed to BitMap::print_range_on(). The only remaining print_on_error() is GCLogPrecious::print_on_error(), which I figured might be left unchanged. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048817231 From stefank at openjdk.org Thu Apr 17 13:01:46 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Apr 2025 13:01:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v7] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 12:15:45 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
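As an illustration of the two rules above, a printer written against them might look like the sketch below. FooHeap, _young and _old are made-up names and this is not code from the patch; the StreamAutoIndentor(st, 1) constructor matches the usage visible in the review comments further down.

    // Sketch only: rule 1, print with no hand-written leading spaces;
    // rule 2, own the indentation for everything printed below this point.
    void FooHeap::print_heap_on(outputStream* st) const {
      st->print_cr("Foo Heap");
      StreamAutoIndentor indentor(st, 1);
      _young->print_on(st);  // callees are indented automatically and need not
      _old->print_on(st);    // know anything about the caller's indentation level
    }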
>> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, th... > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Rename print_on_error() in the remaining call-paths from print_gc_on() I think this looks good. Not yet a full review, though, but just wanted to send out my +1 on the changes. I've added a couple of more suggestions. src/hotspot/share/gc/serial/tenuredGeneration.cpp line 449: > 447: > 448: StreamAutoIndentor indentor(st, 1); > 449: st->print("the "); _the_space->print_on(st); Suggestion: st->print("the "); _the_space->print_on(st); src/hotspot/share/gc/z/zPageAllocator.cpp line 1171: > 1169: } > 1170: > 1171: void ZPartition::print_extended_cache_on(outputStream* st) const { I would like to suggest that you flip the 'extended' and 'cache' words: Suggestion: void ZPartition::print_cache_extended_on(outputStream* st) const { So that we have the structure: print_*_on print_*_extended_on ------------- PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2775538728 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048825842 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2048839661 From sjohanss at openjdk.org Thu Apr 17 18:15:21 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 17 Apr 2025 18:15:21 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock [v2] In-Reply-To: References: Message-ID: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private.
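The locking pattern described in that summary can be shown with a small self-contained example. This is generic C++, not ZGC code; the names are made up, but the point is the same: the debug-only consistency walk and the statistics update run under the same lock that writers of the counters take, so the check always sees a stable snapshot.

    #include <cassert>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    class ToyAllocator {
    public:
      explicit ToyAllocator(std::size_t partitions)
        : _per_partition_used(partitions, 0) {}

      void allocate(std::size_t partition, std::size_t bytes) {
        std::lock_guard<std::mutex> guard(_lock);  // plays the role of the page allocator lock
        _per_partition_used[partition] += bytes;
        _used += bytes;
      }

      // Counterpart of update_collection_stats(): verify, then record, under the lock.
      void update_collection_stats() {
        std::lock_guard<std::mutex> guard(_lock);
        std::size_t total = 0;
        for (std::size_t used : _per_partition_used) {
          total += used;
        }
        assert(total == _used && "per-partition sums must match the global counter");
        _used_high = _used;
      }

    private:
      std::mutex _lock;
      std::vector<std::size_t> _per_partition_used;
      std::size_t _used = 0;
      std::size_t _used_high = 0;
    };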
> > **Testing** > Mach5 tier1-5 Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: - Additional blank line Co-authored-by: Stefan Karlsson - Additional blank line Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24719/files - new: https://git.openjdk.org/jdk/pull/24719/files/473bdff2..b8966d36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24719/head:pull/24719 PR: https://git.openjdk.org/jdk/pull/24719 From dlong at openjdk.org Thu Apr 17 19:40:57 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 17 Apr 2025 19:40:57 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions When I made my suggestions, I didn't realize it would also require changes on the Graal side. So I would suggest a separate PR only if the Graal team agrees. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2813856674 From manc at google.com Fri Apr 18 05:28:21 2025 From: manc at google.com (Man Cao) Date: Thu, 17 Apr 2025 22:28:21 -0700 Subject: Moving Forward with AHS for G1 Message-ID: >> Supporting both memory.high and memory.max for AHS sounds great. >> The soft limit for the custom container is only one example. The custom container also has "strange" use cases where the actual limit is larger than cgroup's hard memory limit. > Okay, great. Sounds like AHS + actually using the standardized cgroups memory limits as the way of limiting memory is a viable path forward then? Not exactly. It is still impractical to migrate the custom container cases to standard cgroups. Thus those custom container cases cannot use AHS. One reason is the "strange" use cases where the actual limit is larger than cgroup's hard memory limit. There are other reasons that the custom container cannot migrate to standard cgroups. 
> So the main point for introducing CurrentMaxHeapSize, as opposed to going directly to AHS, would be to support all the people out there that already built their own adaptive container infrastructure that doesn't use industry standard cgroup technology to limit memory. Instead, this group of users use the very proposed CurrentMaxHeapSize functionality (which obviously does not exist in mainline yet) to limit memory adaptively instead. > I have to be honest... this sounds like a niche feature to me with a ticking clock attached to it. Yet if it gets integrated, we will not be able to get rid of it for decades and it will cost maintenance overheads along the way. So I think it would be good to see a prominent use case that might be interesting for a long time going forward as well, and not just a way to help you guys stop using the proposed feature in the transition to AHS, which seems to be where we are going. > I think what will reach a much broader audience going forward, is AHS. And if that's the feature we really want, I can't help but wonder if exposing this user configurable stuff along the way is helping towards that goal rather than slowing us down by inventing yet another set of manually set handcuffs that the JVM and AHS will have to respect for ages, way past its best before date. I'd say the statements above are "overfitting" CurrentMaxHeapSize to the custom container use case. The main point for the value of CurrentMaxHeapSize (or a high-precedence SoftMaxHeapSize) is as mentioned in the previous response: a fully-developed AHS is unlikely to satisfy all use cases and deployment environments out there. CurrentMaxHeapSize (or a high-precedence SoftMaxHeapSize) provides additional flexibility and control for AHS and for non-AHS use cases. The custom container and JVM-external algorithm for calculating CurrentMaxHeapSize/SoftMaxHeapSize is only one example of such use cases. I could think of other use cases for CurrentMaxHeapSize (or high-precedence SoftMaxHeapSize): 1. CRIU (OpenJDK CRaC) from [~rvansa]'s comment on https://bugs.openjdk.org/browse/JDK-8204088. This case needs to shrink the Java heap as much as possible before creating the process snapshot. CRaC has implemented https://bugs.openjdk.org/browse/JDK-8348650 for G1. This is almost the same as the use case for setting -XX:MinHeapFreeRatio=0 -XX:MaxHeapFreeRatio=0 mentioned previously in this thread. Min/MaxHeapFreeRatio only works for G1 and ParallelGC, and will likely stop working for G1 as https://bugs.openjdk.org/browse/JDK-8353716 says. 2. Multiple Java processes with different priorities. If multiple processes run inside the same container and memory is running low, users could set a smaller CurrentMaxHeapSize for low-priority processes, to make more memory available to high-priority processes. 3. Shrinking container memory limit dynamically. Directly setting container memory limit to below the container memory usage will likely fail. However, if user sets a smaller CurrentMaxHeapSize first, the Java process will shrink the heap, thus reducing container memory usage. Then lowering the memory limit will succeed. In addition, these use cases may not want to adopt AHS for various reasons. Instead, they could use CurrentMaxHeapSize/SoftMaxHeapSize to directly solve the problems. -Man -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tschatzl at openjdk.org Fri Apr 18 08:32:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 08:32:41 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock [v2] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:15:21 GMT, Stefan Johansson wrote: >> Please review this change to restructure some code in the mark start pause to do updates while holding the lock. >> >> **Summary** >> We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. >> >> I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. >> >> **Testing** >> Mach5 tier1-5 > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Additional blank line > > Co-authored-by: Stefan Karlsson > - Additional blank line > > Co-authored-by: Stefan Karlsson Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2778082113 From tschatzl at openjdk.org Fri Apr 18 09:33:48 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 09:33:48 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v34] In-Reply-To: References: Message-ID: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/068d2a37..a3b2386d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=32-33 Stats: 41 lines in 11 files changed: 1 ins; 11 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Apr 18 09:46:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 09:46:41 GMT Subject: RFR: 8346568: G1: Other time can be negative In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: On Fri, 4 Apr 2025 18:00:21 GMT, Sangheon Kim wrote: > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. 
Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 411: > 409: > 410: double G1GCPhaseTimes::print_pre_evacuate_collection_set() const { > 411: const double pre_concurrent_start_ms = average_time_ms(ResetMarkingState) + Could this assignment be moved down to just before the use? src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 425: > 423: // Concurrent tasks of ResetMarkingState and NoteStartOfMark are triggered during > 424: // young collection. However, their execution time are not included in _gc_pause_time_ms. > 425: if (pre_concurrent_start_ms > 0.0) { Since `pre_concurrent_start_ms` is now actually gathered, maybe print an extra line for it, with the `ResetMarkingState` and `NoteStartOfMark` log lines indented? I.e. something like: if (_cur_prepare_concurrent_task_time_ms > 0.0) { debug_time("Prepare Concurrent Start", _cur_prepare_concurrent_task_time_ms); debug_phase(_gc_par_phases[ResetMarkingState], 1); debug_phase(_gc_par_phases[NoteStartOfMark], 1); } ? Then we can also drop the calculation of the local `pre_concurrent_start_ms`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24454#pullrequestreview-2778191624 PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2050415309 PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2050420949 From tschatzl at openjdk.org Fri Apr 18 10:08:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Apr 2025 10:08:52 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v34] In-Reply-To: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> References: <3VD8WHNeCOwh3vgziKpuOctwd7CsOXM6uEVc1P6HSrg=.961011ff-9e7b-456d-bb70-f6ef89cc6735@github.com> Message-ID: On Fri, 18 Apr 2025 09:33:48 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * ayang review (part 2 - yield duration changes) > - * ayang review (part 1) The current use of all filters in the barrier is intentional: there is additional work going on investigating that, and I did not want to anticipate it in this change. When implementing the current `gen_write_ref_array_post` code measurements showed that the current version is slightly better than your suggestion for most arrays (everything larger than a few elements). I may still decide to use your version for now and re-measure later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2815125500 From sangheki at openjdk.org Fri Apr 18 19:09:33 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Fri, 18 Apr 2025 19:09:33 GMT Subject: RFR: 8346568: G1: Other time can be negative [v2] In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. 
Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8346568-G1-negative-time - Separate measurement for cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24454/files - new: https://git.openjdk.org/jdk/pull/24454/files/1c1750fd..d5f6b641 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=00-01 Stats: 257042 lines in 1817 files changed: 57470 ins; 193153 del; 6419 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From lmesnik at openjdk.org Sat Apr 19 00:44:18 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 19 Apr 2025 00:44:18 GMT Subject: RFR: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode Message-ID: The CheckUnhandledOops cause failure if JvmtiExport::post_resource_exhausted(...) is called in MemAllocator::Allocation::check_out_of_memory() The obj is null so it is not a real bug. I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java failed with -XX:+CheckUnhandledOops ------------- Commit messages: - 8355069 Changes: https://git.openjdk.org/jdk/pull/24766/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355069 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24766/head:pull/24766 PR: https://git.openjdk.org/jdk/pull/24766 From lmesnik at openjdk.org Sat Apr 19 02:25:33 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 19 Apr 2025 02:25:33 GMT Subject: RFR: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode [v2] In-Reply-To: References: Message-ID: > The > CheckUnhandledOops > cause failure if JvmtiExport::post_resource_exhausted(...) > is called in > MemAllocator::Allocation::check_out_of_memory() > The obj is null so it is not a real bug. > > I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. 
> The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java > failed with -XX:+CheckUnhandledOops Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: typo fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24766/files - new: https://git.openjdk.org/jdk/pull/24766/files/aa84af52..cb2904d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24766&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24766/head:pull/24766 PR: https://git.openjdk.org/jdk/pull/24766 From sangheki at openjdk.org Sat Apr 19 05:08:26 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Sat, 19 Apr 2025 05:08:26 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: <483hE4M8lfm5sv4bpf9YfN0qim6OwODlXgZj9aLReso=.0bbaadf1-221c-4e9d-a16f-f2e86fffe17a@github.com> On Fri, 18 Apr 2025 09:38:33 GMT, Thomas Schatzl wrote: >> Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Review from Thomas >> - Separate measurement for cleanup > > src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 425: > >> 423: // Concurrent tasks of ResetMarkingState and NoteStartOfMark are triggered during >> 424: // young collection. However, their execution time are not included in _gc_pause_time_ms. >> 425: if (pre_concurrent_start_ms > 0.0) { > > Since `pre_concurrent_start_ms` is now actually gathered, maybe print an extra line for it, with the `ResetMarkingState` and `NoteStartOfMark` log lines indented? > > I.e. something like: > > > if (_cur_prepare_concurrent_task_time_ms > 0.0) { > debug_time("Prepare Concurrent Start", _cur_prepare_concurrent_task_time_ms); > debug_phase(_gc_par_phases[ResetMarkingState], 1); > debug_phase(_gc_par_phases[NoteStartOfMark], 1); > } > > ? > > Then we can also drop the calculation of the local `pre_concurrent_start_ms`. Okay, this looks better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24454#discussion_r2051391042 From sangheki at openjdk.org Sat Apr 19 05:08:26 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Sat, 19 Apr 2025 05:08:26 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. 
Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Review from Thomas - Separate measurement for cleanup ------------- Changes: https://git.openjdk.org/jdk/pull/24454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24454&range=02 Stats: 68 lines in 4 files changed: 36 ins; 20 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24454/head:pull/24454 PR: https://git.openjdk.org/jdk/pull/24454 From gli at openjdk.org Sun Apr 20 11:15:42 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 20 Apr 2025 11:15:42 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: Message-ID: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> On Thu, 10 Apr 2025 11:59:52 GMT, Albert Mingkun Yang wrote: >> Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. >> >> Test: tier1-7 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/gc/parallel/parallelArguments.cpp line 78: > 76: } else { > 77: FLAG_SET_DEFAULT(InitialSurvivorRatio, MinSurvivorRatio); > 78: } If both `InitialSurvivorRatio` and `MinSurvivorRatio` are not set in command line and the condition `InitialSurvivorRatio < MinSurvivorRatio` is true, it seems the corresponding default/ergonomic values, we set before, are wrong. Should we guard this situation (such as printing an error message) to catch the bug in the previous code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2051691657 From aboldtch at openjdk.org Tue Apr 22 05:26:44 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 05:26:44 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. 
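Why an offset taken from the end of the instruction is unaffected by the prefix can be seen from a schematic standalone example (not HotSpot code; the REX encoding of shl rax, 8 is real, the REX2 payload byte is only a placeholder):

    #include <cstdio>
    #include <vector>

    // The imm8 operand is the last byte of the encoding, so locating it relative
    // to the instruction end works for any prefix length.
    static const unsigned char* imm8_addr(const unsigned char* insn_end) {
      return insn_end - 1;
    }

    int main() {
      std::vector<unsigned char> rex  = {0x48, 0xC1, 0xE0, 0x08};        // REX.W shl rax, 8
      std::vector<unsigned char> rex2 = {0xD5, 0x08, 0xC1, 0xE0, 0x08};  // 2-byte REX2 prefix, same shift (payload byte schematic)
      for (const auto* enc : {&rex, &rex2}) {
        const unsigned char* end = enc->data() + enc->size();
        std::printf("length %zu: imm8 at start-relative offset %td, end-relative offset 1\n",
                    enc->size(), imm8_addr(end) - enc->data());
      }
      return 0;
    }

The start-relative offset comes out as 3 for the REX form and 4 for the REX2 form, while the end-relative offset stays at 1, which is the property an end-relative relocation offset relies on.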
>> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Changes looks good. But coordinate with the Graal team before pushing anything. I think @dean-long's suggestion is good. But it should be done for all relocations in a separate PR. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2782793012 From aboldtch at openjdk.org Tue Apr 22 06:07:40 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 06:07:40 GMT Subject: RFR: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 12:00:27 GMT, Stefan Karlsson wrote: > ZFakeNUMA is used to fake a number of NUMA nodes within ZGC. The intention was to make ZFakeNUMA mutually exclusive with UseNUMA, but the current code allows the user to enable UseNUMA and set ZFakeNUMA, which will trigger to the "mutual exclusion" assert in ZNUMA::initialize. > > Verified on NUMA machine with -XX:+UseNUMA -XX:ZFakeNUMA=. Will run this through our lower tiers. lgtm. In the future we could probably make UseNUMA work with ZFakeNUMA. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24721#pullrequestreview-2782853524 From aboldtch at openjdk.org Tue Apr 22 06:18:42 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 06:18:42 GMT Subject: RFR: 8354929: Update collection stats while holding page allocator lock [v2] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 18:15:21 GMT, Stefan Johansson wrote: >> Please review this change to restructure some code in the mark start pause to do updates while holding the lock. >> >> **Summary** >> We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. >> >> I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. >> >> **Testing** >> Mach5 tier1-5 > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Additional blank line > > Co-authored-by: Stefan Karlsson > - Additional blank line > > Co-authored-by: Stefan Karlsson lgtm. Just a comment about the assert message. src/hotspot/share/gc/z/zPageAllocator.cpp line 1387: > 1385: } > 1386: > 1387: assert(total_used == _used, "Must be consistent at safepoint %zu == %zu", total_used, _used); Preexisting, the assert message is misleading. Currently it is not the safepoint which guarantees consistency, but the page allocator lock. 
I wrote this assert at first under the assumption that we did not change _used concurrently with safepoints, but we did, so added the lock but forgot to update the assert text. Maybe just removing the mention of safepoint is enough. Suggestion: assert(total_used == _used, "Must be consistent %zu == %zu", total_used, _used); ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2782868409 PR Review Comment: https://git.openjdk.org/jdk/pull/24719#discussion_r2053396904 From aboldtch at openjdk.org Tue Apr 22 06:22:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 06:22:59 GMT Subject: RFR: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 08:30:27 GMT, Stefan Karlsson wrote: > We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. > > Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24716#pullrequestreview-2782878664 From eosterlund at openjdk.org Tue Apr 22 06:40:40 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 22 Apr 2025 06:40:40 GMT Subject: RFR: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 08:30:27 GMT, Stefan Karlsson wrote: > We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. > > Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24716#pullrequestreview-2782915076 From sjohanss at openjdk.org Tue Apr 22 07:07:04 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 22 Apr 2025 07:07:04 GMT Subject: RFR: 8354929: ZGC: Update collection stats while holding page allocator lock [v3] In-Reply-To: References: Message-ID: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. 
> > **Testing** > Mach5 tier1-5 Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Improved assert message Co-authored-by: Axel Boldt-Christmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24719/files - new: https://git.openjdk.org/jdk/pull/24719/files/b8966d36..1dc0beb5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24719&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24719/head:pull/24719 PR: https://git.openjdk.org/jdk/pull/24719 From stefank at openjdk.org Tue Apr 22 07:23:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Apr 2025 07:23:55 GMT Subject: RFR: 8354929: ZGC: Update collection stats while holding page allocator lock [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:07:04 GMT, Stefan Johansson wrote: >> Please review this change to restructure some code in the mark start pause to do updates while holding the lock. >> >> **Summary** >> We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. >> >> I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. >> >> **Testing** >> Mach5 tier1-5 > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Improved assert message > > Co-authored-by: Axel Boldt-Christmas Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24719#pullrequestreview-2783062025 From tschatzl at openjdk.org Tue Apr 22 07:25:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 22 Apr 2025 07:25:56 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:59:52 GMT, Albert Mingkun Yang wrote: >> Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. >> >> Test: tier1-7 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24556#pullrequestreview-2783068630 From tschatzl at openjdk.org Tue Apr 22 07:47:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 22 Apr 2025 07:47:56 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> Message-ID: On Sat, 19 Apr 2025 05:08:26 GMT, Sangheon Kim wrote: >> Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). 
And the value can be negative for 3 reasons. >> 1. Different scope of measurement >> - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. >> (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) >> - Changed not to be included in sum_of_sub_phases. >> - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. >> 2. Duplicated measurement >> - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. >> 3. Pre Concurrent task execution time >> - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. >> >> Testing: tier 1 ~ 5 > > Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Review from Thomas > - Separate measurement for cleanup Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24454#pullrequestreview-2783156317 From ayang at openjdk.org Tue Apr 22 07:55:17 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 22 Apr 2025 07:55:17 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v3] In-Reply-To: References: Message-ID: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into pgc-min-initial-fix - review - pgc-min-initial-fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24556/files - new: https://git.openjdk.org/jdk/pull/24556/files/1cd03d17..2bd29e50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24556&range=01-02 Stats: 225454 lines in 1011 files changed: 36608 ins; 184978 del; 3868 mod Patch: https://git.openjdk.org/jdk/pull/24556.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24556/head:pull/24556 PR: https://git.openjdk.org/jdk/pull/24556 From ayang at openjdk.org Tue Apr 22 07:55:17 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 22 Apr 2025 07:55:17 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> References: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> Message-ID: On Sun, 20 Apr 2025 10:35:21 GMT, Guoxiong Li wrote: > If both InitialSurvivorRatio and MinSurvivorRatio are not set in command line and the condition InitialSurvivorRatio < MinSurvivorRatio is true When will that happen? 
AFAIS, if neither is set on command line, the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so `MinSurvivorRatio <= InitialSurvivorRatio` should hold. > Should we guard this situation (such as printing an error message) to catch the bug in the previous code? I didn't really understand your suggestion. Could you clarify what you mean by "previous code"? Or maybe some pseudo code to outline your suggestion? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2053545854 From sjohanss at openjdk.org Tue Apr 22 08:02:57 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 22 Apr 2025 08:02:57 GMT Subject: Integrated: 8354929: ZGC: Update collection stats while holding page allocator lock In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 10:48:54 GMT, Stefan Johansson wrote: > Please review this change to restructure some code in the mark start pause to do updates while holding the lock. > > **Summary** > We currently update the collection high and low used values during the mark start pause without taking the page allocator lock. This is fine since it is read atomically, but consistency verification in this code requires the lock to be held. We later in the pause take the lock to get the current statistics, this change moves the update code to also happen while holding the lock. > > I've renamed `reset_statistics()` to `update_collection_stats()` to better match what it actually does and made it private. > > **Testing** > Mach5 tier1-5 This pull request has now been integrated. Changeset: 50358d1c Author: Stefan Johansson URL: https://git.openjdk.org/jdk/commit/50358d1ca49c26d100c5c658de29c75f864fdc60 Stats: 47 lines in 3 files changed: 19 ins; 15 del; 13 mod 8354929: ZGC: Update collection stats while holding page allocator lock Reviewed-by: stefank, tschatzl, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/24719 From sjohanss at openjdk.org Tue Apr 22 08:02:50 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 22 Apr 2025 08:02:50 GMT Subject: RFR: 8354929: ZGC: Update collection stats while holding page allocator lock [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:20:57 GMT, Stefan Karlsson wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Improved assert message >> >> Co-authored-by: Axel Boldt-Christmas > > Marked as reviewed by stefank (Reviewer). Thanks for the reviews @stefank, @xmas92 and @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/24719#issuecomment-2820489386 From jsikstro at openjdk.org Tue Apr 22 09:26:41 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 09:26:41 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v8] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. 
To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to apply an indentation. My reasoning for this is that most places that use streamIndentor also want to use StreamAutoIndentor (either immediately or some time before) so that it is automatically applied. A downside of this change is that any previous uses of StreamAutoIndentor now also needs to store an extra int worth of memory. To me, this is a trade-off worth makin... Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' into JDK-8354362_autoindent_collectedheap - Flip extended_on naming order - Revert line-break in tenuredGeneration.cpp - Rename print_on_error() in the remaining call-paths from print_gc_on() - Rename ZGC printing to match print_heap_on() and print_gc_on() - Shenandoah print rename - Separate print_heap_on and print_gc_on in VMError printing - Rename print_on and print_on_error to print_heap_on and print_gc_on - Merge branch 'master' into JDK-8354362_autoindent_collectedheap - Safety padding for deep indentation - ... 
and 3 more: https://git.openjdk.org/jdk/compare/9eeb86d9...1e9d4be4 ------------- Changes: https://git.openjdk.org/jdk/pull/24593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=07 Stats: 327 lines in 42 files changed: 105 ins; 92 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From stefank at openjdk.org Tue Apr 22 11:25:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Apr 2025 11:25:59 GMT Subject: RFR: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 08:30:27 GMT, Stefan Karlsson wrote: > We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. > > Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. Thanks for the reviews. ZGC testing in Tier1-7 on Linux looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24716#issuecomment-2821012631 From stefank at openjdk.org Tue Apr 22 11:26:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Apr 2025 11:26:01 GMT Subject: Integrated: 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 08:30:27 GMT, Stefan Karlsson wrote: > We have seen that some versions of the Linux kernel does not honor the address hint when mmapping memory without MAP_FIXED, if there is an adjacent memory area above the requested memory area. If we use MAP_FIXED_NOREPLACE, the reservation succeeds. I propose that we start using MAP_FIXED_NOREPLACE. > > Tested via GHA, which runs the gtest that performs a discontiguous, but adjacent reservation. I will run this through a bunch of tiers before integrating. This pull request has now been integrated. Changeset: 0f1c448c Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/0f1c448ca15485cd7270cf0607acfceacdcefaff Stats: 10 lines in 2 files changed: 9 ins; 0 del; 1 mod 8354922: ZGC: Use MAP_FIXED_NOREPLACE when reserving memory Reviewed-by: aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24716 From jsikstro at openjdk.org Tue Apr 22 11:34:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 11:34:07 GMT Subject: RFR: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used In-Reply-To: References: Message-ID: <0b4IJlfb7c3YN6dPT1NwDsCy6DQwly_AwASOE7EFwn4=.0c27a6e0-22c9-460a-bebc-fd343d47ca74@github.com> On Thu, 17 Apr 2025 12:00:27 GMT, Stefan Karlsson wrote: > ZFakeNUMA is used to fake a number of NUMA nodes within ZGC. The intention was to make ZFakeNUMA mutually exclusive with UseNUMA, but the current code allows the user to enable UseNUMA and set ZFakeNUMA, which will trigger to the "mutual exclusion" assert in ZNUMA::initialize. > > Verified on NUMA machine with -XX:+UseNUMA -XX:ZFakeNUMA=. Will run this through our lower tiers. Marked as reviewed by jsikstro (Committer). 
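For illustration of the MAP_FIXED_NOREPLACE approach adopted in JDK-8354922 above: a minimal sketch of reserving memory at a hint address without clobbering an existing mapping could look like the code below. This is a generic example, not ZGC's actual reservation code; reserve_at is an invented helper, and the extra check covers kernels older than 4.17 that silently ignore the flag.

```c++
#include <sys/mman.h>
#include <cstddef>

// Reserve [hint, hint + size) without replacing an existing mapping.
// Returns nullptr if the range is occupied or the kernel ignored the hint.
static void* reserve_at(void* hint, size_t size) {
  void* res = mmap(hint, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED_NOREPLACE,
                   -1, 0);
  if (res == MAP_FAILED) {
    return nullptr;  // e.g. EEXIST: something else already occupies the range
  }
  if (res != hint) {
    // Kernels older than 4.17 ignore MAP_FIXED_NOREPLACE and may place the
    // mapping elsewhere; undo and treat it as a failed reservation.
    munmap(res, size);
    return nullptr;
  }
  return res;
}
```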
------------- PR Review: https://git.openjdk.org/jdk/pull/24721#pullrequestreview-2783789345 From stefank at openjdk.org Tue Apr 22 11:52:10 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Apr 2025 11:52:10 GMT Subject: RFR: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 06:04:47 GMT, Axel Boldt-Christmas wrote: > In the future we could probably make UseNUMA work with ZFakeNUMA. Yes, the patch here is just a band aid to fix the imminent issue, but we could probably do what you said. Thanks all for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24721#issuecomment-2821072838 From stefank at openjdk.org Tue Apr 22 11:52:10 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Apr 2025 11:52:10 GMT Subject: Integrated: 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 12:00:27 GMT, Stefan Karlsson wrote: > ZFakeNUMA is used to fake a number of NUMA nodes within ZGC. The intention was to make ZFakeNUMA mutually exclusive with UseNUMA, but the current code allows the user to enable UseNUMA and set ZFakeNUMA, which will trigger to the "mutual exclusion" assert in ZNUMA::initialize. > > Verified on NUMA machine with -XX:+UseNUMA -XX:ZFakeNUMA=. Will run this through our lower tiers. This pull request has now been integrated. Changeset: f2587d9b Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/f2587d9bd2e86c46c49ad972790c60ec394848da Stats: 13 lines in 1 file changed: 10 ins; 0 del; 3 mod 8354938: ZGC: Disable UseNUMA when ZFakeNUMA is used Reviewed-by: aboldtch, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/24721 From aboldtch at openjdk.org Tue Apr 22 12:17:49 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Apr 2025 12:17:49 GMT Subject: RFR: 8351137: ZGC: Improve ZValueStorage alignment support [v2] In-Reply-To: References: Message-ID: > ZValueStorage only align the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages all have types which have an alignment less than or equal to the alignment set by its storage. > > I wish to improve this so that types with greater alignment than the storage alignment can be used. > > The UB caused by using a type larger than the storage alignment is something I have seen materialise as returning bad address (and crashing) on Windows. > > As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment, I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two. > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8351137 - 8351137: ZGC: Improve ZValueStorage alignment support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23887/files - new: https://git.openjdk.org/jdk/pull/23887/files/f46b7f85..ccca8a5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23887&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23887&range=00-01 Stats: 390461 lines in 4435 files changed: 115293 ins; 251575 del; 23593 mod Patch: https://git.openjdk.org/jdk/pull/23887.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23887/head:pull/23887 PR: https://git.openjdk.org/jdk/pull/23887 From jsikstro at openjdk.org Tue Apr 22 13:25:22 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 13:25:22 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v9] In-Reply-To: References: Message-ID: <7yvwm_aaSWl-nzKgMpUIScBEYSRLLLH_QyXlagLQGCU=.1ed25b3e-fc52-4034-b268-e3ad691947be@github.com> > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. With this change, printing heap information and gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. 
To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoInden... Joel Sikstr?m has updated the pull request incrementally with two additional commits since the last revision: - Ternary instead of conditional in VirtualSpace::print_on - Consistent line-breaking and spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/1e9d4be4..115dc41a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=07-08 Stats: 23 lines in 4 files changed: 7 ins; 5 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Tue Apr 22 13:25:24 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 13:25:24 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v8] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 09:26:41 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. With this change, printing heap information and gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. 
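For illustration, a minimal sketch of what rules 1 and 2 above look like inside a single print method. MyHeapSubsystem, _num_regions and _child are invented names, and the StreamAutoIndentor constructor is assumed to take the optional indentation argument this PR describes.

```c++
// Hypothetical printer following the proposed convention: no hand-written
// leading spaces, indentation is enforced locally and inherited by callees.
void MyHeapSubsystem::print_heap_on(outputStream* st) const {
  st->print_cr("my subsystem:");
  StreamAutoIndentor indentor(st, 1);          // auto-indent everything below
  st->print_cr("regions: %u", _num_regions);   // printed one level deeper
  _child->print_heap_on(st);                   // callee applies its own indentation on top
}
```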
>> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optio... > > Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge branch 'master' into JDK-8354362_autoindent_collectedheap > - Flip extended_on naming order > - Revert line-break in tenuredGeneration.cpp > - Rename print_on_error() in the remaining call-paths from print_gc_on() > - Rename ZGC printing to match print_heap_on() and print_gc_on() > - Shenandoah print rename > - Separate print_heap_on and print_gc_on in VMError printing > - Rename print_on and print_on_error to print_heap_on and print_gc_on > - Merge branch 'master' into JDK-8354362_autoindent_collectedheap > - Safety padding for deep indentation > - ... and 3 more: https://git.openjdk.org/jdk/compare/9eeb86d9...1e9d4be4 After some discussion with @lkorinth and @xmas92 I've opted to use consistent line-breaking and some other style changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2821318776 From jsikstro at openjdk.org Tue Apr 22 14:57:06 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 14:57:06 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v10] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. With this change, printing heap information and gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. 
The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoInden... Joel Sikstr?m has updated the pull request incrementally with three additional commits since the last revision: - Fix Parallel NUMA printing - Use prefix for Parallel printing - Use prefix for Serial printing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/115dc41a..1cb525aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=08-09 Stats: 37 lines in 11 files changed: 7 ins; 8 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Tue Apr 22 15:05:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 22 Apr 2025 15:05:08 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v11] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. With this change, printing heap information and gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. 
The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoInden... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Use FormatBuffer instead of local char buf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/1cb525aa..e89b916f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=09-10 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From sviswanathan at openjdk.org Tue Apr 22 19:45:41 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 22 Apr 2025 19:45:41 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: <-pd5SaWoYezSlW6bjQ6s-9URqhtrBioPrBkh9hxDuq8=.8096bdf6-af58-465b-86cb-558ec99415c5@github.com> On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). 
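As an aside on the fix strategy quoted above (JDK-8354668): anchoring the patch location at the end of the instruction makes it independent of how many prefix bytes precede the opcode. A simplified sketch, not the actual HotSpot relocation code:

```c++
#include <cstdint>
#include <cstddef>

// For a SHL-with-imm8 encoding the shift amount is the last byte of the
// instruction, so patching relative to the instruction end works the same
// with or without an extra REX/REX2 prefix byte.
static void patch_shl_imm8_from_end(uint8_t* inst_end, uint8_t new_shift) {
  inst_end[-1] = new_shift;
}

// Fragile alternative: a fixed offset from the instruction start breaks as
// soon as a prefix byte (e.g. REX2) shifts the opcode and immediate.
static void patch_shl_imm8_from_start(uint8_t* inst_start, size_t imm_offset,
                                      uint8_t new_shift) {
  inst_start[imm_offset] = new_shift;  // wrong if imm_offset was computed without the prefix
}
```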
PR Review: https://git.openjdk.org/jdk/pull/24664#pullrequestreview-2785160419 From jbhateja at openjdk.org Wed Apr 23 02:07:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 02:07:47 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 19:37:54 GMT, Dean Long wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > When I made my suggestions, I didn't realize it would also require changes on the Graal side. So I would suggest a separate PR only if the Graal team agrees. Hi @dean-long , I have created a follow up JBS (JDK-8355341) to capture your suggestion. Thanks for reviews @xmas92 and @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2822875551 From jbhateja at openjdk.org Wed Apr 23 02:07:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Apr 2025 02:07:48 GMT Subject: Integrated: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 13:50:40 GMT, Jatin Bhateja wrote: > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. > > Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. > > Please review and share your feedback. > > Best Regards, > Jatin > > PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. This pull request has now been integrated. Changeset: 4c373703 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/4c373703d9ed63dfc85df7cdcc04ecad5b02ade0 Stats: 16 lines in 4 files changed: 5 ins; 5 del; 6 mod 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24664 From sspitsyn at openjdk.org Wed Apr 23 04:20:41 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Apr 2025 04:20:41 GMT Subject: RFR: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode [v2] In-Reply-To: References: Message-ID: On Sat, 19 Apr 2025 02:25:33 GMT, Leonid Mesnik wrote: >> The >> CheckUnhandledOops >> cause failure if JvmtiExport::post_resource_exhausted(...) >> is called in >> MemAllocator::Allocation::check_out_of_memory() >> The obj is null so it is not a real bug. >> >> I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. 
>> The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java >> failed with -XX:+CheckUnhandledOops >> >> If defines are unwelcome here, the >> ``` PreserveObj obj_h(_thread, _obj_ptr);``` >> might be added instead with a comment explaining why it is needed for a null obj. > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > typo fixes Looks okay to me. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24766#pullrequestreview-2785885049 From gli at openjdk.org Wed Apr 23 04:34:42 2025 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 23 Apr 2025 04:34:42 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> Message-ID: On Tue, 22 Apr 2025 07:51:55 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/parallelArguments.cpp line 78: >> >>> 76: } else { >>> 77: FLAG_SET_DEFAULT(InitialSurvivorRatio, MinSurvivorRatio); >>> 78: } >> >> If both `InitialSurvivorRatio` and `MinSurvivorRatio` are not set in command line and the condition `InitialSurvivorRatio < MinSurvivorRatio` is true, it seems the corresponding default/ergonomic values, we set before, are wrong. Should we guard this situation (such as printing an error message) to catch the bug (unexpected default/ergonomic values) in the previous code? > >> If both InitialSurvivorRatio and MinSurvivorRatio are not set in command line and the condition InitialSurvivorRatio < MinSurvivorRatio is true > > When will that happen? AFAIS, if neither is set on command line, the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so `MinSurvivorRatio <= InitialSurvivorRatio` should hold. > >> Should we guard this situation (such as printing an error message) to catch the bug in the previous code? > > I didn't really understand your suggestion. Could you clarify what you mean by "previous code"? Or maybe some pseudo code to outline your suggestion? Please read the following code. It can help us catch the bug about the wrong default/ergonomic values.

```C++
if (InitialSurvivorRatio < MinSurvivorRatio) {
  if (FLAG_IS_CMDLINE(InitialSurvivorRatio)) {
    if (FLAG_IS_CMDLINE(MinSurvivorRatio)) {
      jio_fprintf(defaultStream::error_stream(),
                  "Inconsistent MinSurvivorRatio vs InitialSurvivorRatio: %d vs %d\n", MinSurvivorRatio, InitialSurvivorRatio);
    }
    FLAG_SET_DEFAULT(MinSurvivorRatio, InitialSurvivorRatio);
  } else if (FLAG_IS_CMDLINE(MinSurvivorRatio)) {
    FLAG_SET_DEFAULT(InitialSurvivorRatio, MinSurvivorRatio);
  } else {
    // Here <----------
    jio_fprintf(defaultStream::error_stream(),
                "Inconsistent default/ergonomic MinSurvivorRatio vs InitialSurvivorRatio: %d vs %d\n", MinSurvivorRatio, InitialSurvivorRatio);
  }
}
```

> When will that happen? AFAIS, if neither is set on command line, the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so MinSurvivorRatio <= InitialSurvivorRatio should hold.

Yes, the current default values are good, but we may change them in the future. So such guard operation seems good and helps us find the bug earlier. May I be overthinking?
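Background for the two ratio flags discussed above: both are, roughly, "young generation size / survivor space size" ratios, so a larger ratio means a smaller survivor space, and MinSurvivorRatio <= InitialSurvivorRatio keeps the initial survivor size within the largest size the policy may grow it to. A rough illustration with the default values, not the actual HotSpot sizing code:

```c++
#include <cstddef>
#include <cassert>

// Illustrative arithmetic only; real survivor sizing has more inputs.
void survivor_ratio_example(size_t young_gen_size) {
  const size_t InitialSurvivorRatio = 8;  // default, see gc_globals.hpp
  const size_t MinSurvivorRatio     = 3;  // default, see gc_globals.hpp

  size_t initial_survivor_size = young_gen_size / InitialSurvivorRatio; // about 1/8 of young gen
  size_t max_survivor_size     = young_gen_size / MinSurvivorRatio;     // about 1/3 of young gen

  // MinSurvivorRatio <= InitialSurvivorRatio implies the initial survivor
  // size never exceeds the maximum the ratio bound allows.
  assert(initial_survivor_size <= max_survivor_size);
}
```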
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2055228577 From gli at openjdk.org Wed Apr 23 04:37:41 2025 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 23 Apr 2025 04:37:41 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:55:17 GMT, Albert Mingkun Yang wrote: >> Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Initial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. >> >> Test: tier1-7 > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into pgc-min-initial-fix > - review > - pgc-min-initial-fix > the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so MinSurvivorRatio <= InitialSurvivorRatio should hold. The patch looks fine, though I have a different viewpoint. ------------- Marked as reviewed by gli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24556#pullrequestreview-2785902574 From sjohanss at openjdk.org Wed Apr 23 08:03:41 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 23 Apr 2025 08:03:41 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking Message-ID: Please review this change to improve TLAB handling in ZGC. **Summary** In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any fixed generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:
```
bool update_allocation_history = used > 0.5 * capacity;
```
So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead.
We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affect the pause time. The new code tracks the Eden usage in the page-allocator instead. This change also fixes to that the maximum TLAB size returned from ZGC is in words not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. **Testing** * Functional testing tier1-tier7 * Performance testing in Aurora is neutral * Manual testing looking at TLAB waste shows a clear reduction, in some scenarios the waste could previously be above 2% and now it is below 1% * Manual verification that the worse case pauses are shorter due to the reduced work in the mark start pause ------------- Commit messages: - Change memory order to relaxed - More TLABUsage fixes - Junba-space - Fixes for eden tracking - Fixes for TLABUsage - Track eden-usage in page allocator - Add class to keep track of TLAB usage - Max tlab size should be reported in words Changes: https://git.openjdk.org/jdk/pull/24814/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353184 Stats: 195 lines in 12 files changed: 145 ins; 41 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From thartmann at openjdk.org Wed Apr 23 08:28:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Apr 2025 08:28:50 GMT Subject: RFR: 8354668: Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 03:21:08 GMT, Jatin Bhateja wrote: >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding. While most of the relocation records the patching offsets from the end of the instruction, SHL instruction, which is used for pointer coloring, computes the patching offset from the starting address of the instruction. >> >> Thus, in case the destination register operand of SHL instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte resulting into ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of SHL instruction from end of instruction, thereby making the patch offset agnostic to REX/REX2 prefix. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> PS: Validation were performed using latest Intel Software Development Emulator after modifying static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Backed out again with https://github.com/openjdk/jdk/pull/24815 due to failures in our testing. 
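To illustrate the capacity estimate proposed in JDK-8353184 above (an average over the last 10 samples of actual TLAB usage reported as tlab_capacity()), a stand-alone sketch could look like the following. The class and names are invented for the example and do not mirror the PR's actual implementation:

```c++
#include <cstddef>

// Illustrative only: keep a small ring of recent TLAB usage samples and
// report their average as the "capacity" fed into the TLAB sizing heuristic.
class SampledTlabUsage {
  static const int SampleCount = 10;
  size_t _samples[SampleCount] = {};
  int    _next = 0;
  size_t _current = 0;  // bytes handed out to TLABs since the last sample

public:
  void add_used(size_t bytes) { _current += bytes; }

  // Called at a fixed point (e.g. the GC mark start pause): record the last
  // period and start a new one.
  void sample() {
    _samples[_next] = _current;
    _next = (_next + 1) % SampleCount;
    _current = 0;
  }

  // Average of the recorded samples: tracks what the application actually
  // uses, instead of the whole heap capacity.
  size_t average_used() const {
    size_t sum = 0;
    for (int i = 0; i < SampleCount; i++) {
      sum += _samples[i];
    }
    return sum / SampleCount;
  }
};
```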
------------- PR Comment: https://git.openjdk.org/jdk/pull/24664#issuecomment-2823473931 From ayang at openjdk.org Wed Apr 23 10:00:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 23 Apr 2025 10:00:55 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> Message-ID: On Wed, 23 Apr 2025 04:32:20 GMT, Guoxiong Li wrote: >>> If both InitialSurvivorRatio and MinSurvivorRatio are not set in command line and the condition InitialSurvivorRatio < MinSurvivorRatio is true >> >> When will that happen? AFAIS, if neither is set on command line, the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so `MinSurvivorRatio <= InitialSurvivorRatio` should hold. >> >>> Should we guard this situation (such as printing an error message) to catch the bug in the previous code? >> >> I didn't really understand your suggestion. Could you clarify what you mean by "previous code"? Or maybe some pseudo code to outline your suggestion? > > Please read the following code. It can help us catch the bug about the wrong default/ergonomic values. > > ```C++ > if (InitialSurvivorRatio < MinSurvivorRatio) { > if (FLAG_IS_CMDLINE(InitialSurvivorRatio)) { > if (FLAG_IS_CMDLINE(MinSurvivorRatio)) { > jio_fprintf(defaultStream::error_stream(), > "Inconsistent MinSurvivorRatio vs InitialSurvivorRatio: %d vs %d\n", MinSurvivorRatio, InitialSurvivorRatio); > } > FLAG_SET_DEFAULT(MinSurvivorRatio, InitialSurvivorRatio); > } else if (FLAG_IS_CMDLINE(MinSurvivorRatio)) { > FLAG_SET_DEFAULT(InitialSurvivorRatio, MinSurvivorRatio); > } else { > // Here <---------- > jio_fprintf(defaultStream::error_stream(), > "Inconsistent default/ergonomic MinSurvivorRatio vs InitialSurvivorRatio: %d vs %d\n", MinSurvivorRatio, InitialSurvivorRatio); > } > } > > >> When will that happen? AFAIS, if neither is set on command line, the default values should be MinSurvivorRatio == 3 and InitialSurvivorRatio == 8 (as defined in gc_globals.hpp), so MinSurvivorRatio <= InitialSurvivorRatio should hold. > > Yes, the current default values are good, but we may change them in the future. So such guard operation seems good and helps us find the bug earlier. May I be overthinking? Thank you for the clarification -- the suggestion is about protecting ourselves from invalid default values in future changes. However, I feel such risk is extremely low, because those default values are completely under VM dev's control and do not change often (or even at all). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2055700506 From gli at openjdk.org Wed Apr 23 10:38:44 2025 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 23 Apr 2025 10:38:44 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v2] In-Reply-To: References: <6lDEcjgVR_AB4ZIAgX7oMHGdXzVGx52RB_EzOqJKqMg=.97d50c03-9113-4309-bd93-35b83d54f470@github.com> Message-ID: On Wed, 23 Apr 2025 09:58:34 GMT, Albert Mingkun Yang wrote: > I feel such risk is extremely low, because those default values are completely under VM dev's control and do not change often (or even at all). Yes, I was overthinking. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24556#discussion_r2055762087 From ayang at openjdk.org Wed Apr 23 10:44:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 23 Apr 2025 10:44:51 GMT Subject: RFR: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:55:17 GMT, Albert Mingkun Yang wrote: >> Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. >> >> Test: tier1-7 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into pgc-min-initial-fix > - review > - pgc-min-initial-fix Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24556#issuecomment-2823860889 From ayang at openjdk.org Wed Apr 23 10:44:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 23 Apr 2025 10:44:51 GMT Subject: Integrated: 8354228: Parallel: Set correct minimum of InitialSurvivorRatio In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:33:07 GMT, Albert Mingkun Yang wrote: > Updating the lower bound of InitialSurvivorRatio to match MinSurvivorRatio. The two removed test cases set conflicting Min and Intial SurvivorRatio, which, IMO, is an incorrect configuration, so I removed them. > > Test: tier1-7 This pull request has now been integrated. Changeset: 82c24944 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/82c249446f2bd6f3b0e612c5ef3e6bfcab388c3b Stats: 26 lines in 3 files changed: 12 ins; 13 del; 1 mod 8354228: Parallel: Set correct minimum of InitialSurvivorRatio Reviewed-by: tschatzl, gli ------------- PR: https://git.openjdk.org/jdk/pull/24556 From iwalulya at openjdk.org Wed Apr 23 11:01:41 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 23 Apr 2025 11:01:41 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> Message-ID: On Sat, 19 Apr 2025 05:08:26 GMT, Sangheon Kim wrote: >> Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. >> 1. Different scope of measurement >> - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. >> (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) >> - Changed not to be included in sum_of_sub_phases. >> - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. >> 2. Duplicated measurement >> - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. As there is no public getter, just added cleanup time when those evacuation time are used internally. >> 3. 
Pre Concurrent task execution time >> - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. >> >> Testing: tier 1 ~ 5 > > Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Review from Thomas > - Separate measurement for cleanup LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24454#pullrequestreview-2786884863 From shade at openjdk.org Wed Apr 23 11:32:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:07 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v3] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Touchups - Renames - Fully encapsulate Method* - Merge branch 'master' into JDK-8231269-compile-task-weaks - Shared utility class for method unload blocking - Merge branch 'master' into JDK-8231269-compile-task-weaks - JNIHandles -> VM(Weak) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/d965fef3..7f32b31b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=01-02 Stats: 272647 lines in 2272 files changed: 72121 ins; 193069 del; 7457 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 23 11:32:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Mon, 31 Mar 2025 18:46:53 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Shared utility class for method unload blocking > - Merge branch 'master' into JDK-8231269-compile-task-weaks > - JNIHandles -> VM(Weak) Pushed the `Method*` encapsulation. SA needs fixes now, but I'll test how well this works on other tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2823977158 From shade at openjdk.org Wed Apr 23 11:32:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:32:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Mon, 31 Mar 2025 23:40:09 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Shared utility class for method unload blocking >> - Merge branch 'master' into JDK-8231269-compile-task-weaks >> - JNIHandles -> VM(Weak) > > src/hotspot/share/runtime/methodUnloadBlocker.inline.hpp line 72: > >> 70: assert(!is_unloaded(), "Pre-condition: should not be unloaded"); >> 71: >> 72: if (!_weak_handle.is_empty()) { > > Does the precondition imply that `!_weak_handle.is_empty()` always hold? Not really. The precondition is: !is_unloaded() -> !(!_weak_handle.is_empty() && _weak_handle.peek() == nullptr) -> (_weak_handle.is_empty() || _weak_handle.peek() != nullptr) So you see, there is a case when weak handle is empty. It is when `_method` is `nullptr` (default initialized, no method is set), or its unload blocker is `nullptr` (method would be unloaded). Then we bypass the majority of weak->strong dance as unnecessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2055846204 From tschatzl at openjdk.org Wed Apr 23 11:46:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 23 Apr 2025 11:46:16 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v35] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 51 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - ... and 41 more: https://git.openjdk.org/jdk/compare/e76f2030...e4bf1ac0 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=34 Stats: 7101 lines in 110 files changed: 2581 ins; 3596 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From shade at openjdk.org Wed Apr 23 11:48:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 11:48:59 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v4] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. 
This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Purge extra fluff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/7f32b31b..2ec579ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=02-03 Stats: 7 lines in 2 files changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From stefank at openjdk.org Wed Apr 23 13:06:03 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 23 Apr 2025 13:06:03 GMT Subject: RFR: 8355394: ZGC: Windows compile error in ZUtils Message-ID: We got a report of an encountered of the following compilation error: src\hotspot\share\gc/z/zUtils.inline.hpp(87): error C2066: cast to function type is illegal src\hotspot\share\gc/z/zUtils.inline.hpp(97): note: see reference to function template instantiation 'void ZUtils::sort>(T *,size_t,Comparator)' being compiled with [ T=zbacking_index, Comparator=sort_zbacking_index_array:: ] src\hotspot\share\gc\z\zPhysicalMemoryManager.cpp(303): note: see reference to function template instantiation 'void ZUtils::sort>(T *,int,Comparator)' being compiled with [ T=zbacking_index, Comparator=sort_zbacking_index_array:: ] make[3]: *** [lib/CompileJvm.gmk:174: jdk/build/windows-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/zPhysicalMemoryManager.obj] Error 1 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [make/Main.gmk:236: hotspot-server-libs] Error 2 (edited) We don't see this failure in our CI runs but with the help of the reporter we managed to tweak the code to get rid of the failure. Tested with tier1-2 in our CI. 
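As a generic illustration of the kind of code this error message is about (a sketch with made-up names, not the actual ZUtils::sort code and not the actual fix): a templated sort helper whose Comparator type is deduced at the call site from a lambda. C2066 is MSVC objecting to a cast to a function type somewhere in that instantiation; keeping the comparator as its own closure type, as below, avoids any such cast.

    #include <cstddef>

    // Generic sketch only. The comparator is taken by value, so the lambda stays
    // a closure type and is never converted or cast to a function type.
    template <typename T, typename Comparator>
    void sort_array(T* array, std::size_t count, Comparator cmp) {
      // Plain insertion sort, just to keep the sketch self-contained.
      for (std::size_t i = 1; i < count; i++) {
        T value = array[i];
        std::size_t j = i;
        for (; j > 0 && cmp(&value, &array[j - 1]) < 0; j--) {
          array[j] = array[j - 1];
        }
        array[j] = value;
      }
    }

    int main() {
      int data[] = {3, 1, 2};
      sort_array(data, 3, [](const int* a, const int* b) { return *a - *b; });
      return 0;
    }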
------------- Commit messages: - Fix Windows Error Changes: https://git.openjdk.org/jdk/pull/24826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355394 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24826/head:pull/24826 PR: https://git.openjdk.org/jdk/pull/24826 From aboldtch at openjdk.org Wed Apr 23 13:06:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 23 Apr 2025 13:06:03 GMT Subject: RFR: 8355394: ZGC: Windows compile error in ZUtils In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 12:59:10 GMT, Stefan Karlsson wrote: > We got a report of an encountered of the following compilation error: > > src\hotspot\share\gc/z/zUtils.inline.hpp(87): error C2066: cast to function type is illegal > src\hotspot\share\gc/z/zUtils.inline.hpp(97): note: see reference to function template instantiation 'void ZUtils::sort>(T *,size_t,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > src\hotspot\share\gc\z\zPhysicalMemoryManager.cpp(303): note: see reference to function template instantiation 'void ZUtils::sort>(T *,int,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > make[3]: *** [lib/CompileJvm.gmk:174: jdk/build/windows-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/zPhysicalMemoryManager.obj] Error 1 > make[3]: *** Waiting for unfinished jobs.... > make[2]: *** [make/Main.gmk:236: hotspot-server-libs] Error 2 (edited) > > > We don't see this failure in our CI runs but with the help of the reporter we managed to tweak the code to get rid of the failure. > > Tested with tier1-2 in our CI. Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24826#pullrequestreview-2787222537 From stefank at openjdk.org Wed Apr 23 13:45:32 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 23 Apr 2025 13:45:32 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v3] In-Reply-To: References: Message-ID: > When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. > > This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. > > Other GCs have a filter to check for how old the Strings are before they get deduplicated. > > The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. > > Testing: > > * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. > > * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. > > * Tier1-7 > > Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. 
If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. > > Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge remote-tracking branch 'upstream/master' into 8347337_string_dedup_at_promotion - Make ZPageAge ZForwarding member fileds constant - Review comments - Remove string dedup from marking - 8347337: ZGC: String dedups short-lived strings ------------- Changes: https://git.openjdk.org/jdk/pull/23965/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23965&range=02 Stats: 152 lines in 7 files changed: 108 ins; 34 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23965/head:pull/23965 PR: https://git.openjdk.org/jdk/pull/23965 From tschatzl at openjdk.org Wed Apr 23 14:05:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 23 Apr 2025 14:05:52 GMT Subject: RFR: 8355394: ZGC: Windows compile error in ZUtils In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 12:59:10 GMT, Stefan Karlsson wrote: > We got a report of an encountered of the following compilation error: > > src\hotspot\share\gc/z/zUtils.inline.hpp(87): error C2066: cast to function type is illegal > src\hotspot\share\gc/z/zUtils.inline.hpp(97): note: see reference to function template instantiation 'void ZUtils::sort>(T *,size_t,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > src\hotspot\share\gc\z\zPhysicalMemoryManager.cpp(303): note: see reference to function template instantiation 'void ZUtils::sort>(T *,int,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > make[3]: *** [lib/CompileJvm.gmk:174: jdk/build/windows-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/zPhysicalMemoryManager.obj] Error 1 > make[3]: *** Waiting for unfinished jobs.... > make[2]: *** [make/Main.gmk:236: hotspot-server-libs] Error 2 (edited) > > > We don't see this failure in our CI runs but with the help of the reporter we managed to tweak the code to get rid of the failure. > > Tested with tier1-2 in our CI. Marked as reviewed by tschatzl (Reviewer). 
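On the string deduplication change quoted above (JDK-8347337), the key idea, requesting deduplication only when a String is promoted to the old generation, can be sketched as follows. All names here are hypothetical; this is not the actual ZGC relocation code.

    #include <cstdio>

    // Hypothetical sketch: the dedup request moves from "every String visited
    // during marking" to "only Strings being promoted to the old generation",
    // so short-lived Strings never reach the deduplication queue at all.
    enum class Age { young, old };
    struct Object { bool is_string; };

    Object* copy_to_target_page(Object* obj, Age to_age) { return obj; }  // stub
    void request_string_deduplication(Object* obj) { std::puts("dedup requested"); }

    Object* relocate_object(Object* obj, Age to_age) {
      Object* new_obj = copy_to_target_page(obj, to_age);
      if (to_age == Age::old && new_obj->is_string) {
        request_string_deduplication(new_obj);  // only at promotion time
      }
      return new_obj;
    }

    int main() {
      Object s{true};
      relocate_object(&s, Age::young);  // survives a young collection: no request
      relocate_object(&s, Age::old);    // promoted: request issued
      return 0;
    }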
------------- PR Review: https://git.openjdk.org/jdk/pull/24826#pullrequestreview-2787457824 From stefank at openjdk.org Wed Apr 23 14:12:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 23 Apr 2025 14:12:56 GMT Subject: RFR: 8355394: ZGC: Windows compile error in ZUtils In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 12:59:10 GMT, Stefan Karlsson wrote: > We got a report of an encountered of the following compilation error: > > src\hotspot\share\gc/z/zUtils.inline.hpp(87): error C2066: cast to function type is illegal > src\hotspot\share\gc/z/zUtils.inline.hpp(97): note: see reference to function template instantiation 'void ZUtils::sort>(T *,size_t,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > src\hotspot\share\gc\z\zPhysicalMemoryManager.cpp(303): note: see reference to function template instantiation 'void ZUtils::sort>(T *,int,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > make[3]: *** [lib/CompileJvm.gmk:174: jdk/build/windows-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/zPhysicalMemoryManager.obj] Error 1 > make[3]: *** Waiting for unfinished jobs.... > make[2]: *** [make/Main.gmk:236: hotspot-server-libs] Error 2 (edited) > > > We don't see this failure in our CI runs but with the help of the reporter we managed to tweak the code to get rid of the failure. > > Tested with tier1-2 in our CI. Thanks for the reviews! Tests passes tier1-2 and building in GHA, I'm pushing this in order to solve the compilation error for the reporter. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24826#issuecomment-2824442618 From stefank at openjdk.org Wed Apr 23 14:12:57 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 23 Apr 2025 14:12:57 GMT Subject: Integrated: 8355394: ZGC: Windows compile error in ZUtils In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 12:59:10 GMT, Stefan Karlsson wrote: > We got a report of an encountered of the following compilation error: > > src\hotspot\share\gc/z/zUtils.inline.hpp(87): error C2066: cast to function type is illegal > src\hotspot\share\gc/z/zUtils.inline.hpp(97): note: see reference to function template instantiation 'void ZUtils::sort>(T *,size_t,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > src\hotspot\share\gc\z\zPhysicalMemoryManager.cpp(303): note: see reference to function template instantiation 'void ZUtils::sort>(T *,int,Comparator)' being compiled > with > [ > T=zbacking_index, > Comparator=sort_zbacking_index_array:: > ] > make[3]: *** [lib/CompileJvm.gmk:174: jdk/build/windows-x86_64-server-fastdebug/hotspot/variant-server/libjvm/objs/zPhysicalMemoryManager.obj] Error 1 > make[3]: *** Waiting for unfinished jobs.... > make[2]: *** [make/Main.gmk:236: hotspot-server-libs] Error 2 (edited) > > > We don't see this failure in our CI runs but with the help of the reporter we managed to tweak the code to get rid of the failure. > > Tested with tier1-2 in our CI. This pull request has now been integrated. 
Changeset: 023f30bc Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/023f30bcaa820080ed5b5aa6f9a0a996a62c7d34 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8355394: ZGC: Windows compile error in ZUtils Co-authored-by: Axel Boldt-Christmas Reviewed-by: aboldtch, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24826 From kdnilsen at openjdk.org Wed Apr 23 15:47:24 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 23 Apr 2025 15:47:24 GMT Subject: RFR: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers Message-ID: Allow old-gen concurrent marking cycles to get their full time slice even when young-gc is triggered back-to-back. ------------- Commit messages: - Fix white space - Allow old-gen incrments of work between young gc - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 26 more: https://git.openjdk.org/jdk/compare/b7e8952a...322e2eca Changes: https://git.openjdk.org/jdk/pull/24810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24810&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355336 Stats: 26 lines in 3 files changed: 8 ins; 14 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24810/head:pull/24810 PR: https://git.openjdk.org/jdk/pull/24810 From jsikstro at openjdk.org Wed Apr 23 16:02:39 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 23 Apr 2025 16:02:39 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v12] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
> > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to ap... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Use MutexLocker instead of lock/unlock in vmError.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/e89b916f..b9b975de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=10-11 Stats: 20 lines in 1 file changed: 2 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From lkorinth at openjdk.org Wed Apr 23 16:02:40 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 23 Apr 2025 16:02:40 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v11] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 15:05:08 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
>> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Use FormatBuffer instead of local char buf src/hotspot/share/utilities/vmError.cpp line 1395: > 1393: // Take heap lock over both heap and GC printing so that information is > 1394: // consistent. > 1395: Heap_lock->lock(); I would prefer using `MutexLocker` even though it changes indentation (of the source code). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2056384316 From lkorinth at openjdk.org Wed Apr 23 16:35:59 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 23 Apr 2025 16:35:59 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v11] In-Reply-To: References: Message-ID: <1LetmrR7P0aNn-68DtV_k0BnHDnd7LGVGnkxAKDsgCI=.e2e9678b-862c-4ff6-81c5-4d684f263ac1@github.com> On Tue, 22 Apr 2025 15:05:08 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
>> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Use FormatBuffer instead of local char buf src/hotspot/share/utilities/ostream.hpp line 190: > 188: }; > 189: > 190: class StreamAutoIndentor : public streamIndentor { Just a note, not a request for change: I would personally have used composition instead of inheritance here (of `streamIndentor`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2056458328 From sangheki at openjdk.org Wed Apr 23 16:40:00 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Wed, 23 Apr 2025 16:40:00 GMT Subject: RFR: 8346568: G1: Other time can be negative [v3] In-Reply-To: References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> <1iumndO7Tu352QZf_8tPaSTlYqdRBtNVw7N_VHLj52E=.fc6d1856-8f3f-4e04-80ac-5b34dd3dbcb5@github.com> Message-ID: On Tue, 22 Apr 2025 07:44:57 GMT, Thomas Schatzl wrote: >> Sangheon Kim has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Review from Thomas >> - Separate measurement for cleanup > > Marked as reviewed by tschatzl (Reviewer). Thanks for your review, @tschatzl and @walulyai ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24454#issuecomment-2824889451 From sangheki at openjdk.org Wed Apr 23 16:40:00 2025 From: sangheki at openjdk.org (Sangheon Kim) Date: Wed, 23 Apr 2025 16:40:00 GMT Subject: Integrated: 8346568: G1: Other time can be negative In-Reply-To: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> References: <0A-BDKTohMv3ziXO4LrtniptKNCWYvZZfVKMWAUK6iA=.7fbd372c-f2ed-417c-8517-073e0a9a5276@github.com> Message-ID: On Fri, 4 Apr 2025 18:00:21 GMT, Sangheon Kim wrote: > Other time described in this bug is displayed at G1GCPhaseTimes::print_other(total_measured_time - sum_of_sub_phases). And the value can be negative for 3 reasons. > 1. Different scope of measurement > - 3 variables is out of scope from total_measured_time. Those used for wait-root-region-scan, verify-before/after. > (_root_region_scan_wait_time_ms, _cur_verify_before_time_ms and _cur_verify_after_time_ms) > - Changed not to be included in sum_of_sub_phases. > - One may want to include them in total_measured_time but I think it is better to be addressed in a separate ticket. > 2. Duplicated measurement > - Initial and optional evacuation time include nmethod-cleanup-time, so separated them as we are already measuring them. 
As there is no public getter, just added cleanup time when those evacuation time are used internally. > 3. Pre Concurrent task execution time > - Sometimes the difference between the existing average time and pre-concurrent work is 2 digit milliseconds. Changed to measure exact time rather than accumulating the average value to sum_of_sub_phases and keep displaying concurrent tasks' average execution time. > > Testing: tier 1 ~ 5 This pull request has now been integrated. Changeset: 8bd56452 Author: Sangheon Kim URL: https://git.openjdk.org/jdk/commit/8bd564521804e98911cc9ff3b7696165e3243139 Stats: 68 lines in 4 files changed: 36 ins; 20 del; 12 mod 8346568: G1: Other time can be negative Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/24454 From wkemper at openjdk.org Wed Apr 23 16:50:50 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 23 Apr 2025 16:50:50 GMT Subject: RFR: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:57:53 GMT, Kelvin Nilsen wrote: > Allow old-gen concurrent marking cycles to get their full time slice even when young-gc is triggered back-to-back. I'm okay with this, but it will have the regulator thread give higher priority to the old generation when the collector is idle. Have we looked closely at performance results? src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp line 80: > 78: _old_heuristics->cancel_trigger_request(); > 79: } else if (start_young_cycle()) { > 80: log_debug(gc)("Heuristics request for young collection accepted"); Indentation looks a little off here. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24810#pullrequestreview-2788071594 PR Review Comment: https://git.openjdk.org/jdk/pull/24810#discussion_r2056487435 From shade at openjdk.org Wed Apr 23 17:26:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:26:36 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v5] In-Reply-To: References: Message-ID: <9ZFEqmXrFwO-bYV3AC8JAg_B8f0HGDzzKLoMH2z9CAI=.1f4de885-d63a-41e8-a02e-2779007777ca@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. 
This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix VMStructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/2ec579ca..63650fab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=03-04 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Wed Apr 23 17:26:37 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:26:37 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> Message-ID: On Wed, 23 Apr 2025 11:29:34 GMT, Aleksey Shipilev wrote: > SA needs fixes now, but I'll test how well this works on other tests. Actually, I can just purge `CompileTask.java`: https://github.com/openjdk/jdk/pull/24832 I see that async-profiler uses the `CompileTask::_method` field directly, I think to see what compiler threads are up to. So it needs to be fixed after this PR lands, and it would dereference through the newly added handle. Luckily, I think that access only happens when compilation is already running, and `method*` is guaranteed to be alive. Paging @apangin for visibility. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2825010041 From shade at openjdk.org Wed Apr 23 17:31:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:31:07 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. 
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Allow UMH::_method access from VMStructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/63650fab..91d38ff1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=04-05 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From lkorinth at openjdk.org Wed Apr 23 17:32:43 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 23 Apr 2025 17:32:43 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 16:02:39 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. 
The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Use MutexLocker instead of lock/unlock in vmError.cpp I think this looks great. Thank you. You might want to run a bit more internal testing; it is too easy to write a test that looks for specific formatting in the logs. Mostly unrelated to this change: I think that `outputStream` probably should have a version of `fill_to` that guarantees a separating space so that a trailing space does not need to be put before the call (it ought to be by *far* the most common case). That is clearly out of scope for this change however. ------------- Marked as reviewed by lkorinth (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2788195879 From shade at openjdk.org Wed Apr 23 17:33:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 23 Apr 2025 17:33:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:31:07 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Allow UMH::_method access from VMStructs I re-ran testing, and it looks green. So we can start polishing this thing for eventual integration. 
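On the `fill_to` side note above: one possible shape for such a helper, written against outputStream's existing position() and fill_to(). The helper itself and its name are an assumption, not an existing API.

    #include "utilities/ostream.hpp"

    // Hedged sketch: pad to the requested column, but always emit at least one
    // separating space, so callers no longer need a trailing space of their own.
    static void fill_to_with_separation(outputStream* st, int col) {
      if (st->position() >= col) {
        st->print(" ");     // already at or past the column: still separate
      } else {
        st->fill_to(col);   // otherwise pad with spaces up to the column
      }
    }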
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2825030285 From wkemper at openjdk.org Wed Apr 23 20:27:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 23 Apr 2025 20:27:53 GMT Subject: RFR: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out Message-ID: We've identified another scenario that could result in intermittent timeout failures in jtreg tests. If the cause of the gc cycle is `GCCause::_codecache_GC_threshold`, the thread requesting the GC will not be notified. ------------- Commit messages: - Notify gc waiter and alloc failure waiters when a gc completes successfully Changes: https://git.openjdk.org/jdk/pull/24834/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24834&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353596 Stats: 7 lines in 1 file changed: 1 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24834/head:pull/24834 PR: https://git.openjdk.org/jdk/pull/24834 From kdnilsen at openjdk.org Wed Apr 23 20:27:53 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 23 Apr 2025 20:27:53 GMT Subject: RFR: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 20:17:45 GMT, William Kemper wrote: > We've identified another scenario that could result in intermittent timeout failures in jtreg tests. If the cause of the gc cycle is `GCCause::_codecache_GC_threshold`, the thread requesting the GC will not be notified. Thanks for quick turnaround. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/24834#pullrequestreview-2788605951 From ysr at openjdk.org Wed Apr 23 22:50:55 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 23 Apr 2025 22:50:55 GMT Subject: RFR: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out In-Reply-To: References: Message-ID: <10EkhLGZbkBYjscXFuo2u7doVzGuBfJ0W1z_LQf8SRQ=.a963a760-21ba-4841-be67-76335acbe6b9@github.com> On Wed, 23 Apr 2025 20:17:45 GMT, William Kemper wrote: > We've identified another scenario that could result in intermittent timeout failures in jtreg tests. If the cause of the gc cycle is `GCCause::_codecache_GC_threshold`, the thread requesting the GC will not be notified. This is a safe change. Did you check if there were any other spots that might have this issue? In particular, I see `ShenandoahControlThread::run_service()` which notifies whenever a GC has been requested (which is, I assume, morally equivalent to what's being done here), except we do this whenever we don't cancel the GC -- it could be unnecessary in some cases, I guess, which the original code for Shenandoah was trying to avoid. Not a big deal, but to the extent we can keep the logic similar in both (or shared as much as possible), the fewer such divergence in behavior between the two for the common cases. It would be good to document more completely this method of `ShenandoahGenerationalControlThread` in terms of who it must notify and when. // Executes one GC cycle void run_gc_cycle(const ShenandoahGCRequest& request); (For example, an equally valid change may have been to change the condition for the cause tested at line 277 to include the code cache induced gc cause.) Change looks good and safe modulo those more general comments. 
No changes are needed in this PR, but something for us to keep in mind to make this code more robust and maintainable. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24834#pullrequestreview-2788826245 From wkemper at openjdk.org Wed Apr 23 23:02:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 23 Apr 2025 23:02:54 GMT Subject: RFR: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out In-Reply-To: References: Message-ID: <4B_A_HEoSHsN63co339n9DX4drOcOyXE4rydAoYxilM=.ef92a95f-a67f-4e60-aec7-c6afe0779b18@github.com> On Wed, 23 Apr 2025 20:17:45 GMT, William Kemper wrote: > We've identified another scenario that could result in intermittent timeout failures in jtreg tests. If the cause of the gc cycle is `GCCause::_codecache_GC_threshold`, the thread requesting the GC will not be notified. That's a good call out. I don't believe `ShenandoahControlThread` is affected by this same issue because it will notify if `_gc_requested` is set, and doesn't care about the _cause_ of the gc. I did also consider adding `_codecache_GC_threshold` as a cause for explicit gc, except I'm not sure it really is (in the sense that the user did not explicitly request it). Also, it felt like a losing game because the next time a new gc cause is added, we'd need to evaluate whether or not it was `explicit`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24834#issuecomment-2825691769 From wkemper at openjdk.org Wed Apr 23 23:02:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 23 Apr 2025 23:02:54 GMT Subject: Integrated: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 20:17:45 GMT, William Kemper wrote: > We've identified another scenario that could result in intermittent timeout failures in jtreg tests. If the cause of the gc cycle is `GCCause::_codecache_GC_threshold`, the thread requesting the GC will not be notified. This pull request has now been integrated. Changeset: ac17449b Author: William Kemper URL: https://git.openjdk.org/jdk/commit/ac17449bdb946d98cb65c8eae9c9671f527a79cb Stats: 7 lines in 1 file changed: 1 ins; 5 del; 1 mod 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24834 From ysr at openjdk.org Wed Apr 23 23:06:45 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 23 Apr 2025 23:06:45 GMT Subject: RFR: 8353596: GenShen: Test TestClone.java#generational-no-coops intermittent timed out In-Reply-To: <4B_A_HEoSHsN63co339n9DX4drOcOyXE4rydAoYxilM=.ef92a95f-a67f-4e60-aec7-c6afe0779b18@github.com> References: <4B_A_HEoSHsN63co339n9DX4drOcOyXE4rydAoYxilM=.ef92a95f-a67f-4e60-aec7-c6afe0779b18@github.com> Message-ID: On Wed, 23 Apr 2025 22:57:52 GMT, William Kemper wrote: > That's a good call out. I don't believe `ShenandoahControlThread` is affected by this same issue because it will notify if `_gc_requested` is set, and doesn't care about the _cause_ of the gc. Correct. > I did also consider adding `_codecache_GC_threshold` as a cause for explicit gc, except I'm not sure it really is (in the sense that the user did not explicitly request it). Also, it felt like a losing game because the next time a new gc cause is added, we'd need to evaluate whether or not it was `explicit`. I agree that this works better as it notifies gc waiters in all cases. 
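The behaviour being agreed on here, notifying GC waiters after every successfully completed cycle regardless of the GCCause that started it, can be sketched with standard primitives as below. These are hypothetical names, not the actual generational control thread code, which the real change adjusts in only a few lines.

    #include <condition_variable>
    #include <mutex>

    // Hypothetical sketch: any thread waiting for a GC is woken once a cycle
    // completes, even if the cycle was started for a cause such as the code
    // cache threshold.
    enum class GCCause { explicit_gc, codecache_GC_threshold, allocation_failure };

    std::mutex              gc_waiters_mutex;
    std::condition_variable gc_waiters_cv;
    long                    gc_completed_count = 0;

    void run_gc_cycle(GCCause cause) {
      // ... perform the collection for 'cause' ...
      std::lock_guard<std::mutex> lock(gc_waiters_mutex);
      gc_completed_count++;
      gc_waiters_cv.notify_all();   // notify for every cause, not just explicit ones
    }

    void wait_for_gc() {
      std::unique_lock<std::mutex> lock(gc_waiters_mutex);
      const long seen = gc_completed_count;
      gc_waiters_cv.wait(lock, [&] { return gc_completed_count != seen; });
    }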
One question is where `_gc_requested` is reset and whether to keep the two controllers' logic similar, if an identical condition were used here (or an identical condition were used in the case of non-generational controller), and if not why it would not work (perhaps because the requests are handled differently). But, yes, such a clean-up is for another day, not this PR. ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24834#issuecomment-2825719660 From vlivanov at openjdk.org Thu Apr 24 00:39:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 24 Apr 2025 00:39:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 17:31:07 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Allow UMH::_method access from VMStructs Nice work! src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: > 41: // 3. Final released state. Relevant Method* is in unknown state, and cannot be > 42: // accessed. > 43: // Please, elaborate what state transitions are supported. Currently, my understanding is there are 3 transitions and 4 states: * 1 -> 2 * 2 -> 3 (terminal) * 1 -> 3 (terminal) * 0 (empty, terminal) src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 26: > 24: > 25: #ifndef SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP > 26: #define SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP Stale header file name used? src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 37: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > 36: _method = method; > 37: if (method != nullptr) { Is it possible to require `method` (and hence `_method`) to always be non-null? 
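As a side illustration of the lifecycle under review in this thread, the weak-to-strong transition can be modelled with standard smart pointers. This is only an analogy for readers following along; the real UnloadableMethodHandle works on OopStorage-backed handles, not std::shared_ptr.

    #include <cassert>
    #include <memory>

    // Analogy only: "Holder" stands in for the class-loader holder oop. While
    // only the weak reference is held, the holder may go away (class unloading);
    // block_unloading() upgrades to a strong reference so the method stays usable.
    struct Holder {};

    class UnloadBlockingHandle {
      std::weak_ptr<Holder>   _weak;    // observes the holder without keeping it alive
      std::shared_ptr<Holder> _strong;  // set once unloading has been blocked
    public:
      explicit UnloadBlockingHandle(const std::shared_ptr<Holder>& h) : _weak(h) {}

      bool is_unloaded() const {
        return _strong == nullptr && _weak.expired();
      }

      bool block_unloading() {
        _strong = _weak.lock();      // the weak-to-strong "dance"
        return _strong != nullptr;   // false means the holder is already gone
      }
    };

    int main() {
      auto holder = std::make_shared<Holder>();
      UnloadBlockingHandle handle(holder);
      assert(!handle.is_unloaded());
      assert(handle.block_unloading());  // pin before using the method
      holder.reset();                    // "unloading" can no longer invalidate it
      assert(!handle.is_unloaded());
      return 0;
    }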
src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 57: > 55: > 56: // Null holder, the relevant class would not be unloaded. > 57: return nullptr; Is this the case of bootstrap classloader? As an optimization opportunity, it can be extended for other system loaders. src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 93: > 91: > 92: inline Method* UnloadableMethodHandle::method() const { > 93: assert(!is_unloaded(), "Should not be unloaded"); Assert that `block_unloading()` was called before? ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2788983703 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057101817 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057087135 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057089091 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057091706 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057084091 From aboldtch at openjdk.org Thu Apr 24 06:42:50 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 24 Apr 2025 06:42:50 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v3] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 13:45:32 GMT, Stefan Karlsson wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. >> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. >> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8347337_string_dedup_at_promotion > - Make ZPageAge ZForwarding member fileds constant > - Review comments > - Remove string dedup from marking > - 8347337: ZGC: String dedups short-lived strings lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2789896793 From kbarrett at openjdk.org Thu Apr 24 07:12:55 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 24 Apr 2025 07:12:55 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v3] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 13:45:32 GMT, Stefan Karlsson wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. >> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. >> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8347337_string_dedup_at_promotion > - Make ZPageAge ZForwarding member fileds constant > - Review comments > - Remove string dedup from marking > - 8347337: ZGC: String dedups short-lived strings Still good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2789986871 From stefank at openjdk.org Thu Apr 24 07:23:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Apr 2025 07:23:04 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v3] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 13:45:32 GMT, Stefan Karlsson wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. 
The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. >> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. >> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8347337_string_dedup_at_promotion > - Make ZPageAge ZForwarding member fileds constant > - Review comments > - Remove string dedup from marking > - 8347337: ZGC: String dedups short-lived strings Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23965#issuecomment-2826620564 From stefank at openjdk.org Thu Apr 24 07:23:05 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Apr 2025 07:23:05 GMT Subject: Integrated: 8347337: ZGC: String dedups short-lived strings In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 15:08:16 GMT, Stefan Karlsson wrote: > When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. > > This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. > > Other GCs have a filter to check for how old the Strings are before they get deduplicated. > > The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. > > Testing: > > * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. 
> > * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. > > * Tier1-7 > > Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. > > Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. This pull request has now been integrated. Changeset: 953eef4f Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/953eef4f113408ab4ae432308f9848f7d226c619 Stats: 152 lines in 7 files changed: 108 ins; 34 del; 10 mod 8347337: ZGC: String dedups short-lived strings Reviewed-by: kbarrett, aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/23965 From shade at openjdk.org Thu Apr 24 09:32:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Apr 2025 09:32:51 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 00:20:03 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 26: > >> 24: >> 25: #ifndef SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP >> 26: #define SHARE_RUNTIME_METHOD_UNLOAD_BLOCKER_HANDLE_INLINE_HPP > > Stale header file name used? Right. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2057962282 From stefank at openjdk.org Thu Apr 24 12:46:49 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Apr 2025 12:46:49 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v12] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 16:02:39 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
>> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Use MutexLocker instead of lock/unlock in vmError.cpp Looks good. One suggestion: src/hotspot/share/gc/shared/collectedHeap.hpp line 442: > 440: > 441: // Print additional information about the GC that is not included in print_heap_on(). > 442: // Generally used for printing information in case of a fatal error. Given that this is also used in VM.info I think it would be better to skip this comment. Suggestion: ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2790903684 PR Review Comment: https://git.openjdk.org/jdk/pull/24593#discussion_r2058263736 From jsikstro at openjdk.org Thu Apr 24 14:03:17 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 24 Apr 2025 14:03:17 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v13] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
> > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to ap... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/shared/collectedHeap.hpp Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/b9b975de..a9dc76a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From jsikstro at openjdk.org Thu Apr 24 14:07:04 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 24 Apr 2025 14:07:04 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v14] In-Reply-To: References: Message-ID: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
> > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to ap... Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24593/files - new: https://git.openjdk.org/jdk/pull/24593/files/a9dc76a3..daf58f9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24593&range=12-13 Stats: 10 lines in 10 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24593/head:pull/24593 PR: https://git.openjdk.org/jdk/pull/24593 From stefank at openjdk.org Thu Apr 24 14:11:46 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Apr 2025 14:11:46 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v14] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:07:04 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. 
>> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2791359217 From lkorinth at openjdk.org Thu Apr 24 14:11:45 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 24 Apr 2025 14:11:45 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v14] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:07:04 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). 
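The two rules above are easier to see as code. The following stand-alone sketch only illustrates the pattern being proposed; the IndentedStream and AutoIndent names are invented stand-ins, not the actual outputStream/StreamAutoIndentor API from ostream.hpp:

```
// Illustrative stand-in for the proposed pattern, not HotSpot code.
#include <cstdio>

struct IndentedStream {
  int _indent = 0;
  void print_line(const char* text) {
    std::printf("%*s%s\n", _indent, "", text);  // current indentation, then the text
  }
};

// RAII helper in the spirit of StreamAutoIndentor: bumps indentation for
// everything printed while it is in scope, then restores it.
class AutoIndent {
  IndentedStream& _st;
  int _amount;
public:
  AutoIndent(IndentedStream& st, int amount) : _st(st), _amount(amount) { _st._indent += amount; }
  ~AutoIndent() { _st._indent -= _amount; }
};

// Rule 1: this printer prepends no spaces of its own.
void print_space(IndentedStream& st) {
  st.print_line("region set: empty");
}

// Rule 2: this printer enforces indentation for its "lower level" calls.
void print_heap(IndentedStream& st) {
  st.print_line("Heap:");
  AutoIndent indent(st, 2);
  print_space(st);   // indented, no matter who calls print_heap()
}

int main() {
  IndentedStream st;
  print_heap(st);    // the caller makes no assumptions about indentation
  return 0;
}
```

With this shape, any printer can be called from any callsite and the nesting in the output still comes out right, which is the point of the two rules.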
>> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years lgtm ------------- Marked as reviewed by lkorinth (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24593#pullrequestreview-2791351835 From jsikstro at openjdk.org Thu Apr 24 14:17:24 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 24 Apr 2025 14:17:24 GMT Subject: RFR: 8354362: Use automatic indentation in CollectedHeap printing [v14] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 14:07:04 GMT, Joel Sikstr?m wrote: >> Hello, >> >>> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. >> >>> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. >> >> Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. >> >> What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: >> 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. >> 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). >> >> Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. >> >> I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. 
To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. >> >> Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argu... > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24593#issuecomment-2827774330 From jsikstro at openjdk.org Thu Apr 24 14:17:24 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 24 Apr 2025 14:17:24 GMT Subject: Integrated: 8354362: Use automatic indentation in CollectedHeap printing In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 11:28:12 GMT, Joel Sikstr?m wrote: > Hello, > >> This PR only focuses on fixing indentation and re-arranging some callsites. It does *not* change the contents of any output, apart from some (IMO relevant) indentation/whitespace additions. > >> Update: With some suggestions from @stefank, I've renamed print_on to print_heap_on and print_on_error to print_gc_on to better reflect their purpose where they're called. With this I've also renamed other instances of print_on_error to better reflect their purpose. Printing heap information and printing gc information is now two distinct steps in vmError.cpp. > > Currently, the CollectedHeap printing code (print_on and print_on_error, with calls "below") prepends spaces in messages in a way that only makes sense if you write the code and then check the output to see if you've done everything correctly. To make writing and maintaining printing code easy, I propose we move to a system where each printing method, starting at callers of print_on and print_on_error, uses the indentation API in outputStream and does not rely on prepending spaces like is done right now. > > What I propose is that any (GC) printing method should not make any assumptions of the indentation level of its caller(s). This means that each function shall: > 1. Not prepend any spaces to its printing, and instead expect that the caller(s) should handle any indentation before calling this function. > 2. Enforce its own indentation, by enabling auto indentation in its own context and for its "lower level" calls (which is often the desired outcome). > > Combining these two rules means that *any* (GC) printing method can be called from anywhere and give sensible output, without (seemingly random) indentation of expectations elsewhere. > > I have aggregated calls that print on the same indentation level to the same callsite. This makes it clear where to look in the code and also makes it easier to add/enforce indendation. To this end, I have re-arranged print_on_error so that it never includes print_on. The new system I propose is that print_on and print_on_error can be called separately for different information, which aligns well with having the same callsite for the same indentation. See changes in vmError.cpp for how this is implemented. > > Instead of prepending spaces, I use StreamAutoIndentor, defined in ostream.hpp. 
To make using automatic indentation easier, I've made some changes to StreamAutoIndentor so that it inherits from streamIndentor and also add an *optional* argument to StreamAutoIndentor to ap... This pull request has now been integrated. Changeset: cf96b107 Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/cf96b107d57182ad6ab47125939423dd5286aa88 Stats: 363 lines in 46 files changed: 104 ins; 93 del; 166 mod 8354362: Use automatic indentation in CollectedHeap printing Reviewed-by: stefank, lkorinth, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24593 From kdnilsen at openjdk.org Thu Apr 24 15:57:49 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 24 Apr 2025 15:57:49 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table Message-ID: Remove unneeded log messages related to processing of the remembered set write card table. ------------- Commit messages: - Remove one additional log message - Remove extraneous log messages - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 25 more: https://git.openjdk.org/jdk/compare/83c7d3bb...c1f65632 Changes: https://git.openjdk.org/jdk/pull/24809/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24809&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355340 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24809.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24809/head:pull/24809 PR: https://git.openjdk.org/jdk/pull/24809 From xpeng at openjdk.org Thu Apr 24 22:47:28 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 24 Apr 2025 22:47:28 GMT Subject: RFR: 8354431: gc/logging/TestGCId fails on Shenandoah Message-ID: I can't reproduce the issue in Linux, but based on the gc log share in JBS bug and the related code, it is easy to find the root cause of it. [0.133s][info][gc] GC(0) Concurrent reset after collect (unload classes) 0.033ms [0.135s][info][gc] Trigger: Learning 2 of 5. Free (3958K) is below initial threshold (7167K) [0.135s][info][gc] Failed to allocate Shared, 128K [0.137s][info][gc] Trigger: Handle Allocation Failure [0.148s][info][gc] GC(2) Degenerated GC upgrading to Full GC [0.167s][info][gc] GC(2) Pause Degenerated GC (Outside of Cycle) 5M->1M(10M) 30.323ms At 0.135s, a concurrent cycle was triggered: `[0.135s][info][gc] Trigger: Learning 2 of 5.`, meanwhile there was allocation failure causing degen: `[0.135s][info][gc] Failed to allocate Shared, 128K`. In the implementation of ShenandoahControlThread::service_concurrent_normal_cycle, it checks if there degen or cancellation before starting the concurrent cycle and return w/o any gc log, which causes the missing GCID 1 in the GC log. 
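To see why that early return leaves a hole in the GC id sequence, here is a stand-alone toy model. It is not the actual ShenandoahControlThread code; the function and counter below are invented for illustration only:

```
// Toy model only -- names and structure are illustrative, not HotSpot's.
#include <cstdio>

static int next_gc_id = 0;

void run_concurrent_cycle(bool cancelled_before_start) {
  int id = next_gc_id++;  // in this model the id is claimed as soon as the cycle is scheduled
  if (cancelled_before_start) {
    // Without this line the id never appears in the log at all, which is
    // the kind of gap gc/logging/TestGCId trips over (GC(1) in the log above).
    std::printf("[info][gc] GC(%d) Cancelled\n", id);
    return;
  }
  std::printf("[info][gc] GC(%d) Concurrent cycle\n", id);
}

int main() {
  run_concurrent_cycle(false);  // GC(0) runs normally
  run_concurrent_cycle(true);   // GC(1) is cancelled by the allocation failure
  run_concurrent_cycle(false);  // GC(2) handles the degenerated/full collection
  return 0;
}
```

The extra log line proposed below plays the same role as the printf in the cancelled branch of this model: the id stays visible even though the cycle did no work.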
Since technically it is not a bug, to fix the potential failure of test `gc/logging/TestGCId` I'll add one line of code to print a GC log message like "[0.135s][info][gc] GC(1) Cancelled" ### Test - [x] gc/logging/TestGCId - [x] hotspot_gc_shenandoah ------------- Commit messages: - 8354431: gc/logging/TestGCId fails on Shenandoah Changes: https://git.openjdk.org/jdk/pull/24856/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24856&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354431 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24856.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24856/head:pull/24856 PR: https://git.openjdk.org/jdk/pull/24856 From wkemper at openjdk.org Thu Apr 24 22:59:50 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 24 Apr 2025 22:59:50 GMT Subject: RFR: 8354431: gc/logging/TestGCId fails on Shenandoah In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 17:41:52 GMT, Xiaolong Peng wrote: > I can't reproduce the issue in Linux, but based on the gc log share in JBS bug and the related code, it is easy to find the root cause of it. > > > [0.133s][info][gc] GC(0) Concurrent reset after collect (unload classes) 0.033ms > [0.135s][info][gc] Trigger: Learning 2 of 5. Free (3958K) is below initial threshold (7167K) > [0.135s][info][gc] Failed to allocate Shared, 128K > [0.137s][info][gc] Trigger: Handle Allocation Failure > [0.148s][info][gc] GC(2) Degenerated GC upgrading to Full GC > [0.167s][info][gc] GC(2) Pause Degenerated GC (Outside of Cycle) 5M->1M(10M) 30.323ms > > > At 0.135s, a concurrent cycle was triggered: `[0.135s][info][gc] Trigger: Learning 2 of 5.`, meanwhile there was allocation failure causing degen: `[0.135s][info][gc] Failed to allocate Shared, 128K`. > In the implementation of ShenandoahControlThread::service_concurrent_normal_cycle, it checks if there degen or cancellation before starting the concurrent cycle and return w/o any gc log, which causes the missing GCID 1 in the GC log. > > Since technically it is not a bug, to fix the potential failure of test `gc/logging/TestGCId` I'll add one line code to print GC log like "[0.135s][info][gc] GC(1) Cancelled" > > ### Test > - [x] gc/logging/TestGCId > - [x] hotspot_gc_shenandoah Looks good to me. Thank you for the quick diagnosis and fix. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24856#pullrequestreview-2792691440 From phh at openjdk.org Thu Apr 24 23:26:55 2025 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 24 Apr 2025 23:26:55 GMT Subject: RFR: 8354431: gc/logging/TestGCId fails on Shenandoah In-Reply-To: References: Message-ID: <7DwiLIqXphVlnHTCdhU43JiL-5Wz21yAzyxs-KH7X_M=.72eaad80-eea4-4a65-bc39-9e8b98e32af5@github.com> On Thu, 24 Apr 2025 17:41:52 GMT, Xiaolong Peng wrote: > I can't reproduce the issue in Linux, but based on the gc log shared in JBS bug and the related code, it is easy to find the root cause of it. > > > [0.133s][info][gc] GC(0) Concurrent reset after collect (unload classes) 0.033ms > [0.135s][info][gc] Trigger: Learning 2 of 5. 
Free (3958K) is below initial threshold (7167K) > [0.135s][info][gc] Failed to allocate Shared, 128K > [0.137s][info][gc] Trigger: Handle Allocation Failure > [0.148s][info][gc] GC(2) Degenerated GC upgrading to Full GC > [0.167s][info][gc] GC(2) Pause Degenerated GC (Outside of Cycle) 5M->1M(10M) 30.323ms > > > At 0.135s, a concurrent cycle was triggered: `[0.135s][info][gc] Trigger: Learning 2 of 5.`, meanwhile there was allocation failure causing degen: `[0.135s][info][gc] Failed to allocate Shared, 128K`. > In the implementation of ShenandoahControlThread::service_concurrent_normal_cycle, it checks if there degen or cancellation before starting the concurrent cycle and return w/o any gc log, which causes the missing GCID 1 in the GC log. > > Since technically it is not a bug, to fix the potential failure of test `gc/logging/TestGCId` I'll add one line code to print GC log like "[0.135s][info][gc] GC(1) Cancelled" > > ### Test > - [x] gc/logging/TestGCId > - [x] hotspot_gc_shenandoah Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24856#pullrequestreview-2792713348 From xpeng at openjdk.org Thu Apr 24 23:26:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 24 Apr 2025 23:26:56 GMT Subject: RFR: 8354431: gc/logging/TestGCId fails on Shenandoah In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 17:41:52 GMT, Xiaolong Peng wrote: > I can't reproduce the issue in Linux, but based on the gc log shared in JBS bug and the related code, it is easy to find the root cause of it. > > > [0.133s][info][gc] GC(0) Concurrent reset after collect (unload classes) 0.033ms > [0.135s][info][gc] Trigger: Learning 2 of 5. Free (3958K) is below initial threshold (7167K) > [0.135s][info][gc] Failed to allocate Shared, 128K > [0.137s][info][gc] Trigger: Handle Allocation Failure > [0.148s][info][gc] GC(2) Degenerated GC upgrading to Full GC > [0.167s][info][gc] GC(2) Pause Degenerated GC (Outside of Cycle) 5M->1M(10M) 30.323ms > > > At 0.135s, a concurrent cycle was triggered: `[0.135s][info][gc] Trigger: Learning 2 of 5.`, meanwhile there was allocation failure causing degen: `[0.135s][info][gc] Failed to allocate Shared, 128K`. > In the implementation of ShenandoahControlThread::service_concurrent_normal_cycle, it checks if there degen or cancellation before starting the concurrent cycle and return w/o any gc log, which causes the missing GCID 1 in the GC log. > > Since technically it is not a bug, to fix the potential failure of test `gc/logging/TestGCId` I'll add one line code to print GC log like "[0.135s][info][gc] GC(1) Cancelled" > > ### Test > - [x] gc/logging/TestGCId > - [x] hotspot_gc_shenandoah Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24856#issuecomment-2829060545 From xpeng at openjdk.org Thu Apr 24 23:26:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 24 Apr 2025 23:26:56 GMT Subject: Integrated: 8354431: gc/logging/TestGCId fails on Shenandoah In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 17:41:52 GMT, Xiaolong Peng wrote: > I can't reproduce the issue in Linux, but based on the gc log shared in JBS bug and the related code, it is easy to find the root cause of it. > > > [0.133s][info][gc] GC(0) Concurrent reset after collect (unload classes) 0.033ms > [0.135s][info][gc] Trigger: Learning 2 of 5. 
Free (3958K) is below initial threshold (7167K) > [0.135s][info][gc] Failed to allocate Shared, 128K > [0.137s][info][gc] Trigger: Handle Allocation Failure > [0.148s][info][gc] GC(2) Degenerated GC upgrading to Full GC > [0.167s][info][gc] GC(2) Pause Degenerated GC (Outside of Cycle) 5M->1M(10M) 30.323ms > > > At 0.135s, a concurrent cycle was triggered: `[0.135s][info][gc] Trigger: Learning 2 of 5.`, meanwhile there was allocation failure causing degen: `[0.135s][info][gc] Failed to allocate Shared, 128K`. > In the implementation of ShenandoahControlThread::service_concurrent_normal_cycle, it checks if there degen or cancellation before starting the concurrent cycle and return w/o any gc log, which causes the missing GCID 1 in the GC log. > > Since technically it is not a bug, to fix the potential failure of test `gc/logging/TestGCId` I'll add one line code to print GC log like "[0.135s][info][gc] GC(1) Cancelled" > > ### Test > - [x] gc/logging/TestGCId > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 8a39f07d Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/8a39f07d07f8c4e30dc29b14f28e33c9d8e2e65f Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8354431: gc/logging/TestGCId fails on Shenandoah Reviewed-by: wkemper, phh ------------- PR: https://git.openjdk.org/jdk/pull/24856 From shade at openjdk.org Fri Apr 25 09:47:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:47:01 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 00:16:25 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 93: > >> 91: >> 92: inline Method* UnloadableMethodHandle::method() const { >> 93: assert(!is_unloaded(), "Should not be unloaded"); > > Assert that `block_unloading()` was called before? Cannot do, since lifecycle allows accessing `method()` shortly after initialization. See the new lifecycle docs. `CompilerBroker` does it now, checking that `block_unloading()` was called here would fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059916538 From shade at openjdk.org Fri Apr 25 09:49:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:49:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: <60zLnWRtRgOKEPcmmdcAnh_QCqf-kEajreRzMMMwee4=.cd312ca6-3aaa-44cf-b731-6adcbeca5833@github.com> On Thu, 24 Apr 2025 00:22:47 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 37: > >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { >> 36: _method = method; >> 37: if (method != nullptr) { > > Is it possible to require `method` (and hence `_method`) to always be non-null? Yes, we can. This is a remnant of the implementation that accepted `_hot_method == nullptr`. Not needed now, fixed. 
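For illustration only (this is not the HotSpot UnloadableMethodHandle sources): once callers may no longer pass nullptr, the old branch in the constructor can become a plain precondition assert. The type and class names below are stand-ins:

```
// Stand-in sketch, not HotSpot code.
#include <cassert>

struct Method {};  // placeholder for the real Method type

class MethodHandleSketch {
  Method* _method;
public:
  explicit MethodHandleSketch(Method* method) : _method(method) {
    // The former `if (method != nullptr)` branch becomes a precondition.
    assert(method != nullptr && "method must not be null");
  }
  Method* method() const { return _method; }
};

int main() {
  Method m;
  MethodHandleSketch h(&m);
  return h.method() == &m ? 0 : 1;
}
```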
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059922071 From shade at openjdk.org Fri Apr 25 09:56:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:56:47 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v7] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - Inline guard - Merge branch 'master' into JDK-8231269-compile-task-weaks - Allow UMH::_method access from VMStructs - Fix VMStructs - Purge extra fluff - Touchups - Renames - ... and 5 more: https://git.openjdk.org/jdk/compare/b41e0b17...07a3cae4 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=06 Stats: 292 lines in 11 files changed: 243 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri Apr 25 09:56:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Apr 2025 09:56:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: Message-ID: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> On Thu, 24 Apr 2025 00:36:18 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow UMH::_method access from VMStructs > > src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: > >> 41: // 3. 
Final released state. Relevant Method* is in unknown state, and cannot be >> 42: // accessed. >> 43: // > > Please, elaborate what state transitions are supported. Currently, my understanding is there are 3 transitions and 4 states: > * 1 -> 2 > * 2 -> 3 (terminal) > * 1 -> 3 (terminal) > * 0 (empty, terminal) I added class-level docs for this handle, see if it reads well? > src/hotspot/share/runtime/unloadableMethodHandle.inline.hpp line 57: > >> 55: >> 56: // Null holder, the relevant class would not be unloaded. >> 57: return nullptr; > > Is this the case of bootstrap classloader? As an optimization opportunity, it can be extended for other system loaders. Yes, this is about null (bootstrap) classloader; the system returns `nullptr` in this case. I don't think `UMH` gets to decide whether `!nullptr` holder is always alive or not, and it is safer to hold on to it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059930415 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2059929919 From cnorrbin at openjdk.org Fri Apr 25 10:45:10 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 25 Apr 2025 10:45:10 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler Message-ID: Hi everyone, This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler` ? an always-on periodic task that runs every 50ms my default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this funcitonality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. ------------- Commit messages: - removed the PerfDataSamplingInterval flag - calculate timestamp in jstat instead of sampling - StatSampler + sampling code removed Changes: https://git.openjdk.org/jdk/pull/24872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8241678 Stats: 864 lines in 23 files changed: 158 ins; 652 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/24872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24872/head:pull/24872 PR: https://git.openjdk.org/jdk/pull/24872 From kdnilsen at openjdk.org Fri Apr 25 16:55:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 25 Apr 2025 16:55:54 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: <7p8Fu1-NV6DaANACeWuUkSA3efOPCihlHenFYGlGRbo=.0058e11f-bca1-4d71-9a55-c9a460a68f78@github.com> On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. Wonder if I should have just changes these to log_debug? 
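For context, the difference between the levels being discussed can be modelled outside HotSpot. The sketch below is not HotSpot's Unified Logging implementation; the macros only imitate the intent: a debug-level message stays in the binary but is off unless logging is configured for it, while a develop-only message disappears from product builds entirely:

```
// Stand-alone model of the idea, not HotSpot's log_debug/log_develop_debug macros.
#include <cstdio>

// #define PRODUCT 1   // uncomment to simulate a product build

#ifdef PRODUCT
#define LOG_DEVELOP_DEBUG(...) ((void)0)            // compiled away in product builds
#else
#define LOG_DEVELOP_DEBUG(...) std::printf(__VA_ARGS__)
#endif

// A plain debug-level message exists in every build; whether it is printed
// depends on the runtime logging configuration, modelled here by a flag.
static bool debug_logging_enabled = false;

#define LOG_DEBUG(...) do { if (debug_logging_enabled) std::printf(__VA_ARGS__); } while (0)

int main() {
  LOG_DEBUG("remset: processed write table\n");          // silent unless enabled
  LOG_DEVELOP_DEBUG("remset: develop-only details\n");   // absent from product builds
  debug_logging_enabled = true;
  LOG_DEBUG("remset: processed write table\n");          // now printed
  return 0;
}
```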
------------- PR Comment: https://git.openjdk.org/jdk/pull/24809#issuecomment-2830929313 From kdnilsen at openjdk.org Fri Apr 25 17:22:22 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 25 Apr 2025 17:22:22 GMT Subject: RFR: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers [v2] In-Reply-To: References: Message-ID: > Allow old-gen concurrent marking cycles to get their full time slice even when young-gc is triggered back-to-back. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24810/files - new: https://git.openjdk.org/jdk/pull/24810/files/322e2eca..eaeb4efc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24810&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24810&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24810.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24810/head:pull/24810 PR: https://git.openjdk.org/jdk/pull/24810 From kdnilsen at openjdk.org Fri Apr 25 17:22:22 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 25 Apr 2025 17:22:22 GMT Subject: RFR: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers [v2] In-Reply-To: References: Message-ID: <53UTjjKs1cwqSvkgKOUakj8Rvubnv8_FsBsTXqf_ym0=.06a7ca11-f267-47fe-b2ab-90039addc3f4@github.com> On Wed, 23 Apr 2025 16:45:24 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > src/hotspot/share/gc/shenandoah/shenandoahRegulatorThread.cpp line 80: > >> 78: _old_heuristics->cancel_trigger_request(); >> 79: } else if (start_young_cycle()) { >> 80: log_debug(gc)("Heuristics request for young collection accepted"); > > Indentation looks a little off here. Thanks. I've fixed this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24810#discussion_r2060618736 From ysr at openjdk.org Fri Apr 25 18:45:50 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 25 Apr 2025 18:45:50 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: <1mUVuO7AJv3bl-GAOaiElHe86ioMwYFS9IP6CTCUn6c=.a3b3f2c4-e57b-46aa-8d67-4372eed5eabb@github.com> On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. OK to remove. Alternatively, could make them debug or trace level. (Or whatever is the most verbose level: i always confuse myself between trace and debug.) Just looked it up: trace is finest; you could may be make it `log_trace(...)` in each case, if you want. Whatever makes sense. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24809#pullrequestreview-2794985272 From wkemper at openjdk.org Fri Apr 25 19:19:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 25 Apr 2025 19:19:51 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. Do you want to keep these messages, but change `log_info` to `log_develop_debug`? 
`log_develop_debug` would remove them entirely from `product` builds. ------------- PR Review: https://git.openjdk.org/jdk/pull/24809#pullrequestreview-2795059743 From wkemper at openjdk.org Fri Apr 25 20:16:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 25 Apr 2025 20:16:51 GMT Subject: RFR: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers [v2] In-Reply-To: References: Message-ID: <1nSaDMoIA3N7ijkIaglsp_7sNein5jKh3BMNUEwYM9g=.7e515645-99cc-456f-a6e1-1d9967de579e@github.com> On Fri, 25 Apr 2025 17:22:22 GMT, Kelvin Nilsen wrote: >> Allow old-gen concurrent marking cycles to get their full time slice even when young-gc is triggered back-to-back. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24810#pullrequestreview-2795190699 From wkemper at openjdk.org Fri Apr 25 20:45:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 25 Apr 2025 20:45:22 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled Message-ID: Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. ------------- Commit messages: - Add test case for compact object headers, increase pressure on old generation Changes: https://git.openjdk.org/jdk/pull/24888/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24888&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355372 Stats: 20 lines in 1 file changed: 12 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24888/head:pull/24888 PR: https://git.openjdk.org/jdk/pull/24888 From vlivanov at openjdk.org Fri Apr 25 21:20:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:20:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 09:52:43 GMT, Aleksey Shipilev wrote: > I don't think UMH gets to decide whether !nullptr holder is always alive or not, and it is safer to hold on to it. I looked around and stumbled upon the following code in `ClassLoaderData` [1]. I haven't checked myself, but it looks like a hidden class injected into bootstrap loader has `klass_holder == nullptr` while still is amenable to GC... IMO a check for `method->method_holder()->class_loader_data()->is_permanent_class_loader_data()` would do a better job serving the immediate needs and communicating the intentions. [1] bool ClassLoaderData::is_permanent_class_loader_data() const { return is_builtin_class_loader_data() && !has_class_mirror_holder(); } // Returns true if the class loader for this class loader data is one of // the 3 builtin (boot application/system or platform) class loaders, // including a user-defined system class loader. Note that if the class // loader data is for a non-strong hidden class then it may // get freed by a GC even if its class loader is one of these loaders. bool ClassLoaderData::is_builtin_class_loader_data() const { ... 
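To make the suggestion concrete without reproducing HotSpot internals, here is a stand-alone sketch of the decision that check would drive: keep a strong holder unless the class loader data is known to be permanent. The struct and predicate below are stand-ins for ClassLoaderData and is_permanent_class_loader_data(); whether UnloadableMethodHandle should adopt this is exactly what the thread is discussing.

```
// Stand-in model, not HotSpot code: decide whether a holder oop must be
// retained to keep a method's class alive.
#include <cstdio>

struct ClassLoaderDataSketch {
  bool builtin_loader;        // boot/platform/app loader
  bool class_mirror_holder;   // non-strong hidden class

  // Mirrors the quoted predicate: permanent CLDs are never unloaded.
  bool is_permanent() const { return builtin_loader && !class_mirror_holder; }
};

const char* holder_decision(const ClassLoaderDataSketch& cld) {
  // Only skip the strong holder when the CLD can never go away; a hidden
  // class in the boot loader still needs one, even though its raw
  // klass_holder may be null.
  return cld.is_permanent() ? "no holder needed" : "keep strong holder";
}

int main() {
  ClassLoaderDataSketch boot_class   = {true,  false};
  ClassLoaderDataSketch hidden_class = {true,  true};
  ClassLoaderDataSketch user_class   = {false, false};
  std::printf("boot:   %s\n", holder_decision(boot_class));
  std::printf("hidden: %s\n", holder_decision(hidden_class));
  std::printf("user:   %s\n", holder_decision(user_class));
  return 0;
}
```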
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2060892632 From vlivanov at openjdk.org Fri Apr 25 21:27:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Apr 2025 21:27:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 09:53:03 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/unloadableMethodHandle.hpp line 43: >> >>> 41: // 3. Final released state. Relevant Method* is in unknown state, and cannot be >>> 42: // accessed. >>> 43: // >> >> Please, elaborate what state transitions are supported. Currently, my understanding is there are 3 transitions and 4 states: >> * 1 -> 2 >> * 2 -> 3 (terminal) >> * 1 -> 3 (terminal) >> * 0 (empty, terminal) > > I added class-level docs for this handle, see if it reads well? Looks good. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2060899202 From bchristi at openjdk.org Fri Apr 25 23:42:23 2025 From: bchristi at openjdk.org (Brent Christian) Date: Fri, 25 Apr 2025 23:42:23 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type Message-ID: The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) ------------- Commit messages: - fix assert, add test case Changes: https://git.openjdk.org/jdk/pull/24892/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24892&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355632 Stats: 31 lines in 2 files changed: 30 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24892/head:pull/24892 PR: https://git.openjdk.org/jdk/pull/24892 From ysr at openjdk.org Fri Apr 25 23:50:51 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 25 Apr 2025 23:50:51 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote: > Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. Looks fine but left some thoughts, although you probably don't want to change anything if this setting is working to induce the behavior you want to test. A whitebox test may be better in the future rather than having to tweak test settings to induce the right behavior to trigger old growth. test/hotspot/jtreg/gc/shenandoah/generational/TestOldGrowthTriggers.java line 60: > 58: int replaceIndex = r.nextInt(ArraySize); > 59: int deriveIndex = r.nextInt(ArraySize); > 60: switch (i & 0x3) { ...otherwise could you just do (i & 0x1) here and change cases 2 & 3 to cases 0 & 1? 
test/hotspot/jtreg/gc/shenandoah/generational/TestOldGrowthTriggers.java line 68: > 66: // 50% chance of creating garbage > 67: array[replaceIndex] = array[replaceIndex].min(array[deriveIndex]); > 68: break; Could you have left this in place & reduced the heap size of the UseCompactHeaders test to half the current setting to induce the trigger? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24888#pullrequestreview-2795532213 PR Review Comment: https://git.openjdk.org/jdk/pull/24888#discussion_r2061035944 PR Review Comment: https://git.openjdk.org/jdk/pull/24888#discussion_r2061034174 From smarks at openjdk.org Sat Apr 26 00:12:44 2025 From: smarks at openjdk.org (Stuart Marks) Date: Sat, 26 Apr 2025 00:12:44 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 23:37:13 GMT, Brent Christian wrote: > The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. > I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) test/lib/jdk/test/whitebox/WhiteBox.java line 572: > 570: wfrp = Reference.class.getDeclaredMethod("waitForReferenceProcessing"); > 571: wfrp.setAccessible(true); > 572: assert wfrp.getReturnType().equals(Class.forPrimitiveName("boolean")); Does `boolean.class` work? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24892#discussion_r2061055041 From kbarrett at openjdk.org Sat Apr 26 09:24:46 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 26 Apr 2025 09:24:46 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 23:37:13 GMT, Brent Christian wrote: > The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. > I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/vmTestbase/gc/gctests/ReferencesGC/WaitForRefProTest.java line 1: > 1: /* Wrong place for this test. vmTestbase is old tests, converted from an old testing infrastructure. See the readme here: https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/vmTestbase In particular, at the bottom of that page: "New tests must *not* be added into this directory." This new test belongs in `test/lib-test/jdk/test/whitebox/`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24892#pullrequestreview-2795859193 PR Review Comment: https://git.openjdk.org/jdk/pull/24892#discussion_r2061239344 From kbarrett at openjdk.org Sat Apr 26 09:24:46 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 26 Apr 2025 09:24:46 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type In-Reply-To: References: Message-ID: <7uq7PHOFSjfI7edckxGfoYsHjJJ9ba-VkV1t7nCIriE=.7a92c288-b3b6-482d-97c4-79552cd43c95@github.com> On Sat, 26 Apr 2025 09:18:35 GMT, Kim Barrett wrote: >> The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. 
>> I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) > > test/hotspot/jtreg/vmTestbase/gc/gctests/ReferencesGC/WaitForRefProTest.java line 1: > >> 1: /* > > Wrong place for this test. vmTestbase is old tests, converted from an old testing infrastructure. > See the readme here: https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/vmTestbase > In particular, at the bottom of that page: "New tests must *not* be added into this directory." > > This new test belongs in `test/lib-test/jdk/test/whitebox/`. Also, s/Pro/Proc/ in the name of the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24892#discussion_r2061240134 From ayang at openjdk.org Sat Apr 26 11:33:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 26 Apr 2025 11:33:51 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler In-Reply-To: References: Message-ID: <30koYWQ8Z8s6wId_9EmUxpAmyeBVeOYEP-u-nKzm_OQ=.10a36bef-4f65-4b53-a526-a0dc60b18de9@github.com> On Fri, 25 Apr 2025 10:38:39 GMT, Casper Norrbin wrote: > Hi everyone, > > This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler` ? an always-on periodic task that runs every 50ms my default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. > > For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. > > The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this funcitonality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. There are still a few matches for "hrt.ticks"; don't know if they should be removed (in this PR or a followup). src/hotspot/share/runtime/arguments.cpp line 538: > 536: { "ZGenerational", JDK_Version::jdk(23), JDK_Version::jdk(24), JDK_Version::undefined() }, > 537: { "ZMarkStackSpaceLimit", JDK_Version::undefined(), JDK_Version::jdk(25), JDK_Version::undefined() }, > 538: { "PerfDataSamplingInterval", JDK_Version::undefined(), JDK_Version::jdk(25), JDK_Version::undefined() }, Since the CSR says it will be removed in 26, the final arg should reflect that, IMO. src/hotspot/share/runtime/perfData.cpp line 419: > 417: */ > 418: void PerfDataManager::assert_system_property(const char* name, const char* value, TRAPS) { > 419: #ifdef ASSERT The indentation seems odd. (I recall `#ifdef` itself should not indented.) src/hotspot/share/runtime/perfData.cpp line 455: > 453: assert(value != nullptr, "property name should be have a value: %s", name); > 454: assert_system_property(name, value, CHECK); > 455: if (value != nullptr) { Why checking null again? Didn't we just asserted that 2 lines above? src/hotspot/share/runtime/threads.cpp line 852: > 850: #endif // INCLUDE_MANAGEMENT > 851: > 852: PerfDataManager::create_misc_perfdata(); Should this be guarded by `UsePerfData`? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24872#pullrequestreview-2795935371 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2061262368 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2061263070 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2061263372 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2061264533 From kdnilsen at openjdk.org Sun Apr 27 20:12:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 27 Apr 2025 20:12:54 GMT Subject: Integrated: 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:57:53 GMT, Kelvin Nilsen wrote: > Allow old-gen concurrent marking cycles to get their full time slice even when young-gc is triggered back-to-back. This pull request has now been integrated. Changeset: cd6f0d19 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/cd6f0d19d5da03eafde68142528c0f85d783cbea Stats: 26 lines in 3 files changed: 8 ins; 14 del; 4 mod 8355336: GenShen: Resume Old GC even with back-to-back Young GC triggers Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/24810 From mbaesken at openjdk.org Mon Apr 28 07:06:56 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 28 Apr 2025 07:06:56 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote: > Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. Hi @earthling-amzn I added your PR to our build/test queue, let's see if the issues we observed go away on all platforms! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2834179698 From iwalulya at openjdk.org Mon Apr 28 11:37:20 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 28 Apr 2025 11:37:20 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation Message-ID: Hi, Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. Testing: Tier 1-3 ------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/24915/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355681 Stats: 13 lines in 2 files changed: 5 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24915/head:pull/24915 PR: https://git.openjdk.org/jdk/pull/24915 From kdnilsen at openjdk.org Mon Apr 28 13:28:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 28 Apr 2025 13:28:47 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote: > Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. 
I ran the test (which includes a compact object headers case now) fifty times without failure. Changes look good to me. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/24888#pullrequestreview-2799301235 From shade at openjdk.org Mon Apr 28 16:07:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:07:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Fri, 25 Apr 2025 21:17:08 GMT, Vladimir Ivanov wrote: >> Yes, this is about null (bootstrap) classloader; the system returns `nullptr` in this case. I don't think `UMH` gets to decide whether `!nullptr` holder is always alive or not, and it is safer to hold on to it. > >> I don't think UMH gets to decide whether !nullptr holder is always alive or not, and it is safer to hold on to it. > > I looked around and stumbled upon the following code in `ClassLoaderData` [1]. I haven't checked myself, but it looks like a hidden class injected into bootstrap loader has `klass_holder == nullptr` while still is amenable to GC... > > IMO a check for `ik->class_loader_data()->is_permanent_class_loader_data()` would do a better job serving the immediate needs and communicating the intentions. > > [1] > > bool ClassLoaderData::is_permanent_class_loader_data() const { > return is_builtin_class_loader_data() && !has_class_mirror_holder(); > } > > // Returns true if the class loader for this class loader data is one of > // the 3 builtin (boot application/system or platform) class loaders, > // including a user-defined system class loader. Note that if the class > // loader data is for a non-strong hidden class then it may > // get freed by a GC even if its class loader is one of these loaders. > bool ClassLoaderData::is_builtin_class_loader_data() const { > ... I remember looking at this, and convinced myself that non-strong hidden classes report their related Java mirror as `klass_holder`, and that is enough to maintain them as alive. See calls to `ClassLoaderData::initialize_holder`: https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/classLoaderData.cpp#L159-L162 https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/systemDictionary.cpp#L810-L814 This is what the comment in `UnloadableMethodHandle::get_unload_blocker` refers to. So I believe current code is correct. I agree `is_permanent_class_loader_data()` captures the intent better. Let me see if it fits well here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064005189 From shade at openjdk.org Mon Apr 28 16:14:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:14:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v8] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. 
Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - Inline guard - Merge branch 'master' into JDK-8231269-compile-task-weaks - Allow UMH::_method access from VMStructs - Fix VMStructs - Purge extra fluff - Touchups - ... and 6 more: https://git.openjdk.org/jdk/compare/2447b981...be3a3d62 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=07 Stats: 292 lines in 11 files changed: 243 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon Apr 28 16:20:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:20:34 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v9] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. 
> > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Simplify a bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/be3a3d62..eaf3f14d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon Apr 28 16:20:34 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Apr 2025 16:20:34 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 16:05:06 GMT, Aleksey Shipilev wrote: > I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. Ah wait, it does not. We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. This is also why current mainline code works -- it captures the same thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064032610 From vlivanov at openjdk.org Mon Apr 28 18:51:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 28 Apr 2025 18:51:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 16:16:41 GMT, Aleksey Shipilev wrote: >> I remember looking at this, and convinced myself that non-strong hidden classes report their related Java mirror as `klass_holder`, and that is enough to maintain them as alive. See calls to `ClassLoaderData::initialize_holder`: >> >> https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/classLoaderData.cpp#L159-L162 >> >> https://github.com/openjdk/jdk/blob/b1e778d9d2ad13ee5f1ed629a8805008580f86c0/src/hotspot/share/classfile/systemDictionary.cpp#L810-L814 >> >> This is what the comment in `UnloadableMethodHandle::get_unload_blocker` refers to. So I believe current code is correct. >> >> I agree `is_permanent_class_loader_data()` captures the intent better. Let me see if it fits well here. > >> I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. > > Ah wait, it does not. 
We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. > > This is also why current mainline code works -- it captures the same thing. Ok, thanks for checking! Good to know there's no existing bug. What I had in mind is as follows: InstanceKlass* holder = method->method_holder(); if (holder->class_loader_data()->is_permanent_class_loader_data()) { return nullptr; // method holder class can't be unloaded } else { // Normal class, return the holder that would block unloading. // This would be either classloader oop for non-hidden classes, // or Java mirror oop for hidden classes. assert(holder->klass_holder() != nullptr, ""); return holder->klass_holder(); } IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2064315717 From bchristi at openjdk.org Mon Apr 28 18:58:47 2025 From: bchristi at openjdk.org (Brent Christian) Date: Mon, 28 Apr 2025 18:58:47 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 00:10:04 GMT, Stuart Marks wrote: >> The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. >> I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) > > test/lib/jdk/test/whitebox/WhiteBox.java line 572: > >> 570: wfrp = Reference.class.getDeclaredMethod("waitForReferenceProcessing"); >> 571: wfrp.setAccessible(true); >> 572: assert wfrp.getReturnType().equals(Class.forPrimitiveName("boolean")); > > Does `boolean.class` work? Yes, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24892#discussion_r2064328929 From bchristi at openjdk.org Mon Apr 28 19:21:05 2025 From: bchristi at openjdk.org (Brent Christian) Date: Mon, 28 Apr 2025 19:21:05 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type [v2] In-Reply-To: References: Message-ID: > The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. > I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) 
Brent Christian has updated the pull request incrementally with four additional commits since the last revision: - move test - rename test class - enable assertions on test itself - use boolean.class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24892/files - new: https://git.openjdk.org/jdk/pull/24892/files/0931c0b3..b8dce1ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24892&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24892&range=00-01 Stats: 61 lines in 3 files changed: 30 ins; 30 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24892/head:pull/24892 PR: https://git.openjdk.org/jdk/pull/24892 From bchristi at openjdk.org Mon Apr 28 19:21:06 2025 From: bchristi at openjdk.org (Brent Christian) Date: Mon, 28 Apr 2025 19:21:06 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type [v2] In-Reply-To: <7uq7PHOFSjfI7edckxGfoYsHjJJ9ba-VkV1t7nCIriE=.7a92c288-b3b6-482d-97c4-79552cd43c95@github.com> References: <7uq7PHOFSjfI7edckxGfoYsHjJJ9ba-VkV1t7nCIriE=.7a92c288-b3b6-482d-97c4-79552cd43c95@github.com> Message-ID: On Sat, 26 Apr 2025 09:22:30 GMT, Kim Barrett wrote: >> test/hotspot/jtreg/vmTestbase/gc/gctests/ReferencesGC/WaitForRefProTest.java line 1: >> >>> 1: /* >> >> Wrong place for this test. vmTestbase is old tests, converted from an old testing infrastructure. >> See the readme here: https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/vmTestbase >> In particular, at the bottom of that page: "New tests must *not* be added into this directory." >> >> This new test belongs in `test/lib-test/jdk/test/whitebox/`. > > Also, s/Pro/Proc/ in the name of the test. Sounds good, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24892#discussion_r2064366407 From duke at openjdk.org Mon Apr 28 19:50:26 2025 From: duke at openjdk.org (Alexandre Jacob) Date: Mon, 28 Apr 2025 19:50:26 GMT Subject: RFR: 8350621: Code cache stops scheduling GC Message-ID: The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`. This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling. Unfortunately this can't work properly under certain circumstances. For example, if using G1GC, calling `G1CollectedHeap::collect` does no give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)). I have observed this behavior on JVM in version 21 that were migrated recently from java 17. Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GC take a long time to happen. I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well. 
In order to reproduce this issue, I found a very simple and convenient way: public class CodeCacheMain { public static void main(String[] args) throws InterruptedException { while (true) { Thread.sleep(100); } } } Run this simple app with the following JVM flags: -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15 - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC - low `ReservedCodeCacheSize` to put pressure on the code cache quickly - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction By itself, the program will hardly put any pressure on the code cache, but the good news is that it is sufficient to attach a jconsole to it, which will: - allow us to monitor the code cache - indirectly generate activity on the code cache, just what we need to reproduce the bug Some logs related to the code cache will show up at some point with GC activity: [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory And then it will stop and we'll end up with the following message: [672.714s][info][codecache ] Code cache is full - disabling compilation This leaves the JVM in an unstable situation. I considered a few different options before making this change: 1) Always call `Universe::heap()->collect(...)` without making any check (the GC impl should handle the situation) 2) Fix all GC implementations to ensure `_unloading_threshold_gc_requested` gets back to `false` at some point (probably what is supposed to happen today) 3) Change `CollectedHeap::collect` to return a `bool` instead of `void` to indicate if GC was run or scheduled But I discarded them: 1) Dumb option that I used to check that the bug would be corrected, but it will probably put a bit of pressure on resources when allocations need to be performed at the code cache level (as it will be called at each allocation attempt). In addition, the log indicating that we trigger GC is spammed, and it is not easy to decide how to handle the log correctly.
I took a lot of time investigating this issue and exploring solutions, and am willing to take any input on it as it is my first PR on the project. ------------- Commit messages: - _unloading_gc_requested should remain volatile - remove early returns from gc_on_allocation - fix race condition in try_to_gc - log before GC - fix log message - XXXXXXX: Fix code cache GC Changes: https://git.openjdk.org/jdk/pull/23656/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23656&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350621 Stats: 77 lines in 2 files changed: 45 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23656/head:pull/23656 PR: https://git.openjdk.org/jdk/pull/23656 From duke at openjdk.org Mon Apr 28 19:50:26 2025 From: duke at openjdk.org (Alexandre Jacob) Date: Mon, 28 Apr 2025 19:50:26 GMT Subject: RFR: 8350621: Code cache stops scheduling GC In-Reply-To: References: Message-ID: On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote: > The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`. > > This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling. > > Unfortunately this can't work properly under certain circumstances. > For example, if using G1GC, calling `G1CollectedHeap::collect` does no give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)). > > I have observed this behavior on JVM in version 21 that were migrated recently from java 17. > Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GC take a long time to happen. > > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well. 
> > In order to reproduce this issue, I found a very simple and convenient way: > > > public class CodeCacheMain { > public static void main(String[] args) throws InterruptedException { > while (true) { > Thread.sleep(100); > } > } > } > > > Run this simple app with the following JVM flags: > > > -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15 > > > - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC > - low `ReservedCodeCacheSize` to put pressure on code cache quickly > - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction > > Itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will: > - allows us to monitor code cache > - indirectly generate activity on the code cache, just what we need to reproduce the bug > > Some logs related to code cache will show up at some point with GC activity: > > > [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory > > > And then it will stop and we'll end up with the following message: > > > [672.714s][info][codecache ] Code cache is full - disabling compilation > > > L... Here is a log sample that shows how it behaves when the bug occurs. Logs starting with `>>>` are some logs I added when working on steps to reproduce. The bug occurs at ~648.762s, G1GC has reset the flag to `false` but is still running, CodeCache has called `Universe::heap()->collect(...)`, which was discarded because of the current GC routine. Note that during this test `-XX:StartAggressiveSweepingAt` was set to `25` instead of `15` but I confirm I can reproduce with `15` as well (as explained in the description of the PR) [648.733s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.733s][info][codecache ] Triggering aggressive GC due to having only 24.970% free memory [648.733s][info][gc,start ] GC(6210) Pause Young (CodeCache GC Aggressive) [648.733s][info][gc,heap ] GC(6210) PSYoungGen: 2851K(132096K)->224K(132096K) Eden: 2851K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.733s][info][gc,heap ] GC(6210) ParOldGen: 11691K(349696K)->11691K(349696K) [648.733s][info][gc,metaspace ] GC(6210) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.733s][info][gc ] GC(6210) Pause Young (CodeCache GC Aggressive) 14M->11M(470M) 0.238ms [648.733s][info][gc,cpu ] GC(6210) User=0.00s Sys=0.00s Real=0.00s [648.733s][info][gc,start ] GC(6211) Pause Full (CodeCache GC Aggressive) [648.733s][info][gc,phases,start] GC(6211) Marking Phase [648.742s][info][gc,phases ] GC(6211) Marking Phase 8.585ms [648.742s][info][gc,phases,start] GC(6211) Summary Phase [648.742s][info][gc,phases ] GC(6211) Summary Phase 0.009ms [648.742s][info][gc,phases,start] GC(6211) Adjust Roots [648.742s][info][gc,phases ] GC(6211) Adjust Roots 0.311ms [648.742s][info][gc,phases,start] GC(6211) Compaction Phase [648.747s][info][gc,phases ] GC(6211) Compaction Phase 4.701ms [648.747s][info][gc,phases,start] GC(6211) Post Compact [648.747s][info][codecache ] >>> CodeCache::update_cold_gc_count _unloading_threshold_gc_requested = false [648.747s][info][codecache ] Code cache critically low; use aggressive aging [648.747s][info][gc,phases ] GC(6211) Post Compact 0.106ms [648.747s][info][gc,heap ] GC(6211) PSYoungGen: 224K(132096K)->0K(132096K) Eden: 0K(131584K)->0K(131584K) From: 
224K(512K)->0K(512K) [648.747s][info][gc,heap ] GC(6211) ParOldGen: 11691K(349696K)->11688K(349696K) [648.747s][info][gc,metaspace ] GC(6211) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.747s][info][gc ] GC(6211) Pause Full (CodeCache GC Aggressive) 11M->11M(470M) 13.799ms [648.747s][info][gc,cpu ] GC(6211) User=0.11s Sys=0.00s Real=0.01s [648.747s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.747s][info][codecache ] Triggering aggressive GC due to having only 24.865% free memory [648.747s][info][gc,start ] GC(6212) Pause Young (CodeCache GC Aggressive) [648.748s][info][gc,heap ] GC(6212) PSYoungGen: 2851K(132096K)->224K(132096K) Eden: 2851K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.748s][info][gc,heap ] GC(6212) ParOldGen: 11688K(349696K)->11688K(349696K) [648.748s][info][gc,metaspace ] GC(6212) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.748s][info][gc ] GC(6212) Pause Young (CodeCache GC Aggressive) 14M->11M(470M) 0.257ms [648.748s][info][gc,cpu ] GC(6212) User=0.00s Sys=0.00s Real=0.00s [648.748s][info][gc,start ] GC(6213) Pause Full (CodeCache GC Aggressive) [648.748s][info][gc,phases,start] GC(6213) Marking Phase [648.756s][info][gc,phases ] GC(6213) Marking Phase 8.512ms [648.756s][info][gc,phases,start] GC(6213) Summary Phase [648.756s][info][gc,phases ] GC(6213) Summary Phase 0.007ms [648.756s][info][gc,phases,start] GC(6213) Adjust Roots [648.757s][info][gc,phases ] GC(6213) Adjust Roots 0.331ms [648.757s][info][gc,phases,start] GC(6213) Compaction Phase [648.761s][info][gc,phases ] GC(6213) Compaction Phase 4.734ms [648.761s][info][gc,phases,start] GC(6213) Post Compact [648.761s][info][codecache ] >>> CodeCache::update_cold_gc_count _unloading_threshold_gc_requested = false [648.761s][info][codecache ] Code cache critically low; use aggressive aging [648.761s][info][gc,phases ] GC(6213) Post Compact 0.059ms [648.761s][info][gc,heap ] GC(6213) PSYoungGen: 224K(132096K)->0K(132096K) Eden: 0K(131584K)->0K(131584K) From: 224K(512K)->0K(512K) [648.761s][info][gc,heap ] GC(6213) ParOldGen: 11688K(349696K)->11689K(349696K) [648.761s][info][gc,metaspace ] GC(6213) Metaspace: 7690K(8192K)->7690K(8192K) NonClass: 6904K(7168K)->6904K(7168K) Class: 786K(1024K)->786K(1024K) [648.761s][info][gc ] GC(6213) Pause Full (CodeCache GC Aggressive) 11M->11M(470M) 13.725ms [648.761s][info][gc,cpu ] GC(6213) User=0.09s Sys=0.02s Real=0.01s [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 0) [648.762s][info][codecache ] Triggering aggressive GC due to having only 24.895% free memory [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) [648.762s][info][gc,start ] GC(6214) Pause Young (GCLocker Initiated GC) [648.762s][info][gc,heap ] GC(6214) PSYoungGen: 1973K(132096K)->224K(132096K) Eden: 1973K(131584K)->0K(131584K) From: 0K(512K)->224K(512K) [648.762s][info][gc,heap ] GC(6214) ParOldGen: 11689K(349696K)->11689K(349696K) [648.762s][info][gc,metaspace ] GC(6214) Metaspace: 7691K(8192K)->7691K(8192K) NonClass: 6905K(7168K)->6905K(7168K) Class: 786K(1024K)->786K(1024K) [648.762s][info][gc ] GC(6214) Pause Young (GCLocker Initiated GC) 13M->11M(470M) 0.310ms [648.762s][info][gc,cpu ] GC(6214) User=0.00s Sys=0.00s Real=0.00s [648.762s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) ** removed 278 occurrences of the same 
log ** [672.714s][info][codecache ] >>> should start GC (_unloading_threshold_gc_requested = 1) [672.714s][info][codecache ] Code cache is full - disabling compilation [672.714s][warning][codecache ] CodeCache is full. Compiler has been disabled. [672.714s][warning][codecache ] Try increasing the code cache size using -XX:ReservedCodeCacheSize= OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= CodeCache: size=2496Kb used=2479Kb max_used=2479Kb free=16Kb bounds [0x00007923ed490000, 0x00007923ed700000, 0x00007923ed700000] total_blobs=1127 nmethods=640 adapters=399 compilation: disabled (not enough contiguous free space left) stopped_count=1, restarted_count=0 full_count=1 Both JVM were started for ~20 minutes ### jconsole (reproducting the bug) ![image](https://github.com/user-attachments/assets/61db5c41-5ce8-4aad-ba98-dbbef142f420) Started to misbehave at ~315.181s ### jconsole (with the fix from the PR) ![image](https://github.com/user-attachments/assets/f21dd6fc-7b7d-4b7c-a57a-86e60b2577ce) [13.078s][debug][codecache ] Previous GC request has not been reset after 13.018797s, force auto-reset [412.985s][debug][codecache ] Previous GC request has not been reset after 23.985252s, force auto-reset [464.974s][debug][codecache ] Previous GC request has not been reset after 7.970082s, force auto-reset [524.953s][debug][codecache ] Previous GC request has not been reset after 3.937477s, force auto-reset Converted to draft: I would like to change it to ensure we log before calling `Universe::heap()->collect(...)` (same way as before) Performing more tests on this (different configuration, different GC, ...), I noticed that I had a race condition when multiple threads enter the `try_to_gc` method I introduced. The race condition impact was : - an unwanted auto-reset of the flag - an invalid "duration since last GC request" log - an unneeded GC request Possible in the following conditions: - thread1: reads `_unloading_gc_requested_time` (with `elapsed_since_last_gc_request` > 250ms) - thread2: has `_unloading_gc_requested == false` ? it requests GC - thread1: has `_unloading_gc_requested == true` ? it resets `_unloading_gc_requested` + log + request GC ? In order to avoid that I propose to simply ensure we don't have multiple threads performing the checks in `gc_on_allocation` ------------- PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2661567656 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2661578204 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2662439874 PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2664920007 From wkemper at openjdk.org Mon Apr 28 21:20:47 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 28 Apr 2025 21:20:47 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 07:04:28 GMT, Matthias Baesken wrote: >> Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. > > Hi @earthling-amzn I added your PR to our build/test queue, let's see if the issues we observed go away on all platforms! @MBaesken - thank you! Please let us know how it goes. 
@ysramakrishna - I wanted to increase pressure specifically on the old generation. I'm not sure reducing the heap size alone would do that. As you point out, the cases in the switch statement do look a bit silly now. I'll make the change you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2836708777 From wkemper at openjdk.org Mon Apr 28 22:26:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 28 Apr 2025 22:26:00 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled [v2] In-Reply-To: References: Message-ID: > Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Clarify cases with comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24888/files - new: https://git.openjdk.org/jdk/pull/24888/files/9c991cfa..70b36ac5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24888&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24888&range=00-01 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24888/head:pull/24888 PR: https://git.openjdk.org/jdk/pull/24888 From wkemper at openjdk.org Tue Apr 29 00:02:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 29 Apr 2025 00:02:20 GMT Subject: RFR: 8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle Message-ID: When old generation marking is cancelled to run a young collection. we still set a `_degen_point ` for reasons that became vestigial after [JDK-8349094](https://bugs.openjdk.org/browse/JDK-8349094). When old marking is cancelled, the `_degen_point` should only be set if the marking was cancelled because of an allocation failure (and it should still only be set to "outside of cycle"). The following sequence could lead to this assertion failure: 1. Control thread is marking old 2. Young GC preempts it 3. Control thread sets the degen point because the old GC was "cancelled" 4. The concurrent young GC fails and attempts to set a degenerated point 5. This trips the assert because we already (incorrectly) set the degen point in `3`. ------------- Commit messages: - Only set degeneration point for allocation failures during old marking Changes: https://git.openjdk.org/jdk/pull/24940/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24940&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355789 Stats: 11 lines in 1 file changed: 0 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24940/head:pull/24940 PR: https://git.openjdk.org/jdk/pull/24940 From tschatzl at openjdk.org Tue Apr 29 07:33:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 07:33:46 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 10:57:48 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. 
Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. > > Testing: Tier 1-3 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2802356207 From tschatzl at openjdk.org Tue Apr 29 07:39:22 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 07:39:22 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v36] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
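For comparison with the barrier pseudo code quoted above, the Parallel/Serial-style post-write barrier that the text refers to ("three or four instructions") is essentially just the unconditional card mark. A simplified sketch in the same pseudo-code style (not the exact generated code):

  // Assignment: x.a = y
  // Post-write barrier, Parallel/Serial style - unconditional card mark only,
  // no filters, no StoreLoad, no enqueuing:
  *card(@x.a) = dirty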
The pull request now contains 52 commits: - Merge branch 'master' into card-table-as-dcq-merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - ... and 42 more: https://git.openjdk.org/jdk/compare/44374a57...51dfbe54 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=35 Stats: 7102 lines in 110 files changed: 2583 ins; 3598 del; 921 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mbaesken at openjdk.org Tue Apr 29 07:41:47 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 29 Apr 2025 07:41:47 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 21:18:31 GMT, William Kemper wrote: > thank you! Please let us know how it goes. Unfortunately, not so good . On darwin x86_64 fastdebug, gc/shenandoah/generational/TestOldGrowthTriggers.java triggers now this crash/assert # Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-macos_x86_64-dbg/jdk/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp:607), pid=58469, tid=18179 # assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle Stack: [0x000070000fc5b000,0x000070000fd5b000], sp=0x000070000fd5aae0, free space=1022k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x1576749] VMError::report(outputStream*, bool)+0x1ef9 (shenandoahGenerationalControlThread.cpp:607) V [libjvm.dylib+0x157a65b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x60b V [libjvm.dylib+0x72bd08] report_vm_error(char const*, int, char const*, char const*, ...)+0xd8 V [libjvm.dylib+0x121e75a] ShenandoahGenerationalControlThread::check_cancellation_or_degen(ShenandoahGC::ShenandoahDegenPoint)+0x14a V [libjvm.dylib+0x121e455] ShenandoahGenerationalControlThread::service_concurrent_cycle(ShenandoahGeneration*, GCCause::Cause, bool)+0x165 V [libjvm.dylib+0x121cd72] ShenandoahGenerationalControlThread::run_gc_cycle(ShenandoahGenerationalControlThread::ShenandoahGCRequest const&)+0x1a2 V [libjvm.dylib+0x121c8b2] ShenandoahGenerationalControlThread::run_service()+0x142 V [libjvm.dylib+0x6a798b] ConcurrentGCThread::run()+0x1b V [libjvm.dylib+0x14bfa5c] Thread::call_run()+0xbc V [libjvm.dylib+0x1060ff7] thread_native_entry(Thread*)+0x137 C [libsystem_pthread.dylib+0x618b] _pthread_start+0x63 C [libsystem_pthread.dylib+0x1ae3] thread_start+0xf ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2837812285 From tschatzl at openjdk.org Tue Apr 29 08:08:49 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 08:08:49 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary 
is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:31:48 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > typo Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2802438734 From tschatzl at openjdk.org Tue Apr 29 08:08:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 08:08:50 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v2] In-Reply-To: References: Message-ID: <_9sAy3jLZp0HZu6OtXTxQvn5HrDIvEMmWpBsiCBMgq8=.c0e475c0-ef36-4ac3-992f-5367e06225d3@github.com> On Tue, 15 Apr 2025 02:49:52 GMT, Tongbao Zhang wrote: >> test/hotspot/jtreg/gc/arguments/TestG1CompressedOops.java line 35: >> >>> 33: * @modules java.management/sun.management >>> 34: * @library /test/lib >>> 35: * @library / >> >> Why this line is needed? I don't see any dependencies on "/" >> If you use some test code outside directory, better to build them. > >> Why this line is needed? I don't see any dependencies on "/" If you use some test code outside directory, better to build them. > > Yes, the GCArguments depends on the ```@library /``` , many tests in ``` test/hotspot/jtreg/gc/arguments``` use this Afaict `GCArguments` only depends on `test.lib.*` too. Other than that, the use of the `@library`directives is often just copy&paste without particular meaning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24541#discussion_r2065747094 From iwalulya at openjdk.org Tue Apr 29 09:09:57 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 29 Apr 2025 09:09:57 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size Message-ID: Hi, Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. 
Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. Testing: Tier 1-3 ------------- Commit messages: - nit - refactor full collection Changes: https://git.openjdk.org/jdk/pull/24944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355756 Stats: 44 lines in 8 files changed: 14 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24944/head:pull/24944 PR: https://git.openjdk.org/jdk/pull/24944 From shade at openjdk.org Tue Apr 29 09:22:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 09:22:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
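As a rough illustration of the cost being described here (a standalone model with made-up types, not HotSpot code): a task that keeps a single untyped handle has to ask the handle pools which one owns it on every query, taking their locks, while a task that keeps dedicated strong and weak fields, or goes straight to the right storage as this PR does, never needs that lookup.

#include <mutex>
#include <unordered_set>

// Stand-in for an OopStorage-like pool; answering "is this handle yours?"
// requires taking the pool's lock and doing a lookup.
struct HandlePool {
  std::mutex lock;
  std::unordered_set<void*> entries;
  bool contains(void* h) {
    std::lock_guard<std::mutex> g(lock);
    return entries.count(h) != 0;
  }
};

HandlePool strong_pool;
HandlePool weak_pool;

// Variant A: one reused field; every caller must re-discover what it holds.
struct TaskA {
  void* holder = nullptr;                      // strong OR weak, not recorded
  bool holder_is_weak() { return weak_pool.contains(holder); }   // lock + lookup
};

// Variant B: dedicated fields; the kind is known statically, no pool query.
struct TaskB {
  void* strong_holder = nullptr;               // set only to block unloading
  void* weak_holder = nullptr;
  bool holder_is_weak() { return strong_holder == nullptr; }     // plain null check
};

int main() {
  TaskA a;
  TaskB b;
  bool aw = a.holder_is_weak();                // pays for a lock and a hash lookup
  bool bw = b.holder_is_weak();                // costs a comparison
  (void)aw;
  (void)bw;
  return 0;
}

The PR takes the second shape one step further by holding the strong and weak references in the VM's own storages directly, so no per-query handle classification is needed at all.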
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Improve get_method_blocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/eaf3f14d..9f44cb5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=08-09 Stats: 12 lines in 1 file changed: 4 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Tue Apr 29 09:22:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 09:22:10 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Mon, 28 Apr 2025 18:48:47 GMT, Vladimir Ivanov wrote: >>> I agree is_permanent_class_loader_data() captures the intent better. Let me see if it fits well here. >> >> Ah wait, it does not. We need to hold on to something that blocks the unloading. Just checking `is_permanent_class_loader_data()` does not get us there. We would need to ask for some holder for it. For the reasons above, `method->method_holder()->klass_holder()` works for non-strong hidden classes as well. >> >> This is also why current mainline code works -- it captures the same thing. > > Ok, thanks for checking! Good to know there's no existing bug. > > What I had in mind is as follows: > > InstanceKlass* holder = method->method_holder(); > if (holder->class_loader_data()->is_permanent_class_loader_data()) { > return nullptr; // method holder class can't be unloaded > } else { > // Normal class, return the holder that would block unloading. > // This would be either classloader oop for non-hidden classes, > // or Java mirror oop for hidden classes. > assert(holder->klass_holder() != nullptr, ""); > return holder->klass_holder(); > } > > > IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think? Yes, OK, let's do a variant of that. Committed. I'll re-run test to see if there are any surprises about these asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2065897750 From cnorrbin at openjdk.org Tue Apr 29 09:48:05 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 29 Apr 2025 09:48:05 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v2] In-Reply-To: References: Message-ID: > Hi everyone, > > This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler` ? an always-on periodic task that runs every 50ms my default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. > > For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. 
With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. > > The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this funcitonality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: feedback fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24872/files - new: https://git.openjdk.org/jdk/pull/24872/files/7f8141ba..ed3670eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=00-01 Stats: 28 lines in 3 files changed: 1 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24872/head:pull/24872 PR: https://git.openjdk.org/jdk/pull/24872 From cnorrbin at openjdk.org Tue Apr 29 09:53:48 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 29 Apr 2025 09:53:48 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v2] In-Reply-To: <30koYWQ8Z8s6wId_9EmUxpAmyeBVeOYEP-u-nKzm_OQ=.10a36bef-4f65-4b53-a526-a0dc60b18de9@github.com> References: <30koYWQ8Z8s6wId_9EmUxpAmyeBVeOYEP-u-nKzm_OQ=.10a36bef-4f65-4b53-a526-a0dc60b18de9@github.com> Message-ID: On Sat, 26 Apr 2025 11:31:39 GMT, Albert Mingkun Yang wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback fixes > > There are still a few matches for "hrt.ticks"; don't know if they should be removed (in this PR or a followup). Thank you for reviewing @albertnetymk! I'll look into the last traces of "hrt.ticks" to see if they can be removed here. > src/hotspot/share/runtime/perfData.cpp line 455: > >> 453: assert(value != nullptr, "property name should be have a value: %s", name); >> 454: assert_system_property(name, value, CHECK); >> 455: if (value != nullptr) { > > Why checking null again? Didn't we just asserted that 2 lines above? This was from the original moved function, but I agree its redundant. Removed it now. > src/hotspot/share/runtime/threads.cpp line 852: > >> 850: #endif // INCLUDE_MANAGEMENT >> 851: >> 852: PerfDataManager::create_misc_perfdata(); > > Should this be guarded by `UsePerfData`? It should, thank you for spotting. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2838158557 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2065957329 PR Review Comment: https://git.openjdk.org/jdk/pull/24872#discussion_r2065955553 From stuefe at openjdk.org Tue Apr 29 09:53:48 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 29 Apr 2025 09:53:48 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:31:48 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. 
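To make those numbers concrete, here is a standalone sketch of the arithmetic (illustrative only, not code from the patch; it assumes the default 8-byte object alignment, 4K pages, and the 512M versus 32M alignment values discussed in this thread):

#include <cstdint>
#include <cstdio>

static uint64_t align_up(uint64_t value, uint64_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

int main() {
  const uint64_t K = 1024, M = K * K, G = M * K;
  const uint64_t oop_encoding_heap_max = 32 * G;   // 4G oops << 3, assuming 8-byte object alignment
  const uint64_t page_size = 4 * K;
  // The null page is padded to the conservative max heap alignment:
  // 512M (absolute max region size) before the fix, 32M (max ergonomic
  // region size) after it.
  const uint64_t limit_before = oop_encoding_heap_max - align_up(page_size, 512 * M);
  const uint64_t limit_after  = oop_encoding_heap_max - align_up(page_size, 32 * M);
  const uint64_t xmx = 32736 * M;                  // -Xmx32736m == 32G - 32M
  std::printf("before: %s\n", xmx <= limit_before ? "compressed oops on" : "off");  // off
  std::printf("after:  %s\n", xmx <= limit_after  ? "compressed oops on" : "off");  // on
  return 0;
}

A 32736m heap sits exactly at the new 32G - 32M boundary but beyond the old 32G - 512M one, which matches the before/after -XX:+PrintFlagsFinal output quoted in this description.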
>> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > typo This may be a stupid question, but why does the heap region size factor into this decision at all? I assume that both heap base and heap max size are aligned to heap region size? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2838159188 From ayang at openjdk.org Tue Apr 29 10:00:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 29 Apr 2025 10:00:52 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v2] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:48:05 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler` ? an always-on periodic task that runs every 50ms my default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. >> >> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this funcitonality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > feedback fixes Reviewed only `hotspot` changes. Not familiar with other parts. ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24872#pullrequestreview-2802821239 From tschatzl at openjdk.org Tue Apr 29 10:38:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 10:38:46 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:02:43 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. 
> > Testing: Tier 1-3 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 860: > 858: do_full_collection(clear_all_soft_refs, > 859: false /* do_maximal_compaction */, > 860: size_t(0) /* allocation_word_size*/); Suggestion: size_t(0) /* allocation_word_size */); ------------- PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2802899902 PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2066019574 From tschatzl at openjdk.org Tue Apr 29 10:45:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 10:45:55 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:02:43 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. > > Testing: Tier 1-3 src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 239: > 237: // unnecessary shrinking that would be followed by an expand call to satisfy the > 238: // allocation. > 239: size_t allocation_bytes = allocation_word_size * HeapWordSize; I think we should do better here for humongous allocations: afaict we get the actual object size passed here, but in reality g1 needs to allocate on a full region basis. I.e. for humongous objects, `allocation_word_size` should be padded. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2066033403 From ivan.walulya at oracle.com Tue Apr 29 10:46:35 2025 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 29 Apr 2025 10:46:35 +0000 Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype Message-ID: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> As part of our preparations for AHS, we are prototyping changes to the G1 heap resizing policy to improve the effectiveness of the GCTimeRatio [1]. The GCTimeRatio is set to manage the balance between GC time and Application execution time. G1's current implementation of GCTimeRatio appears to have drifted from its intended purpose over time. It may no longer accurately guide heap sizing in response to GC overhead. Therefore, we need to change this mechanism with the goal that G1 better manages heap sizes without the need for additional tuning knobs. The prototype allows both expansion and shrinking of the heap at the end of any GC, as opposed to the current behavior where shrinking is only allowed at Remark or Full GC pauses [2]. We also increase the default GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to suggestions). The existing default causes the heap to shrink too aggressively under the new policy in order to maintain the target GCTimeRatio. A higher default provides a better balance and avoids shrinking heap. Additionally, we are removing the heap resizing at the end of the Remark pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio. This resizing of the heap ignores current application behaviour and may lead to pathological cases of repeated concurrent mark cycles: * we shrink the heap at remark, * a smaller heap triggers a concurrent marking in the subsequent GCs as well as expanding the heap * the concurrent cycle ends in another remark pause where the cycle restarts. We keep this MinHeapFreeRatio-MaxHeapFreeRatio based resizing logic at the end of Full GC. 
As a result of these changes, applications may settle at more appropriate and in some cases smaller heap sizes for a given GCTimeRatio. While this may show as regression in some benchmarks that are sensitive to heap size, it is still improved control over GC behaviour. We are requesting for feedback or testing of these changes before propose to merge them with mainline. Some of the changes that are independent of the GCTimeRatio are already out for review [4, 5], other minor fixes will be split out and pushed independently. // Ivan References: [1] https://github.com/openjdk/jdk/compare/master...walulyai:jdk:G1HeapResizePolicy [2] JDK-8238687 Investigate memory uncommit during young collections in G1 [https://bugs.openjdk.org/browse/JDK-8238687] [3] JDK-8247843 Reconsider G1 default GCTimeRatio value [https://bugs.openjdk.org/browse/JDK-8247843] [4] JDK-8355681 G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [https://bugs.openjdk.org/browse/JDK-8355681] [5] JDK-8355756 G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [https://bugs.openjdk.org/browse/JDK-8355756] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ayang at openjdk.org Tue Apr 29 10:57:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 29 Apr 2025 10:57:52 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: <4CHbRYJQUin42NBK8P74ET3O0OeGIS5ZakqpRLFDTqM=.94f3d876-102e-4946-b965-45eeddfce44b@github.com> On Tue, 29 Apr 2025 09:51:05 GMT, Thomas Stuefe wrote: > why does the heap region size factor into this decision at all? I wonder if it's due to `Arguments::max_heap_for_compressed_oops`: // We need to fit both the null page and the heap into the memory budget, while // keeping alignment constraints of the heap. To guarantee the latter, as the // null page is located before the heap, we pad the null page to the conservative // maximum alignment that the GC may ever impose upon the heap. size_t displacement_due_to_null_page = align_up(os::vm_page_size(), _conservative_max_heap_alignment); LP64_ONLY(return OopEncodingHeapMax - displacement_due_to_null_page); The actual heap starts after the null-page, and the null-page takes the heap-region size in order for heap-start to be aligned. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2838313156 From tschatzl at openjdk.org Tue Apr 29 10:57:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 29 Apr 2025 10:57:52 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:51:05 GMT, Thomas Stuefe wrote: > This may be a stupid question, but why does the heap region size factor into this decision at all? I assume that both heap base and heap max size are aligned to heap region size? >From the calculation for max heap for compressed oops: size_t Arguments::max_heap_for_compressed_oops() { // Avoid sign flip. assert(OopEncodingHeapMax > (uint64_t)os::vm_page_size(), "Unusual page size"); // We need to fit both the null page and the heap into the memory budget, while // keeping alignment constraints of the heap. 
To guarantee the latter, as the // null page is located before the heap, we pad the null page to the conservative // maximum alignment that the GC may ever impose upon the heap. size_t displacement_due_to_null_page = align_up(os::vm_page_size(), _conservative_max_heap_alignment); LP64_ONLY(return OopEncodingHeapMax - displacement_due_to_null_page); NOT_LP64(ShouldNotReachHere(); return 0); } This conservative max heap alignment is what we change from absolute maximum (512M) to what the ergonomics would at most use (32M). At that point we can only use these conservative values, because ergonomics only later decides heap region size based on max heap size which the code may not have at this point. (Afair) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2838315376 From thomas.schatzl at oracle.com Tue Apr 29 11:33:56 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 29 Apr 2025 13:33:56 +0200 Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype In-Reply-To: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> Message-ID: <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com> Hi Ivan, thanks for working on this! Some comments for people (Man, Monica, Kirk) potentially taking this for a spin: On 29.04.25 12:46, Ivan Walulya wrote: > As part of our preparations for AHS, we are prototyping changes to the > G1 heap resizing policy to improve the effectiveness of the GCTimeRatio > [1]. The GCTimeRatio is set to manage the balance between GC time and > Application execution time. G1's current implementation of GCTimeRatio > appears to have drifted from its intended purpose over time. It may no > longer accurately guide heap sizing in response to GC overhead. > Therefore, we need to change this mechanism with the goal that G1 better > manages heap sizes without the need for additional tuning knobs. > > The prototype allows both expansion and shrinking of the heap at the end > of any GC, as opposed to the current behavior where shrinking is only > allowed at Remark or Full GC pauses [2]. ?We also increase the default > GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to > suggestions). The existing default causes the heap to shrink too > aggressively under the new policy in order to maintain the target > GCTimeRatio. A higher default provides a better balance and avoids > shrinking heap. So if one were to make GCTimeRatio manageable (just for testing purposes), and made it a float (for better control), changes to it should reflect on the used heap size in the next few GCs automatically. A SoftMaxHeapSize implementation based on the discussion in the PR [0] that only guides IHOP with changes in ?G1AdaptiveIHOPControl::actual_target_threshold() should be effective now, but there may be issues with this GCTimeRatio based heap sizing that would be interesting to explore. > Additionally, we are removing the heap resizing at the end of the Remark > pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio. This > resizing of the heap ignores current application behaviour and may lead > to pathological cases of repeated concurrent mark cycles: > > * ? ? ?we shrink the heap at remark, > * ? ? ?a smaller heap triggers a concurrent marking in the subsequent > GCs as well as expanding the heap > * ? ? ?the concurrent cycle ends in another remark pause where the > cycle restarts. > > > We keep this MinHeapFreeRatio-MaxHeapFreeRatio based resizing logic at > the end of Full GC. 
The use case for this might be ones similar to CraC to temporarily compact the heap as much as possible; however it might be better to have explicit control for that (e.g. a jcmd). Ultimately there may be need to remove it as well for full gcs, replacing it with something else. > As a result of these changes, applications may settle at more > appropriate and in some cases smaller heap sizes for a given > GCTimeRatio. While this may show as regression in some benchmarks that > are sensitive to heap size, it is still improved control over GC behaviour. > > We are requesting for feedback or testing of these changes before > propose to merge them with mainline. > > Some of the changes that are independent of the GCTimeRatio are already > out for review [4, 5], other minor fixes will be split out and pushed > independently. > [0] https://github.com/openjdk/jdk/pull/24211 Hth, Thomas From kbarrett at openjdk.org Tue Apr 29 11:50:45 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 29 Apr 2025 11:50:45 GMT Subject: RFR: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 19:21:05 GMT, Brent Christian wrote: >> The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. >> I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) > > Brent Christian has updated the pull request incrementally with four additional commits since the last revision: > > - move test > - rename test class > - enable assertions on test itself > - use boolean.class Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24892#pullrequestreview-2803110349 From stuefe at openjdk.org Tue Apr 29 12:02:47 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 29 Apr 2025 12:02:47 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: <4CHbRYJQUin42NBK8P74ET3O0OeGIS5ZakqpRLFDTqM=.94f3d876-102e-4946-b965-45eeddfce44b@github.com> References: <4CHbRYJQUin42NBK8P74ET3O0OeGIS5ZakqpRLFDTqM=.94f3d876-102e-4946-b965-45eeddfce44b@github.com> Message-ID: <5Orme_GDc0qqrFxvYatAak-PS2N9K7zBDvnqirhJP6Y=.487478eb-9773-4672-8d66-b50081111c66@github.com> On Tue, 29 Apr 2025 10:54:24 GMT, Albert Mingkun Yang wrote: > > why does the heap region size factor into this decision at all? > > I wonder if it's due to `Arguments::max_heap_for_compressed_oops`: > > ``` > // We need to fit both the null page and the heap into the memory budget, while > // keeping alignment constraints of the heap. To guarantee the latter, as the > // null page is located before the heap, we pad the null page to the conservative > // maximum alignment that the GC may ever impose upon the heap. > size_t displacement_due_to_null_page = align_up(os::vm_page_size(), > _conservative_max_heap_alignment); > > LP64_ONLY(return OopEncodingHeapMax - displacement_due_to_null_page); > ``` > > The actual heap starts after the null-page, and the null-page takes the heap-region size in order for heap-start to be aligned. Thanks @albertnetymk @tbzhang, that makes sense: - Null area must be located at encoding base (heap base) - Heap is split into regions, regions must be pow2-sized and (I assume) start at region-size-aligned addresses. 
- Null area must directly precede first region. First region follows Null area. So Null area must be sized to be region size. In Metaspace, I do this differently: https://github.com/openjdk/jdk/blob/6a0c24f9db0b15a00ecadca6e853ed5aa3775b78/src/hotspot/share/memory/metaspace.cpp#L816 Here, the Null area is part of the first Root chunk segment. I set up class space starting at encoding base, then allocate a smaller area from that chunk, protect it, and never free it again. I very briefly wondered why this was not possible in the java heap (e.g. protect the first page of the first region) but that would be a worse solution for many reasons. It would mean region 0 requires special attention, would prevent it from being collected normally, would mean we had to find a separate solution for every collector etc. Rather just live with wasting a bit of address space at the front of the heap. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2838548537 From ayang at openjdk.org Tue Apr 29 14:00:46 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 29 Apr 2025 14:00:46 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:31:48 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > typo This PR is good by itself. Maybe in a followup, one can explore the possibility of adjusting (e.g. align-down instead) max-heap-size somewhere in order to avoid the current circular dependency. ------------- Marked as reviewed by ayang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24541#pullrequestreview-2803711479 From mbaesken at openjdk.org Tue Apr 29 14:29:49 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 29 Apr 2025 14:29:49 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References: Message-ID: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> On Tue, 29 Apr 2025 07:39:20 GMT, Matthias Baesken wrote: >Unfortunately, not so good . >On darwin x86_64 fastdebug, > >gc/shenandoah/generational/TestOldGrowthTriggers.java > >triggers now this crash/assert Seems we have for this already https://bugs.openjdk.org/browse/JDK-8355789 JDK-8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2839141827 From shade at openjdk.org Tue Apr 29 15:07:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 15:07:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v6] In-Reply-To: References: <49DMOKtJkD7AtmJuFif9VIIZmM4VYYFYmb-aUmXnG7Q=.7b926824-a81e-4b38-902e-16191b4e46ac@github.com> Message-ID: On Tue, 29 Apr 2025 09:18:59 GMT, Aleksey Shipilev wrote: >> Ok, thanks for checking! Good to know there's no existing bug. >> >> What I had in mind is as follows: >> >> InstanceKlass* holder = method->method_holder(); >> if (holder->class_loader_data()->is_permanent_class_loader_data()) { >> return nullptr; // method holder class can't be unloaded >> } else { >> // Normal class, return the holder that would block unloading. >> // This would be either classloader oop for non-hidden classes, >> // or Java mirror oop for hidden classes. >> assert(holder->klass_holder() != nullptr, ""); >> return holder->klass_holder(); >> } >> >> >> IMO it makes the check more precise and, at the same time, communicates the intent better. What do you think? > > Yes, OK, let's do a variant of that. Committed. I'll re-run test to see if there are any surprises about these asserts. Testing is still green, no surprises. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2066776548 From shade at openjdk.org Tue Apr 29 15:15:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Apr 2025 15:15:47 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
>> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker @kimbarrett, is this what you had in mind when suggesting this originally: "But I don't think that's the way to go, because I think this code shouldn't be using JNIHandles and jobjects at all. It should be using oop* from VMGlobal and VMWeak."? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2839296934 From iwalulya at openjdk.org Tue Apr 29 15:27:46 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 29 Apr 2025 15:27:46 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 10:42:41 GMT, Thomas Schatzl wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 239: > >> 237: // unnecessary shrinking that would be followed by an expand call to satisfy the >> 238: // allocation. >> 239: size_t allocation_bytes = allocation_word_size * HeapWordSize; > > I think we should do better here for humongous allocations: afaict we get the actual object size passed here, but in reality g1 needs to allocate on a full region basis. > I.e. for humongous objects, `allocation_word_size` should be padded. At this point all computation is considering bytes, later in the shrink helper we align_down since the shrinking is done on a region basis. So except for documentation purposes, I don't think we need to do the padding here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2066819626 From wkemper at openjdk.org Tue Apr 29 17:35:45 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 29 Apr 2025 17:35:45 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> References: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> Message-ID: On Tue, 29 Apr 2025 14:26:43 GMT, Matthias Baesken wrote: >>> thank you! Please let us know how it goes. >> >> Unfortunately, not so good . 
>> On darwin x86_64 fastdebug, >> >> gc/shenandoah/generational/TestOldGrowthTriggers.java >> >> triggers now this crash/assert >> >> >> # Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-macos_x86_64-dbg/jdk/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp:607), pid=58469, tid=18179 >> # assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle >> >> Stack: [0x000070000fc5b000,0x000070000fd5b000], sp=0x000070000fd5aae0, free space=1022k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.dylib+0x1576749] VMError::report(outputStream*, bool)+0x1ef9 (shenandoahGenerationalControlThread.cpp:607) >> V [libjvm.dylib+0x157a65b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x60b >> V [libjvm.dylib+0x72bd08] report_vm_error(char const*, int, char const*, char const*, ...)+0xd8 >> V [libjvm.dylib+0x121e75a] ShenandoahGenerationalControlThread::check_cancellation_or_degen(ShenandoahGC::ShenandoahDegenPoint)+0x14a >> V [libjvm.dylib+0x121e455] ShenandoahGenerationalControlThread::service_concurrent_cycle(ShenandoahGeneration*, GCCause::Cause, bool)+0x165 >> V [libjvm.dylib+0x121cd72] ShenandoahGenerationalControlThread::run_gc_cycle(ShenandoahGenerationalControlThread::ShenandoahGCRequest const&)+0x1a2 >> V [libjvm.dylib+0x121c8b2] ShenandoahGenerationalControlThread::run_service()+0x142 >> V [libjvm.dylib+0x6a798b] ConcurrentGCThread::run()+0x1b >> V [libjvm.dylib+0x14bfa5c] Thread::call_run()+0xbc >> V [libjvm.dylib+0x1060ff7] thread_native_entry(Thread*)+0x137 >> C [libsystem_pthread.dylib+0x618b] _pthread_start+0x63 >> C [libsystem_pthread.dylib+0x1ae3] thread_start+0xf > >>Unfortunately, not so good . >>On darwin x86_64 fastdebug, >> >>gc/shenandoah/generational/TestOldGrowthTriggers.java >> >>triggers now this crash/assert > > Seems we have for this already > https://bugs.openjdk.org/browse/JDK-8355789 > JDK-8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle @MBaesken - we have a PR for that assert under review: https://github.com/openjdk/jdk/pull/24940 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2839687949 From kdnilsen at openjdk.org Tue Apr 29 17:38:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 29 Apr 2025 17:38:47 GMT Subject: RFR: 8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 22:57:09 GMT, William Kemper wrote: > When old generation marking is cancelled to run a young collection. we still set a `_degen_point ` for reasons that became vestigial after [JDK-8349094](https://bugs.openjdk.org/browse/JDK-8349094). When old marking is cancelled, the `_degen_point` should only be set if the marking was cancelled because of an allocation failure (and it should still only be set to "outside of cycle"). The following sequence could lead to this assertion failure: > 1. Control thread is marking old > 2. Young GC preempts it > 3. Control thread sets the degen point because the old GC was "cancelled" > 4. The concurrent young GC fails and attempts to set a degenerated point > 5. This trips the assert because we already (incorrectly) set the degen point in `3`. Marked as reviewed by kdnilsen (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24940#pullrequestreview-2804472540 From jbhateja at openjdk.org Tue Apr 29 18:52:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Apr 2025 18:52:23 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Message-ID: This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. Please review and share your feedback. Best Regards, Jatin [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. ------------- Commit messages: - 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24919/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355364 Stats: 18 lines in 4 files changed: 7 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From jbhateja at openjdk.org Tue Apr 29 18:52:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Apr 2025 18:52:23 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. 
[2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Please refer to following comments in relocInfo, which warns against recording relocation against exact patch site as it may pose problems in querying / iterating over relocations corresponding to particular instruction starting address. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L85 @TobiHartmann confirmed that the patch fixed crashes. https://bugs.openjdk.org/browse/JDK-8355363#:~:text=Sounds%20reasonable%2C%20maybe%20mention%20that%20in%20the%20PR%20as%20well.%20All%20testing%20passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2839879447 From vlivanov at openjdk.org Tue Apr 29 19:17:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Apr 2025 19:17:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <6GKM--4QaU2R3zwcgKb-zueVIrKX9MvYHsE-95HDHYI=.350ce1e8-0190-4d09-b35c-b37ab66eb883@github.com> On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker Looks good. I'll submit it for testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2804718558 From qamai at openjdk.org Tue Apr 29 19:34:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 29 Apr 2025 19:34:48 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding `nop` to the fast path, especially `uncolor` is used in the load fast path IIUC. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840022636 From coleenp at openjdk.org Tue Apr 29 19:39:52 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 29 Apr 2025 19:39:52 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. 
Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker src/hotspot/share/runtime/unloadableMethodHandle.hpp line 26: > 24: > 25: #ifndef SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP > 26: #define SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP I think this should be in the oops directory like OopHandle and WeakHandle and Method*, the thing it contains. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2067225946 From ysr at openjdk.org Tue Apr 29 20:17:49 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 29 Apr 2025 20:17:49 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled [v2] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 22:26:00 GMT, William Kemper wrote: >> Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Clarify cases with comment ok ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24888#pullrequestreview-2804940817 From bchristi at openjdk.org Tue Apr 29 20:17:58 2025 From: bchristi at openjdk.org (Brent Christian) Date: Tue, 29 Apr 2025 20:17:58 GMT Subject: Integrated: 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 23:37:13 GMT, Brent Christian wrote: > The newly-added `WhiteBox.waitForReferenceProcessing()` (see [8305186](https://bugs.openjdk.org/browse/JDK-8305186)) always fails with assertions enabled. > I've updated the assertion, and also added the test I used locally to test the new method (just not with assertions enabled, apparently.) This pull request has now been integrated. 
Changeset: bf52eb03 Author: Brent Christian URL: https://git.openjdk.org/jdk/commit/bf52eb035865353fdf5c6c242f9676a51dcc9e22 Stats: 31 lines in 2 files changed: 30 ins; 0 del; 1 mod 8355632: WhiteBox.waitForReferenceProcessing() fails assert for return type Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/24892 From ysr at openjdk.org Tue Apr 29 21:18:44 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 29 Apr 2025 21:18:44 GMT Subject: RFR: 8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 22:57:09 GMT, William Kemper wrote: > When old generation marking is cancelled to run a young collection. we still set a `_degen_point ` for reasons that became vestigial after [JDK-8349094](https://bugs.openjdk.org/browse/JDK-8349094). When old marking is cancelled, the `_degen_point` should only be set if the marking was cancelled because of an allocation failure (and it should still only be set to "outside of cycle"). The following sequence could lead to this assertion failure: > 1. Control thread is marking old > 2. Young GC preempts it > 3. Control thread sets the degen point because the old GC was "cancelled" > 4. The concurrent young GC fails and attempts to set a degenerated point > 5. This trips the assert because we already (incorrectly) set the degen point in `3`. This looks fine, but it also leaves me with the feeling that we should sit down and review the interaction of the state machines for the young and old collections with the mutator allocations, and see if we can simplify the interaction protocol. ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24940#pullrequestreview-2805105686 From vlivanov at openjdk.org Tue Apr 29 21:48:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Apr 2025 21:48:48 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <8whv0N23B1N6GRZl7ASNlvvRObm0Y7RVWldnaRIXplo=.39171c24-7b80-4721-8ef5-d5c55affddab@github.com> On Tue, 29 Apr 2025 09:22:10 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. 
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Improve get_method_blocker Testing results (hs-tier1 - hs-tier4) look good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2805157762 From dlong at openjdk.org Tue Apr 29 22:38:50 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Apr 2025 22:38:50 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. An alternative fix would be to change CompiledDirectCall::find_stub_for() so that it ignores relocInfo::barrier_type. Adding a nop for ZBarrierRelocationFormatLoadGoodAfterShX but not other relocations, like ZBarrierRelocationFormatStoreGoodAfterOr, seems less robust. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840387447 From wkemper at openjdk.org Tue Apr 29 22:58:50 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 29 Apr 2025 22:58:50 GMT Subject: Integrated: 8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 22:57:09 GMT, William Kemper wrote: > When old generation marking is cancelled to run a young collection. we still set a `_degen_point ` for reasons that became vestigial after [JDK-8349094](https://bugs.openjdk.org/browse/JDK-8349094). When old marking is cancelled, the `_degen_point` should only be set if the marking was cancelled because of an allocation failure (and it should still only be set to "outside of cycle"). The following sequence could lead to this assertion failure: > 1. Control thread is marking old > 2. Young GC preempts it > 3. Control thread sets the degen point because the old GC was "cancelled" > 4. The concurrent young GC fails and attempts to set a degenerated point > 5. This trips the assert because we already (incorrectly) set the degen point in `3`. This pull request has now been integrated. Changeset: 5e27547e Author: William Kemper URL: https://git.openjdk.org/jdk/commit/5e27547e2d577e17316ae1a91f83e4091e9729c5 Stats: 11 lines in 1 file changed: 0 ins; 3 del; 8 mod 8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24940 From lmesnik at openjdk.org Tue Apr 29 23:47:50 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 29 Apr 2025 23:47:50 GMT Subject: Integrated: 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode In-Reply-To: References: Message-ID: On Sat, 19 Apr 2025 00:39:00 GMT, Leonid Mesnik wrote: > The > CheckUnhandledOops > cause failure if JvmtiExport::post_resource_exhausted(...) > is called in > MemAllocator::Allocation::check_out_of_memory() > The obj is null so it is not a real bug. > > I am fixing it to reduce noise for CheckUnhandledOops mode for jvmti tests execution. > The vmTestbase/nsk/jvmti/ResourceExhausted/resexhausted002/TestDescription.java > failed with -XX:+CheckUnhandledOops > > If define are unwelcome here, the > ``` PreserveObj obj_h(_thread, _obj_ptr);``` > might be added instead with comment why it is needed for null obj. This pull request has now been integrated. Changeset: 83d0bd85 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/83d0bd85afaf1b5724c12f4d2f6e9c7087bab4e8 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8355069: Allocation::check_out_of_memory() should support CheckUnhandledOops mode Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/24766 From never at openjdk.org Tue Apr 29 23:52:19 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 29 Apr 2025 23:52:19 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Message-ID: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. 
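For context on the "special handling" mentioned above: going by the PR title, the handling is to deoptimize when a slow-path allocation ends up in the old generation. The fragment below is only a schematic sketch of that idea; `allocate_slow`, `is_in_old_generation` and `deoptimize_caller_frame` are hypothetical placeholder names rather than HotSpot or JVMCI APIs, and the rationale given in the comment (compiled code assuming a freshly allocated object is young) is an assumption, not something stated in the thread.

```
// Schematic sketch only -- placeholder names, not HotSpot/JVMCI APIs.
#include <cstddef>

void* allocate_slow(size_t size);          // placeholder: runtime slow-path allocation
bool  is_in_old_generation(const void* p); // placeholder: which generation did it land in?
void  deoptimize_caller_frame();           // placeholder: invalidate the compiled caller frame

void* allocate_with_old_gen_check(size_t size) {
  void* obj = allocate_slow(size);
  if (obj != nullptr && is_in_old_generation(obj)) {
    // Assumption (not stated in the thread): compiled code may rely on the new
    // object being young, so re-execute in the interpreter when it is not.
    deoptimize_caller_frame();
  }
  return obj;
}
```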
------------- Commit messages: - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Changes: https://git.openjdk.org/jdk/pull/24957/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24957&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343158 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24957.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24957/head:pull/24957 PR: https://git.openjdk.org/jdk/pull/24957 From never at openjdk.org Tue Apr 29 23:58:36 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 29 Apr 2025 23:58:36 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: > JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into tkr-zgc-deoptimize-allocation - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24957/files - new: https://git.openjdk.org/jdk/pull/24957/files/1cffe543..aba20dc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24957&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24957&range=00-01 Stats: 435 lines in 17 files changed: 268 ins; 42 del; 125 mod Patch: https://git.openjdk.org/jdk/pull/24957.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24957/head:pull/24957 PR: https://git.openjdk.org/jdk/pull/24957 From jbhateja at openjdk.org Wed Apr 30 01:46:49 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 30 Apr 2025 01:46:49 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> References: <17WJx_sXIF4A7rrZmzOLuJ4WjyvTNm957aJ35MG2XLU=.063b8bf5-8626-4113-b8f5-814a3b314d47@github.com> Message-ID: <1_USVhqRqOqwC7RkPEUyIJ2Mew529yUNVEA-hTcNJY4=.dbe3355f-9b1a-48a7-99ae-ee56760ae9f3@github.com> On Tue, 29 Apr 2025 19:31:46 GMT, Quan Anh Mai wrote: > I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding `nop` to the fast path, especially `uncolor` is used in the load fast path IIUC. Thanks for supporting this idea, specializing barrier relocation is an alternative we already discussed, but it may not be able shield against false mapping with subsequent relocatable instruction which is what is causing crash currently. 
https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2025-April/088895.html @dean-long's suggestion to map a relocation to the exact patch site would be a foolproof way to overcome any such limitation, but it may pose problems when querying/iterating over the relocation set by instruction start address, and it is a bigger change which we plan to address after evaluating it against the alternative scheme (https://bugs.openjdk.org/browse/JDK-8355341). The current scheme of recording the relocation from the end of the instruction is not robust enough on its own to prevent incorrect mapping with a subsequent relocatable instruction; the NOP is not dispatched to an execution unit, it only adds an additional byte to the code cache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840602447 From qamai at openjdk.org Wed Apr 30 02:31:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Apr 2025 02:31:44 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even released yet so I think it is more efficient to make a better fix than to make a quicker one.
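As a concrete illustration of the offset bookkeeping being debated here, the sketch below is not HotSpot's relocInfo code (the helper names are invented), and it assumes the byte being patched is the trailing shift-amount immediate of the SHL/SHR used for pointer (un)coloring. It shows why an offset recorded from the instruction start must account for an optional REX (1 byte) or REX2 (2 byte) prefix, while an offset recorded back from the instruction end is prefix-agnostic.

```
// Illustrative only -- not HotSpot's relocInfo code; helper names are invented.
// Assumed layout of the shift: [REX or REX2 prefix][opcode 0xC1][ModRM][imm8].
#include <cstdint>

// Offset from the *start* of the instruction: correct only if the number of
// prefix bytes (1 for REX, 2 for REX2) is accounted for; forgetting the extra
// REX2 byte makes the runtime patch the opcode byte instead of the imm8.
inline uint8_t* patch_site_from_start(uint8_t* insn_start, int prefix_bytes) {
  return insn_start + prefix_bytes + 2;   // prefix + opcode + ModRM
}

// Offset measured back from the *end* of the instruction: the imm8 is always
// the last byte, so the result is the same with or without REX/REX2.
inline uint8_t* patch_site_from_end(uint8_t* insn_end) {
  return insn_end - 1;
}
```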
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2840648169 From duke at openjdk.org Wed Apr 30 03:28:46 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 30 Apr 2025 03:28:46 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: <9LJXRnXJYSP31o0qt0cd7C_F7qy_cN0ozvU4GSmkksE=.98596096-7247-4c56-bd03-fa76857531eb@github.com> On Tue, 15 Apr 2025 03:31:48 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. >> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > typo Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2840707141 From duke at openjdk.org Wed Apr 30 03:31:48 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 03:31:48 GMT Subject: RFR: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size [v4] In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:31:48 GMT, Tongbao Zhang wrote: >> After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. >> So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. >> >> When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. >> >> Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. 
>> >> before this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = false {product lp64_product} {default} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) >> >> >> after this patch: >> >> ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops >> bool UseCompressedOops = true {product lp64_product} {ergonomic} >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > typo @tbzhang Your change (at version 17c0a8a03e4577fed20c40bb1f209c017522ffbc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24541#issuecomment-2840710313 From shade at openjdk.org Wed Apr 30 07:23:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Apr 2025 07:23:39 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Move to oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/9f44cb5c..baea6cde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=09-10 Stats: 11 lines in 5 files changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From iwalulya at openjdk.org Wed Apr 30 07:23:45 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 30 Apr 2025 07:23:45 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 15:24:44 GMT, Ivan Walulya wrote: >> src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 239: >> >>> 237: // unnecessary shrinking that would be followed by an expand call to satisfy the >>> 238: // allocation. >>> 239: size_t allocation_bytes = allocation_word_size * HeapWordSize; >> >> I think we should do better here for humongous allocations: afaict we get the actual object size passed here, but in reality g1 needs to allocate on a full region basis. >> I.e. for humongous objects, `allocation_word_size` should be padded. > > At this point all computation is considering bytes, later in the shrink helper we align_down since the shrinking is done on a region basis. So except for documentation purposes, I don't think we need to do the padding here. On second look, you are right, we need to pad the humongous. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2068039988 From shade at openjdk.org Wed Apr 30 07:23:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Apr 2025 07:23:41 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v10] In-Reply-To: References: Message-ID: <-ZNfojmvOVBb11JmAr_91o6CnxXMnv2DLe82gbZNwEs=.45c66691-575a-48db-b1d6-f1c5611c6ea3@github.com> On Tue, 29 Apr 2025 19:37:14 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve get_method_blocker > > src/hotspot/share/runtime/unloadableMethodHandle.hpp line 26: > >> 24: >> 25: #ifndef SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP >> 26: #define SHARE_RUNTIME_UNLOADABLE_METHOD_HANDLE_HPP > > I think this should be in the oops directory like OopHandle and WeakHandle and Method*, the thing it contains. Good call, moved. 
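For readers following the handle discussion above, here is a plain-C++ analogy of the weak-versus-strong split being described; `std::weak_ptr`/`std::shared_ptr` stand in for the OopStorage-backed weak and strong handles, so this illustrates the pattern only, not the code in the PR.

```
// Analogy only -- std::weak_ptr/std::shared_ptr stand in for the weak and
// strong OopStorage-backed handles; this is not the actual HotSpot code.
#include <memory>

struct Holder { /* stands in for the method holder keeping the class alive */ };

class CompileTaskSketch {
  std::weak_ptr<Holder>   _weak_holder;    // never blocks unloading
  std::shared_ptr<Holder> _strong_holder;  // set only while unloading must be blocked

 public:
  explicit CompileTaskSketch(const std::shared_ptr<Holder>& holder)
    : _weak_holder(holder) {}

  // With separate fields, "is this handle weak or strong?" never has to be
  // answered by inspecting the handle itself -- the expensive type check that
  // reusing a single JNI-handle field required.
  bool is_unloaded() const {
    return _strong_holder == nullptr && _weak_holder.expired();
  }

  // Capture a strong reference so the holder stays alive while the task runs.
  void block_unloading()   { _strong_holder = _weak_holder.lock(); }
  void unblock_unloading() { _strong_holder.reset(); }
};
```

As the quoted PR text notes, the actual change goes further and drops JNI handles entirely in favor of the relevant OopStorages.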
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2068035888 From duke at openjdk.org Wed Apr 30 09:59:53 2025 From: duke at openjdk.org (Tongbao Zhang) Date: Wed, 30 Apr 2025 09:59:53 GMT Subject: Integrated: 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 10:37:24 GMT, Tongbao Zhang wrote: > After [JDK-8275056](https://bugs.openjdk.org/browse/JDK-8275056), The max heap region size became 512M, and the calculation of CompressedOops based on the max_heap_size - max_heap_region_size. > So before this patch, the CompressedOops will turn on below 32G - 32m, After this patch is 32G -512m. > > When our Apps migrating from JDK11 to JDK21, the heap size parameters(Xmx32736m) will turn off the CompressedOops. > > Since the current max ergonomics size is still 32m, We hoped that the original behavior will not be changed if HeapRegionSize is not explicitly set. > > before this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = false {product lp64_product} {default} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) > > > after this patch: > > ./build/linux-x86_64-server-release/images/jdk/bin/java -Xmx32736m -XX:+PrintFlagsFinal -version | grep CompressedOops > bool UseCompressedOops = true {product lp64_product} {ergonomic} > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.root.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.root.jdk, mixed mode, sharing) This pull request has now been integrated. Changeset: 526951db Author: Tongbao Zhang Committer: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/526951dba731f0e733e22a3bff7ac7a18ce9dece Stats: 94 lines in 4 files changed: 94 ins; 0 del; 0 mod 8354145: G1: UseCompressedOops boundary is calculated on maximum heap region size instead of maxiumum ergonomic heap region size Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24541 From iwalulya at openjdk.org Wed Apr 30 10:15:30 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 30 Apr 2025 10:15:30 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. 
> > Testing: Tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24944/files - new: https://git.openjdk.org/jdk/pull/24944/files/130eda71..6ef77f71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=00-01 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24944/head:pull/24944 PR: https://git.openjdk.org/jdk/pull/24944 From ayang at openjdk.org Wed Apr 30 10:36:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 30 Apr 2025 10:36:51 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: Message-ID: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> On Mon, 28 Apr 2025 10:57:48 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. > > Testing: Tier 1-3 src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 909: > 907: // For humongous objects, we should have expanded the heap on the first > 908: // attempt_allocation_at_safepoint above. > 909: result = expand_and_allocate(word_size); Why `attempt_allocation_at_safepoint` performs expansion for humongous objs but not ordinary objs? Since `attempt_allocation_at_safepoint` doesn't contain "expand" in its name, I'd expect it not to perform expansion at all. (If expansion for humongous objs is critical, I wonder if it makes sense to branch on `is_humongous` at the beginning of this method and handle those two cases in two diff paths.) src/hotspot/share/gc/g1/g1HeapRegionManager.cpp line 481: > 479: uint G1HeapRegionManager::find_contiguous_allow_expand(uint num_regions) { > 480: // Check if we can actually satisfy the allocation. > 481: if (num_regions > (num_free_regions() + available())) { The name "available" is too vague -- without looking at its impl, one could think free regions should be "available" as well. I wonder if sth like "num_uncommitted" is more precise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068400093 PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068393125 From tschatzl at openjdk.org Wed Apr 30 11:21:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 30 Apr 2025 11:21:41 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v37] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review: remove sweep_epoch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/51dfbe54..8b568806 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=35-36 Stats: 21 lines in 4 files changed: 0 ins; 15 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Wed Apr 30 12:04:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 30 Apr 2025 12:04:50 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> Message-ID: On Wed, 30 Apr 2025 10:33:10 GMT, Albert Mingkun Yang wrote: >> Hi, >> >> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. 
Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. >> >> Testing: Tier 1-3 > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 909: > >> 907: // For humongous objects, we should have expanded the heap on the first >> 908: // attempt_allocation_at_safepoint above. >> 909: result = expand_and_allocate(word_size); > > Why `attempt_allocation_at_safepoint` performs expansion for humongous objs but not ordinary objs? Since `attempt_allocation_at_safepoint` doesn't contain "expand" in its name, I'd expect it not to perform expansion at all. (If expansion for humongous objs is critical, I wonder if it makes sense to branch on `is_humongous` at the beginning of this method and handle those two cases in two diff paths.) This also seems pre-existing. Can you file an issue? > src/hotspot/share/gc/g1/g1HeapRegionManager.cpp line 481: > >> 479: uint G1HeapRegionManager::find_contiguous_allow_expand(uint num_regions) { >> 480: // Check if we can actually satisfy the allocation. >> 481: if (num_regions > (num_free_regions() + available())) { > > The name "available" is too vague -- without looking at its impl, one could think free regions should be "available" as well. I wonder if sth like "num_uncommitted" is more precise. This seems to be a pre-existing issue. Filed https://bugs.openjdk.org/browse/JDK-8355976. Afair this member has always been called that way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068523076 PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068522375 From tschatzl at openjdk.org Wed Apr 30 12:15:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 30 Apr 2025 12:15:46 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> Message-ID: On Wed, 30 Apr 2025 12:02:10 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 909: >> >>> 907: // For humongous objects, we should have expanded the heap on the first >>> 908: // attempt_allocation_at_safepoint above. >>> 909: result = expand_and_allocate(word_size); >> >> Why `attempt_allocation_at_safepoint` performs expansion for humongous objs but not ordinary objs? Since `attempt_allocation_at_safepoint` doesn't contain "expand" in its name, I'd expect it not to perform expansion at all. (If expansion for humongous objs is critical, I wonder if it makes sense to branch on `is_humongous` at the beginning of this method and handle those two cases in two diff paths.) > > This also seems pre-existing. Can you file an issue? Actually the whole hunk seems to be unrelated to the actual functional change. 
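To spell out the check in the quoted fix (`num_regions > (num_free_regions() + available())`), here is a small illustrative helper; the names are placeholders chosen to match the naming discussion above, not the actual G1HeapRegionManager code.

```
// Illustrative only -- placeholder names, not the actual G1HeapRegionManager code.
#include <cstdint>

// A contiguous request is only hopeless when it exceeds the free (committed
// but empty) regions plus the regions the heap could still commit. Checking
// only the uncommitted regions rejects requests that a plain
// expand-and-allocate would have satisfied -- the bug this change fixes.
inline bool can_satisfy_contiguous_request(uint32_t num_regions,
                                           uint32_t num_free,
                                           uint32_t num_uncommitted) {
  return num_regions <= num_free + num_uncommitted;
}
```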
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068538638 From iwalulya at openjdk.org Wed Apr 30 12:15:47 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 30 Apr 2025 12:15:47 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> Message-ID: On Wed, 30 Apr 2025 12:01:37 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1HeapRegionManager.cpp line 481: >> >>> 479: uint G1HeapRegionManager::find_contiguous_allow_expand(uint num_regions) { >>> 480: // Check if we can actually satisfy the allocation. >>> 481: if (num_regions > (num_free_regions() + available())) { >> >> The name "available" is too vague -- without looking at its impl, one could think free regions should be "available" as well. I wonder if sth like "num_uncommitted" is more precise. > > This seems to be a pre-existing issue. Filed https://bugs.openjdk.org/browse/JDK-8355976. > > Afair this member has always been called that way. Yeah, I tripped up on that too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068536314 From iwalulya at openjdk.org Wed Apr 30 12:18:45 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 30 Apr 2025 12:18:45 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> Message-ID: On Wed, 30 Apr 2025 12:13:20 GMT, Thomas Schatzl wrote: >> This also seems pre-existing. Can you file an issue? > > Actually the whole hunk seems to be unrelated to the actual functional change. I added this instead of an assert on failing `expand_and_allocate` for humongous objects, but then figured we could just skip the `expand_and_allocate` attempt which is guaranteed to fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068542865 From tschatzl at openjdk.org Wed Apr 30 13:11:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 30 Apr 2025 13:11:52 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 10:15:30 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1HeapSizingPolicy.cpp line 242: > 240: // Humongous objects are allocated in entire regions, we must calculate > 241: // required space in terms of full regions, not just the object size. > 242: allocation_word_size = align_up(allocation_word_size, G1HeapRegion::GrainWords); Maybe it is worth adding a (static?) helper function for the padding to humongous object size because this would be the third time this is done in `G1CollectedHeap.cpp`. 
Also I would prefer if the parts of the whole calculation were done before the actual sum in similar style to existing code: I.e. something like: if (_g1h->is_humongous(...)) { allocation_word_size = _g1h->pad_to_humongous(...); } const size_t used_after_gc = summand_1 + summand_2 + ... + allocation_word_size. ------------- PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2807000400 PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2068631730 From ayang at openjdk.org Wed Apr 30 14:00:57 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 30 Apr 2025 14:00:57 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com> Message-ID: On Wed, 30 Apr 2025 12:16:23 GMT, Ivan Walulya wrote: >> Actually the whole hunk seems to be unrelated to the actual functional change. > > I added this instead of an assert on failing `expand_and_allocate` for humongous objects, but then figured we could just skip the `expand_and_allocate` attempt which is guaranteed to fail. Not sure what to write in a ticket. Those are just some questions I had while reading the coed. Anyway, if this part is not supper related to the actual functional change, can it be dealt with in its own PR? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2068725968 From wkemper at openjdk.org Wed Apr 30 15:38:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 30 Apr 2025 15:38:51 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> References: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> Message-ID: On Tue, 29 Apr 2025 14:26:43 GMT, Matthias Baesken wrote: >>> thank you! Please let us know how it goes. >> >> Unfortunately, not so good . 
>> On darwin x86_64 fastdebug, >> >> gc/shenandoah/generational/TestOldGrowthTriggers.java >> >> triggers now this crash/assert >> >> >> # Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-macos_x86_64-dbg/jdk/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp:607), pid=58469, tid=18179 >> # assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle >> >> Stack: [0x000070000fc5b000,0x000070000fd5b000], sp=0x000070000fd5aae0, free space=1022k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.dylib+0x1576749] VMError::report(outputStream*, bool)+0x1ef9 (shenandoahGenerationalControlThread.cpp:607) >> V [libjvm.dylib+0x157a65b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x60b >> V [libjvm.dylib+0x72bd08] report_vm_error(char const*, int, char const*, char const*, ...)+0xd8 >> V [libjvm.dylib+0x121e75a] ShenandoahGenerationalControlThread::check_cancellation_or_degen(ShenandoahGC::ShenandoahDegenPoint)+0x14a >> V [libjvm.dylib+0x121e455] ShenandoahGenerationalControlThread::service_concurrent_cycle(ShenandoahGeneration*, GCCause::Cause, bool)+0x165 >> V [libjvm.dylib+0x121cd72] ShenandoahGenerationalControlThread::run_gc_cycle(ShenandoahGenerationalControlThread::ShenandoahGCRequest const&)+0x1a2 >> V [libjvm.dylib+0x121c8b2] ShenandoahGenerationalControlThread::run_service()+0x142 >> V [libjvm.dylib+0x6a798b] ConcurrentGCThread::run()+0x1b >> V [libjvm.dylib+0x14bfa5c] Thread::call_run()+0xbc >> V [libjvm.dylib+0x1060ff7] thread_native_entry(Thread*)+0x137 >> C [libsystem_pthread.dylib+0x618b] _pthread_start+0x63 >> C [libsystem_pthread.dylib+0x1ae3] thread_start+0xf > >>Unfortunately, not so good . >>On darwin x86_64 fastdebug, >> >>gc/shenandoah/generational/TestOldGrowthTriggers.java >> >>triggers now this crash/assert > > Seems we have for this already > https://bugs.openjdk.org/browse/JDK-8355789 > JDK-8355789: GenShen: assert(_degen_point == ShenandoahGC::_degenerated_unset) failed: Should not be set yet: Outside of Cycle @MBaesken , have you had a chance to retest after PR#24940 was integrated? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2842405876