From duke at openjdk.org Thu May 1 04:01:50 2025
From: duke at openjdk.org (duke)
Date: Thu, 1 May 2025 04:01:50 GMT
Subject: Withdrawn: 8351137: ZGC: Improve ZValueStorage alignment support
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 08:34:36 GMT, Axel Boldt-Christmas wrote:

> ZValueStorage only aligns the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages have types whose alignment is less than or equal to the alignment set by the storage.
>
> I wish to improve this so that types with greater alignment than the storage alignment can be used.
>
> The UB caused by using a type with an alignment larger than the storage alignment is something I have seen materialise as returning a bad address (and crashing) on Windows.
>
> As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment. I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two.
>
> Testing:
> * tier 1 through tier 5 Oracle supported platforms
> * GHA

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/23887

From duke at openjdk.org Thu May 1 05:41:39 2025
From: duke at openjdk.org (Rui Li)
Date: Thu, 1 May 2025 05:41:39 GMT
Subject: RFR: 8350860: Max GC memory overhead tests
Message-ID:

The G1 GC metadata has increased from JDK8 to the current tip. When upgrading the JDK for an application from JDK8, applications might observe native memory increases. GC is one of the top contributors. Small applications tend to get impacted more significantly. See the sample test in the description of https://bugs.openjdk.org/browse/JDK-8350860: when the heap is 128m, the native memory used by gc can be over 80m.

In order to make sure we don't bring a dramatic native memory increase while developing G1, we are adding this metadata guardrail test.
The test calculates the native memory based on existing GC usages and provides some headroom. When there is a significant increase, the test will fail and we should look back to see whether the added native memory makes sense.

-------------

Commit messages:
 - Remove trailing whitespaces
 - 8350860: Max GC memory overhead tests

Changes: https://git.openjdk.org/jdk/pull/24981/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24981&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8350860
Stats: 174 lines in 1 file changed: 174 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/24981.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24981/head:pull/24981

PR: https://git.openjdk.org/jdk/pull/24981

From manc at google.com Thu May 1 07:07:03 2025
From: manc at google.com (Man Cao)
Date: Thu, 1 May 2025 00:07:03 -0700
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To: <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Great progress! Thank you, Ivan. Optimistically, many of these changes could make it to JDK 25.

I'm happy to do some experiments and provide feedback. Does [1] contain all necessary changes? Will that branch be updated as parts of it merge into master?

Some early questions/feedback below.

> We also increase the default GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to suggestions). The existing default causes the heap to shrink too aggressively under the new policy in order to maintain the target GCTimeRatio. A higher default provides a better balance and avoids shrinking heap.

This changes the pause overhead target from ~8% (1/13) to 4% (1/25). Would it make G1 expand the heap more aggressively after incremental collections compared to existing behavior?
Could you share some early/rough performance numbers about 12 vs 24 with the prototype, such as actual heap sizes, throughput differences?

> Additionally, we are removing the heap resizing at the end of the Remark pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio. This resizing of the heap ignores current application behaviour and may lead to pathological cases of repeated concurrent mark cycles:

In the new prototype, does the pathological case happen with the default MinHeapFreeRatio=40 MaxHeapFreeRatio=70 value? Or mainly with smaller user-defined values for MinHeapFreeRatio/MaxHeapFreeRatio?

Re Thomas's comments:

> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.

Making GCTimeRatio manageable sounds like a good idea. Do we plan to do this eventually (why "just for testing purposes")?

> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.

If G1 strives to respect GCTimeRatio as the prototype proposes, our existing use cases probably no longer need to set SoftMaxHeapSize (and maintains a separate algorithm to calculate values for SoftMaxHeapSize).
Our use case still needs CurrentMaxHeapSize, but it could be followed up in https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-April/051996.html.

[1] https://github.com/openjdk/jdk/compare/master...walulyai:jdk:G1HeapResizePolicy

-Man

On Tue, Apr 29, 2025 at 4:34 AM Thomas Schatzl wrote:

> Hi Ivan,
>
> thanks for working on this!
>
> Some comments for people (Man, Monica, Kirk) potentially taking this for
> a spin:
>
> On 29.04.25 12:46, Ivan Walulya wrote:
> > As part of our preparations for AHS, we are prototyping changes to the
> > G1 heap resizing policy to improve the effectiveness of the GCTimeRatio
> > [1]. The GCTimeRatio is set to manage the balance between GC time and
> > Application execution time. G1's current implementation of GCTimeRatio
> > appears to have drifted from its intended purpose over time. It may no
> > longer accurately guide heap sizing in response to GC overhead.
> > Therefore, we need to change this mechanism with the goal that G1 better
> > manages heap sizes without the need for additional tuning knobs.
> >
> > The prototype allows both expansion and shrinking of the heap at the end
> > of any GC, as opposed to the current behavior where shrinking is only
> > allowed at Remark or Full GC pauses [2]. We also increase the default
> > GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to
> > suggestions). The existing default causes the heap to shrink too
> > aggressively under the new policy in order to maintain the target
> > GCTimeRatio. A higher default provides a better balance and avoids
> > shrinking heap.
>
> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.
>
> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.
>
> > Additionally, we are removing the heap resizing at the end of the Remark
> > pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio.
> > This resizing of the heap ignores current application behaviour and may lead
> > to pathological cases of repeated concurrent mark cycles:
> >
> > * we shrink the heap at remark,
> > * a smaller heap triggers a concurrent marking in the subsequent
> >   GCs as well as expanding the heap
> > * the concurrent cycle ends in another remark pause where the
> >   cycle restarts.
> >
> > We keep this MinHeapFreeRatio-MaxHeapFreeRatio based resizing logic at
> > the end of Full GC.
>
> The use case for this might be ones similar to CraC to temporarily
> compact the heap as much as possible; however it might be better to have
> explicit control for that (e.g. a jcmd).
>
> Ultimately there may be need to remove it as well for full gcs,
> replacing it with something else.
>
> > As a result of these changes, applications may settle at more
> > appropriate and in some cases smaller heap sizes for a given
> > GCTimeRatio. While this may show as regression in some benchmarks that
> > are sensitive to heap size, it is still improved control over GC behaviour.
> >
> > We are requesting feedback or testing of these changes before
> > proposing to merge them with mainline.
> >
> > Some of the changes that are independent of the GCTimeRatio are already
> > out for review [4, 5], other minor fixes will be split out and pushed
> > independently.
>
> [0] https://github.com/openjdk/jdk/pull/24211
>
> Hth,
> Thomas
>

From iwalulya at openjdk.org Thu May 1 08:27:32 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 1 May 2025 08:27:32 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request.
> Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>
> Testing: Tier 1-3

Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Albert review
 - Merge branch 'master' into 8355681-find_contiguous_allow_expand
 - init

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/24915/files
 - new: https://git.openjdk.org/jdk/pull/24915/files/5e8e4a73..5085e54b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=00-01

Stats: 13278 lines in 385 files changed: 9400 ins; 1696 del; 2182 mod
Patch: https://git.openjdk.org/jdk/pull/24915.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24915/head:pull/24915

PR: https://git.openjdk.org/jdk/pull/24915

From iwalulya at openjdk.org Thu May 1 08:27:33 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 1 May 2025 08:27:33 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com>
Message-ID:

On Wed, 30 Apr 2025 13:57:45 GMT, Albert Mingkun Yang wrote:

>> I added this instead of an assert on failing `expand_and_allocate` for humongous objects, but then figured we could just skip the `expand_and_allocate` attempt which is guaranteed to fail.
>
> Not sure what to write in a ticket. Those are just some questions I had while reading the code. Anyway, if this part is not super related to the actual functional change, can it be dealt with in its own PR?

I have removed that check.
Will be done with the clean up of `attempt_allocation_at_safepoint`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2069982724

From thomas.schatzl at oracle.com Thu May 1 09:55:00 2025
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 1 May 2025 11:55:00 +0200
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To:
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Hi Man,

On 01.05.25 09:07, Man Cao wrote:
> Great progress! Thank you, Ivan. Optimistically, many of these changes
> could make it to JDK 25.
>
> I'm happy to do some experiments and provide feedback. Does [1] contain
> all necessary changes?

Afaik yes.

> Will that branch be updated as parts of it merge
> into master?

Ivan can answer that.

>
> Some early questions/feedback below.
>
> > We also increase the default GCTimeRatio from 12 to 24 [3] (we are
> > choosing 24 but open to suggestions). The existing default causes the
> > heap to shrink too aggressively under the new policy in order to
> > maintain the target GCTimeRatio. A higher default provides a better
> > balance and avoids shrinking heap.
>
> This changes the pause overhead target from ~8% (1/13) to 4% (1/25).
> Would it make G1 expand the heap more aggressively after incremental
> collections compared to existing behavior? Could you share some early/
> rough performance numbers about 12 vs 24 with the prototype, such as
> actual heap sizes, throughput differences?

That value has been found to create roughly same heap sizes at around the same performance +/- 1-2% throughput across our set of benchmarks that run out-of-box (iirc). Again, Ivan may chime in here.

Part of this request for feedback is about getting a larger coverage on this aspect.
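
The percentages in the quoted question follow from the 1 / (1 + GCTimeRatio) relation that Man's mail spells out as 1/13 and 1/25. A minimal sketch of that arithmetic, assuming nothing beyond that relation; the function names are made up for illustration and are not HotSpot code:

```cpp
#include <cassert>
#include <cmath>

// Fraction of total time G1 may spend in GC pauses for a given GCTimeRatio,
// using the 1 / (1 + GCTimeRatio) relation quoted in this thread.
inline double gc_overhead_target(double gc_time_ratio) {
  assert(gc_time_ratio >= 0.0);
  return 1.0 / (1.0 + gc_time_ratio);
}

// The same target as a percentage, rounded to two decimals for display.
inline double gc_overhead_percent(double gc_time_ratio) {
  return std::round(gc_overhead_target(gc_time_ratio) * 10000.0) / 100.0;
}

// How far the target moves when an integer-valued ratio is raised by one,
// i.e. the finest adjustment an integer flag permits at that setting.
inline double integer_step(int gc_time_ratio) {
  return gc_overhead_target(gc_time_ratio) - gc_overhead_target(gc_time_ratio + 1);
}
```

Under this relation a ratio of 12 corresponds to about 7.69% and 24 to 4%. Going from 12 to 13 moves the target by roughly 0.55 percentage points and going from 24 to 25 by about 0.15, which gives a feel for how large a single integer step is at these settings.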
(The increase in GCTimeRatio has actually been something that is long overdue regardless of this change, since G1's overhead decreased a lot in recent years.)

[...]

> Re Thomas's comments:
>
> > So if one were to make GCTimeRatio manageable (just for testing
> > purposes), and made it a float (for better control), changes to it
> > should reflect on the used heap size in the next few GCs automatically.
>
> Making GCTimeRatio manageable sounds like a good idea. Do we plan to do
> this eventually (why "just for testing purposes")?
>

It's just not implemented in that branch :) Currently we think that GCTimeRatio will eventually get manageable and likely using floats as the integers are too coarse as divisors as the . Probably as a follow-up.

There is still the question whether to deprecate it in favor of some GCCPUUsagePercent or whatever it is going to be called to have more direct control.

> > A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> > that only guides IHOP with changes in
> > G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> > now, but there may be issues with this GCTimeRatio based heap sizing
> > that would be interesting to explore.
>
> If G1 strives to respect GCTimeRatio as the prototype proposes, our
> existing use cases probably no longer need to set SoftMaxHeapSize (and
> maintains a separate algorithm to calculate values for SoftMaxHeapSize).

The purpose of this request is also to understand whether SoftMaxHeapSize is still necessary :)

Sizing based on cpu usage may be more inconvenient and less exact at reducing to a particular target heap size (without OOME'ing) than a direct target heap size. (i.e. I can imagine the case where while the threshold is kind of a continuum, for a collector small changes to heap sizes can lead to radical changes in CPU usage, so G1 might flap back and forth all the time).
We also do not have real use cases with real applications where we would temporarily want to keep the heap below a certain value like we think you suggested. Ignoring CraC like use cases where specific functionality would serve that use case better, one current use of SoftMaxHeapSize here is for tuning ZGC performance (since SoftMaxHeapSize is only implemented there), but we have not seen uses for G1 (obviously as it's not implemented there, and one can use G1ReservePercent to some degree).

Note that G1 already has this G1ReservePercent that somewhat already acts like that, so there is a certain overlap that might need some resolving (and its default is too high for large heaps anyway). Or just changed to be adaptive.

SoftMaxHeapSize may also be necessary in some cases where there is more information available from the outside than AHS can ever know. So it can be worthwhile experimenting with SMHS anyway.

I will update that umbrella CR with new thoughts in the next few days.

Thanks,
Thomas

From ayang at openjdk.org Thu May 1 09:48:47 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 1 May 2025 09:48:47 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 08:27:32 GMT, Ivan Walulya wrote:

>> Hi,
>>
>> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>>
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains three additional commits since the last revision:
>
> - Albert review
> - Merge branch 'master' into 8355681-find_contiguous_allow_expand
> - init

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2809538391

From tschatzl at openjdk.org Thu May 1 09:58:46 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 1 May 2025 09:58:46 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 08:27:32 GMT, Ivan Walulya wrote:

>> Hi,
>>
>> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>>
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Albert review
> - Merge branch 'master' into 8355681-find_contiguous_allow_expand
> - init

Marked as reviewed by tschatzl (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2809548851

From wkemper at openjdk.org Thu May 1 17:43:59 2025
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 1 May 2025 17:43:59 GMT
Subject: Integrated: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled
In-Reply-To:
References:
Message-ID: <3bV-rGkGRHjkUNAEElE0_aSdO8t81oMrd88bjWmZY6Y=.0df6a3b1-29db-4592-aa42-7d3d15684455@github.com>

On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote:

> Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure.

This pull request has now been integrated.

Changeset: 9e26b9fa
Author: William Kemper
URL: https://git.openjdk.org/jdk/commit/9e26b9facba09c4d6f516e8032b876c6d9e95e9e
Stats: 24 lines in 1 file changed: 15 ins; 8 del; 1 mod

8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled

Reviewed-by: ysr, kdnilsen

-------------

PR: https://git.openjdk.org/jdk/pull/24888

From mbaesken at openjdk.org Fri May 2 06:39:52 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 2 May 2025 06:39:52 GMT
Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled
In-Reply-To:
References: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com>
Message-ID:

On Wed, 30 Apr 2025 15:36:32 GMT, William Kemper wrote:

> have you had a chance to retest after PR#24940 was integrated?

Did not see the issue again after this (of course this is no 'proof' that they will never come back), so I would say looks good !
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2846479688

From iwalulya at openjdk.org Fri May 2 12:56:51 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Fri, 2 May 2025 12:56:51 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 09:46:17 GMT, Albert Mingkun Yang wrote:

>> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Albert review
>> - Merge branch 'master' into 8355681-find_contiguous_allow_expand
>> - init
>
> Marked as reviewed by ayang (Reviewer).

Thanks @albertnetymk and @tschatzl for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24915#issuecomment-2847135344

From iwalulya at openjdk.org Fri May 2 12:56:52 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Fri, 2 May 2025 12:56:52 GMT
Subject: Integrated: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 10:57:48 GMT, Ivan Walulya wrote:

> Hi,
>
> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>
> Testing: Tier 1-3

This pull request has now been integrated.
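
The quoted description only says that the availability check ignored free regions. A rough sketch of what such a count-based pre-check could look like before and after accounting for them; `RegionCounts` and both function names are hypothetical and do not reproduce the HotSpot change, and a real contiguous allocation additionally needs the regions to be adjacent:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of region counts, not HotSpot's data structure.
struct RegionCounts {
  std::size_t free_committed;  // committed regions that are currently free
  std::size_t uncommitted;     // regions that a heap expansion could add
};

// A pre-check that only looks at what expansion can add. It rejects
// requests that free regions plus expansion could cover together.
inline bool enough_regions_ignoring_free(const RegionCounts& c, std::size_t needed) {
  return needed <= c.uncommitted;
}

// A pre-check that also counts the regions that are already free.
inline bool enough_regions_counting_free(const RegionCounts& c, std::size_t needed) {
  return needed <= c.free_committed + c.uncommitted;
}
```

With three free regions and two uncommitted ones, a request for four regions fails the first check and passes the second, which is the kind of mismatch between two allocation paths that the description mentions.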
Changeset: 995d5416
Author: Ivan Walulya
URL: https://git.openjdk.org/jdk/commit/995d54161fed657f38753813f55d0591e77a42e3
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation

Reviewed-by: tschatzl, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/24915

From ayang at openjdk.org Fri May 2 18:41:53 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 2 May 2025 18:41:53 GMT
Subject: RFR: 8350621: Code cache stops scheduling GC
In-Reply-To:
References:
Message-ID:

On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote:

> The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`.
>
> This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling.
>
> Unfortunately this can't work properly under certain circumstances.
> For example, if using G1GC, calling `G1CollectedHeap::collect` does not give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).
>
> I have observed this behavior on JVMs in version 21 that were migrated recently from java 17.
> Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GCs take a long time to happen.
>
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
>
> In order to reproduce this issue, I found a very simple and convenient way:
>
>
> public class CodeCacheMain {
>     public static void main(String[] args) throws InterruptedException {
>         while (true) {
>             Thread.sleep(100);
>         }
>     }
> }
>
>
> Run this simple app with the following JVM flags:
>
>
> -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
>
>
> - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC
> - low `ReservedCodeCacheSize` to put pressure on code cache quickly
> - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction
>
> By itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will:
> - allow us to monitor code cache
> - indirectly generate activity on the code cache, just what we need to reproduce the bug
>
> Some logs related to code cache will show up at some point with GC activity:
>
>
> [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory
>
>
> And then it will stop and we'll end up with the following message:
>
>
> [672.714s][info][codecache ] Code cache is full - disabling compilation
>
>
> L...

I have a question regarding the existing code/logic.

    // In case the GC is concurrent, we make sure only one thread requests the GC.
    if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
      log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0);
      Universe::heap()->collect(GCCause::_codecache_GC_aggressive);
    }

Why make sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently.

Would removing `_unloading_threshold_gc_requested` resolve this problem?

> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
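
The failure mode described in the quoted PR text, where the request flag is set but no collection runs to clear it, can be modeled in a few lines. This is a deliberately simplified, single-threaded sketch: `ToyHeap`, `ToyCodeCache` and their members are invented for illustration, and the assumption that a dropped request never produces a completion callback is a simplification of the real interaction between `CodeCache` and the collectors:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical collector whose collect() may return without running a
// cycle, for example because one is already in progress.
struct ToyHeap {
  bool cycle_in_progress = false;
  int cycles_run = 0;

  // Returns true only if a cycle actually ran to completion.
  bool collect() {
    if (cycle_in_progress) {
      return false;  // request dropped, no completion callback will follow
    }
    ++cycles_run;
    return true;
  }
};

// Hypothetical stand-in for the code cache side of the protocol.
struct ToyCodeCache {
  std::atomic<bool> gc_requested{false};
  int requests_issued = 0;

  // Called when a requested cycle finishes; re-enables scheduling.
  void on_gc_finished() { gc_requested.store(false); }

  // Same shape as the quoted snippet: only the caller that flips the
  // flag from false to true goes on to request a collection.
  void maybe_request_gc(ToyHeap& heap) {
    bool expected = false;
    if (gc_requested.compare_exchange_strong(expected, true)) {
      ++requests_issued;
      if (heap.collect()) {
        on_gc_finished();
      }
    }
  }
};

// Number of collections that run after an initial request was dropped.
inline int cycles_after_dropped_request(int later_attempts) {
  ToyHeap heap;
  ToyCodeCache cache;
  heap.cycle_in_progress = true;
  cache.maybe_request_gc(heap);    // flag is set, collect() does nothing
  heap.cycle_in_progress = false;
  for (int i = 0; i < later_attempts; ++i) {
    cache.maybe_request_gc(heap);  // the compare-exchange fails every time
  }
  return heap.cycles_run;
}
```

In this model, once a single request is dropped the flag stays set, so no later attempt reaches `collect()` however many times it is retried. That matches the symptom in the logs above, where the "Triggering aggressive GC" messages stop and the code cache fills up.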
For ParallelGC, `ParallelScavengeHeap::collect` contains the following to ensure `System.gc` gccause and similar ones guarantee a full-gc.

    if (!GCCause::is_explicit_full_gc(cause)) {
      return;
    }

However, the current logic that a young-gc can cancel a full-gc (`_codecache_GC_aggressive` in this case) also seems surprising.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2847860414

From aboldtch at openjdk.org Mon May 5 07:52:46 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 5 May 2025 07:52:46 GMT
Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2]
In-Reply-To:
References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com>
Message-ID: <_Mn9Z5l3XKaL0wmF0p2Zj4xzonQU1RDJt-AKhufIoaM=.2bdbcb5b-bee9-4b3e-822d-9ff177e4ac54@github.com>

On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote:

>> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen.
>
> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'master' into tkr-zgc-deoptimize-allocation
> - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814052474

From shade at openjdk.org Mon May 5 09:49:50 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 5 May 2025 09:49:50 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote:

>> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
>>
>> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>>
>> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
>>
>> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Move to oops

Looking for more Reviewers, thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2850471947

From iwalulya at openjdk.org Mon May 5 09:50:33 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Mon, 5 May 2025 09:50:33 GMT
Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc.
>
> Testing: Tier 1-3

Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:

 - use align_up_to_region_byte_size
 - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount
 - Thomas Review
 - nit
 - refactor full collection

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/24944/files
 - new: https://git.openjdk.org/jdk/pull/24944/files/6ef77f71..4cde5315

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=01-02

Stats: 19999 lines in 573 files changed: 15129 ins; 2714 del; 2156 mod
Patch: https://git.openjdk.org/jdk/pull/24944.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24944/head:pull/24944

PR: https://git.openjdk.org/jdk/pull/24944

From jsikstro at openjdk.org Mon May 5 09:50:55 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Mon, 5 May 2025 09:50:55 GMT
Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests
Message-ID:

Hello,

There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp.

To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change.

I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug.
------------- Commit messages: - Also rename ZTestEntryCompare - 8356083: ZGC: Duplicate ZTestEntry symbols in gtests Changes: https://git.openjdk.org/jdk/pull/25029/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356083 Stats: 52 lines in 1 file changed: 0 ins; 0 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/25029.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25029/head:pull/25029 PR: https://git.openjdk.org/jdk/pull/25029 From ayang at openjdk.org Mon May 5 10:39:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 5 May 2025 10:39:54 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap Message-ID: Simple removing unnecessary retrying logic because an gc-operation will run-to-completion, guaranteeing the increment of corresponding counters. Test: tier1-3 ------------- Commit messages: - remove-systemgc-loop Changes: https://git.openjdk.org/jdk/pull/25032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356157 Stats: 33 lines in 2 files changed: 0 ins; 26 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25032/head:pull/25032 PR: https://git.openjdk.org/jdk/pull/25032 From aboldtch at openjdk.org Mon May 5 11:21:46 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 11:21:46 GMT Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikstr?m wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTre.cpp is used over the one in test_zList.cpp. 
> > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25029#pullrequestreview-2814492407 From tschatzl at openjdk.org Mon May 5 11:34:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 May 2025 11:34:52 GMT Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25029#pullrequestreview-2814518370 From tschatzl at openjdk.org Mon May 5 11:34:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 May 2025 11:34:55 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc.
>> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Maybe update that suggested comment (sorry, missed pointing that out earlier), but good. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 484: > 482: // compacting collection, leaving no dead wood. > 483: // - if allocation_word_size is set, then this allocation size will > 484: // be accounted for in case shrinking of the heap happens. Suggestion: // - allocation_word_size is the size of the allocation that caused this collection. // To be considered when resizing the heap at the end of the full collection. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2814512672 PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2073273737 From aboldtch at openjdk.org Mon May 5 12:18:45 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 12:18:45 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:29:34 GMT, Quan Anh Mai wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even been released yet, so I think it is more efficient to make a better fix than to make a quicker one. I think we could go with @merykitty's solution, with two different relocations based on whether we support APX or not, and only emit the "after" relocation and the NOP when `VM_Version::supports_apx_f()` is true. On the other hand, maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code.
Something along the lines of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }

As for the solution of having the relocation point at the entry: while relocations were not designed to be used this way, it looks like it works (at least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to a relocation). But changing invariants is scary, and it is probably better to evaluate this as part of the [JDK-8355341](https://bugs.openjdk.org/browse/JDK-8355341) RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2850807205 From coleenp at openjdk.org Mon May 5 12:26:26 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 5 May 2025 12:26:26 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Message-ID: Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. Tested with tier5-7 with vmTestbase tests that use this package.
------------- Commit messages: - 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Changes: https://git.openjdk.org/jdk/pull/25034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330022 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25034/head:pull/25034 PR: https://git.openjdk.org/jdk/pull/25034 From eosterlund at openjdk.org Mon May 5 13:14:53 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 5 May 2025 13:14:53 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Good stuff. ------------- Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814769629 From iwalulya at openjdk.org Mon May 5 14:02:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 5 May 2025 14:02:54 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms Message-ID: Hi, Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). Testing: gha, manual testing as below:

Mainline:
[3.740s][info ][gc,init ] Heap Min Capacity: 150G
[3.740s][info ][gc,init ] Heap Initial Capacity: 150G
[3.740s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B
.
.
[9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms

With patch (No shrinking when -Xms == -Xmx):
[3.753s][info ][gc,init ] Heap Min Capacity: 150G
[3.753s][info ][gc,init ] Heap Initial Capacity: 150G
[3.753s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms

With patch (Shrinking when -Xms != -Xmx):
[3.755s][info ][gc,init ] Heap Min Capacity: 153568M
[3.755s][info ][gc,init ] Heap Initial Capacity: 153568M
[3.755s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions)
.
.
[8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms
------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/25036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308854 Stats: 16 lines in 1 file changed: 11 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25036/head:pull/25036 PR: https://git.openjdk.org/jdk/pull/25036 From duke at openjdk.org Mon May 5 16:03:59 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 16:03:59 GMT Subject: Withdrawn: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:59:44 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: > > ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) > > The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/22678 From kvn at openjdk.org Mon May 5 16:14:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 16:14:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <1TLtkRe2ydHcPB5lnREFbmF4hlQ4rOBHyNXbplFujM0=.427f9764-dda9-41e4-a228-95f47426cf25@github.com> On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure that the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Looks fine to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2815333573 From shade at openjdk.org Mon May 5 16:55:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:55:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure that the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
>> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Thank you! I'll wait a bit if @kimbarrett is able to confirm this matches the idea he had back in JBS comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2851636080 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24957#issuecomment-2851729694 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: Integrated: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation In-Reply-To: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:46:51 GMT, Tom Rodriguez wrote: > JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. This pull request has now been integrated. Changeset: cc34135f Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/cc34135fff7650ad44c910dca0fd47e9cbd56b68 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Reviewed-by: aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24957 From tschatzl at openjdk.org Tue May 6 08:17:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:17:20 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap In-Reply-To: References: Message-ID: <9hvwdTVHqbOVeikfixaFuzpbVpRW4Lxc0rdVbTBb7yE=.4bc00f1b-933d-4920-946e-93e3d06411a4@github.com> On Mon, 5 May 2025 10:36:11 GMT, Albert Mingkun Yang wrote: > Simply removing unnecessary retry logic, because a GC operation will run to completion, guaranteeing that the corresponding counters are incremented. > > Test: tier1-3 lgtm ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25032#pullrequestreview-2817313107 From tschatzl at openjdk.org Tue May 6 08:19:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:19:17 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote:
> Hi,
>
> Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms.
>
> This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035).
>
> Testing: gha, manual testing as below:
>
> Mainline:
>
> [3.740s][info ][gc,init ] Heap Min Capacity: 150G
> [3.740s][info ][gc,init ] Heap Initial Capacity: 150G
> [3.740s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B
> .
> .
> [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms
>
> With patch (No shrinking when -Xms == -Xmx):
>
> [3.753s][info ][gc,init ] Heap Min Capacity: 150G
> [3.753s][info ][gc,init ] Heap Initial Capacity: 150G
> [3.753s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms
>
> With patch (Shrinking when -Xms != -Xmx):
>
> [3.755s][info ][gc,init ] Heap Min Capacity: 153568M
> [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M
> [3.755s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions)
> .
> .
> [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms

Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25036#pullrequestreview-2817316832 From tschatzl at openjdk.org Tue May 6 08:21:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:21:17 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2817322299 From sjohanss at openjdk.org Tue May 6 08:58:31 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 6 May 2025 08:58:31 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v2] In-Reply-To: References: Message-ID: <-65uM_iyhoKOhBmmcPjSJHkeab7WGzclKGNkfWGcl_c=.b17052b2-4212-44ad-b302-1eb52293bc49@github.com> > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`).
Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any fixed generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > bool update_allocation_history = used > 0.5 * capacity; > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A...
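To make the proposed capacity estimate from the quoted description more concrete, here is a minimal sketch of averaging the last 10 TLAB usage samples and feeding the result into the sampling condition quoted above. The class and function names are hypothetical and chosen for illustration only; this is not the code in the pull request.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: keeps the last 10 "TLAB used" samples in a ring
// buffer and reports their average as the TLAB capacity estimate.
class TlabCapacityEstimate {
  static constexpr size_t kSamples = 10;
  std::array<size_t, kSamples> _samples{};
  size_t _count = 0;  // number of valid samples, at most kSamples
  size_t _next = 0;   // slot that the next sample overwrites

public:
  void add(size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % kSamples;
    if (_count < kSamples) {
      _count++;
    }
  }

  size_t average() const {
    assert(_count <= kSamples);
    if (_count == 0) {
      return 0;  // no history yet
    }
    size_t sum = 0;
    for (size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// Same condition as quoted in the description: usage is only sampled
// when more than half of the reported capacity was used.
inline bool should_update_allocation_history(size_t used, size_t capacity) {
  return static_cast<double>(used) > 0.5 * static_cast<double>(capacity);
}
```

With the heap capacity as `capacity`, `used` rarely exceeds half of it, so the condition almost never holds; with an average of recent usage as `capacity`, it holds whenever usage is in the normal range.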
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Default TLAB size of 8k, avoid 0 updates and reasonable starting values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/0c1f6eed..76c79f5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=00-01 Stats: 17 lines in 4 files changed: 13 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From jsikstro at openjdk.org Tue May 6 09:53:21 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 6 May 2025 09:53:21 GMT Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. Thank you for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25029#issuecomment-2853936758 From jsikstro at openjdk.org Tue May 6 09:53:21 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 6 May 2025 09:53:21 GMT Subject: Integrated: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. This pull request has now been integrated. Changeset: ecfaf354 Author: Joel Sikström URL: https://git.openjdk.org/jdk/commit/ecfaf354d761bc7034ea8783f4428157ea450207 Stats: 52 lines in 1 file changed: 0 ins; 0 del; 52 mod 8356083: ZGC: Duplicate ZTestEntry symbols in gtests Reviewed-by: aboldtch, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/25029 From jbhateja at openjdk.org Tue May 6 09:55:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 09:55:21 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Hi @xmas92, Your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341, since it's a bigger change and would also need Graal buy-in. There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. the OR instruction, for which we bookkeep the relocation address from the end of the instruction and which is the last instruction in the pointer coloring primitive. For this bug fix, your suggestion looks fine to me.
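For readers following the thread, the idea of adjusting the patch address when a REX2 prefix is present can be sketched as below. This is a standalone illustration under stated assumptions, not HotSpot code: it assumes that a REX2-prefixed instruction starts with the byte 0xD5 followed by one payload byte (as described in the Intel APX documentation), so it is one byte longer than the same instruction with a one-byte REX prefix. The function names and the offset parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the first byte of a REX2 prefix is 0xD5.
constexpr uint8_t kRex2PrefixByte = 0xD5;

// Returns true if the instruction at insn starts with a REX2 prefix.
inline bool starts_with_rex2(const uint8_t* insn) {
  assert(insn != nullptr);
  return insn[0] == kRex2PrefixByte;
}

// rex_offset is the offset of the byte to patch, measured from the
// instruction start, for the one-byte REX encoding. A REX2 prefix is
// one byte longer, so the patch site moves by one byte.
inline uint8_t* shl_patch_addr(uint8_t* insn, int rex_offset) {
  return insn + rex_offset + (starts_with_rex2(insn) ? 1 : 0);
}
```

For example, with `shl rax, imm8` encoded as `48 C1 E0 ib`, the immediate sits at offset 3; with a REX2-prefixed form such as `D5 <payload> C1 E0 ib`, it sits at offset 4, and the helper above returns that address.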
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2853945841 From jbhateja at openjdk.org Tue May 6 10:21:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:21:54 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24919/files - new: https://git.openjdk.org/jdk/pull/24919/files/1f9c84c8..fc3b61e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00-01 Stats: 25 lines in 4 files changed: 11 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From rcastanedalo at openjdk.org Tue May 6 16:42:31 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 6 May 2025 16:42:31 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). 
This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). ------------- Commit messages: - Format - Remove extra line - Further clarify zLoadP candidate predicate and no-preceding-lea assertion - Rename machine node property to ins_is_late_expanded_null_check_candidate for clarity, and make it a total function - Update copyright year - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - ... 
and 15 more: https://git.openjdk.org/jdk/compare/e2ae50d8...dc5aa4fc Changes: https://git.openjdk.org/jdk/pull/25066/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 385 lines in 15 files changed: 338 ins; 37 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From matsaave at openjdk.org Tue May 6 18:05:17 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 6 May 2025 18:05:17 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. LGTM! ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2819191469 From kvn at openjdk.org Tue May 6 18:10:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:10:20 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. 
> > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Why is the attribute not set for `zLoadP` on x64? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2819201282 From rcastanedalo at openjdk.org Tue May 6 19:00:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 19:00:18 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 18:07:17 GMT, Vladimir Kozlov wrote: > Why is the attribute not set for `zLoadP` on x64? `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2855603683 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. Thanks for reviewing, Thomas and Matias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25034#issuecomment-2855609574 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: Integrated: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. This pull request has now been integrated. 
Changeset: 4977588d Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/4977588d5e3424282f40209590737a487747095d Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Co-authored-by: David Leopoldseder Reviewed-by: tschatzl, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/25034 From aboldtch at openjdk.org Wed May 7 06:15:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 7 May 2025 06:15:14 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. 
To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding As I cannot test this on APX-enabled hardware, I will leave the testing and verifying that this approach works up to you. But the change looks good, and it maintains the original behaviour for non-APX-enabled hardware. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2820461864 From jbhateja at openjdk.org Wed May 7 06:19:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 06:19:17 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Hi @TobiHartmann , @eme64 , can you kindly run this version through your test infra. This is an APX-specific issue. I have verified its correctness using SDE, both following tests are now passing. 
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/c2/irTests/gc ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857197887 From thartmann at openjdk.org Wed May 7 07:48:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 07:48:16 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1gGtDEUALoWyrLQwwRD9bo2wb55O5Lh2DTnWTXQ8Oe8=.45ef5737-2ea6-4179-a998-79d8d51aca13@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Sure, I'll run it through testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857462391 From sjohanss at openjdk.org Wed May 7 10:41:57 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 7 May 2025 10:41:57 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v3] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but ZGC does not have any generation sizes, so there is no given size for Eden.
Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A...
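As a rough illustration of the capacity estimate described in the quoted summary, here is a minimal sketch, assuming a fixed window of 10 samples and a plain arithmetic mean. `TlabCapacityEstimate` and `should_update_allocation_history` are hypothetical names chosen for this example and are not taken from the ZGC sources; the actual patch may track and average the samples differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ZGC's actual class): remembers the last N
// samples of "bytes used for TLABs" and reports their mean as capacity.
class TlabCapacityEstimate {
  static constexpr std::size_t N = 10;
  std::array<std::size_t, N> _samples{};
  std::size_t _count = 0;  // number of valid samples, at most N
  std::size_t _next = 0;   // slot that the next sample overwrites

public:
  // Record one usage sample, replacing the oldest once the window is full.
  void record(std::size_t used_bytes) {
    assert(_next < N);
    _samples[_next] = used_bytes;
    _next = (_next + 1) % N;
    if (_count < N) {
      ++_count;
    }
  }

  // Mean of the recorded samples (integer division), or 0 if none yet.
  std::size_t capacity() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; ++i) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// Same shape as the condition quoted in the description.
inline bool should_update_allocation_history(std::size_t used,
                                             std::size_t capacity) {
  return used > 0.5 * capacity;
}
```

If the capacity passed in is the whole heap, `used` rarely exceeds half of it and the history is almost never updated. With an average of recent usage as the capacity, a typical sample is much more likely to pass the check.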
Stefan Johansson has updated the pull request incrementally with three additional commits since the last revision: - Problemlist heap sampling test - Keep all TLAB tracking in TLABUsage - Revert initial value for TLABUsage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/76c79f5c..f361fc5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01-02 Stats: 97 lines in 9 files changed: 33 ins; 37 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From coleenp at openjdk.org Wed May 7 20:33:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 May 2025 20:33:56 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops This is a cleaner way to do this. I believe it's what we discussed with Kim. He can confirm. Some questions and comments and a small nit. src/hotspot/share/compiler/compileBroker.cpp line 1697: > 1695: JavaThread* thread = JavaThread::current(); > 1696: > 1697: methodHandle method(thread, task->method()); I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > 33: #include "oops/weakHandle.inline.hpp" > 34: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { This should initialize method in the ctor initializer list. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > 49: // Method holder class cannot be unloaded. > 50: return nullptr; > 51: } This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. 
src/hotspot/share/runtime/vmStructs.cpp line 1266: > 1264: declare_toplevel_type(CDSFileMapRegion) \ > 1265: declare_toplevel_type(UpcallStub::FrameData) \ > 1266: declare_toplevel_type(UnloadableMethodHandle) \ So are these left for the async profiler? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2823027214 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078430169 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078443576 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078379288 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078446115 From aboldtch at openjdk.org Thu May 8 05:25:01 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 05:25:01 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Message-ID: [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements and IntrusiveRBTree does not, is trivial to implement on top of the tree. Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
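To make the remark about O(1) leftmost and rightmost lookup concrete, here is one way such a cache could be layered on top of an ordered tree. This is only a sketch under stated assumptions: `std::set<int>` stands in for the shared red-black tree (it already offers constant-time `begin()`, so it is used purely to keep the example self-contained), and `MinMaxCachedTree` is a hypothetical name that does not appear in the PR.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical wrapper; std::set is only a stand-in for a tree type that
// does not itself provide O(1) access to its extreme nodes.
class MinMaxCachedTree {
  std::set<int> _tree;
  const int* _leftmost = nullptr;   // cached smallest element, or nullptr
  const int* _rightmost = nullptr;  // cached largest element, or nullptr

public:
  // Insert a value and refresh the cached extremes if it becomes one.
  bool insert(int value) {
    auto result = _tree.insert(value);
    if (!result.second) {
      return false;  // duplicate, nothing changed
    }
    const int* node = &*result.first;
    if (_leftmost == nullptr || value < *_leftmost) {
      _leftmost = node;
    }
    if (_rightmost == nullptr || value > *_rightmost) {
      _rightmost = node;
    }
    return true;
  }

  // Remove a value; if it was an extreme, step to its neighbour first.
  bool erase(int value) {
    auto it = _tree.find(value);
    if (it == _tree.end()) {
      return false;
    }
    assert(_leftmost != nullptr && _rightmost != nullptr);
    const int* node = &*it;
    if (node == _leftmost) {
      auto next = std::next(it);
      _leftmost = (next == _tree.end()) ? nullptr : &*next;
    }
    if (node == _rightmost) {
      _rightmost = (it == _tree.begin()) ? nullptr : &*std::prev(it);
    }
    _tree.erase(it);
    return true;
  }

  // Constant-time lookups served from the cache.
  const int* leftmost() const { return _leftmost; }
  const int* rightmost() const { return _rightmost; }
};
```

The cached pointers stay valid because node addresses in `std::set` do not change on insertion or on erasure of other elements. An intrusive tree gives the same guarantee, since the nodes live inside the elements themselves.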
------------- Commit messages: - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Changes: https://git.openjdk.org/jdk/pull/25112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356455 Stats: 2158 lines in 5 files changed: 97 ins; 2026 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From eosterlund at openjdk.org Thu May 8 07:55:50 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 8 May 2025 07:55:50 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements and IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824168024 From jsikstro at openjdk.org Thu May 8 09:14:52 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 8 May 2025 09:14:52 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements and IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824384128 From sjohanss at openjdk.org Thu May 8 10:06:41 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 8 May 2025 10:06:41 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
> > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but ZGC does not have any generation sizes, so there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead.
> > This change also fixes it so that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Handle inc and dec in alloc/undo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/f361fc5d..ba7cb673 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02-03 Stats: 60 lines in 6 files changed: 37 ins; 14 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From epeter at openjdk.org Thu May 8 11:29:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). @roberto Thanks a lot for taking the time to explain how implicit null checks work, and giving me some background for the PR :) Below, I have mostly code style / naming suggestions, that you are welcome to use as inspiration. But you do not have to apply any of them, it is totally up to you :) I'm definitely not an expert here, but your approach seems reasonable to me. The opt-in annotation `ins_is_late_expanded_null_check_candidate` makes sure we only do the optimization when we are sure it is ok. 
It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. But I suppose that cannot happen, because the GC only moves the pointer, so if the old pointer was non-null, the new pointer must be non-null as well. Maybe that was so trivial that you did not even understand my question there ? But it could be helpful to write that down somewhere, just to make sure people are aware of this. I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130: > 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)), > 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); For context: 132 /* Sometimes we get misaligned loads and stores, usually from Unsafe 133 accesses, and these can exceed the offset range. */ 134 Address legitimize_address(const Address &a, int size, Register scratch) { 135 if (a.getMode() == Address::base_plus_offset) { 136 if (! 
Address::offset_ok_for_immed(a.offset(), exact_log2(size))) { 137 block_comment("legitimize_address {"); 138 lea(scratch, a); 139 block_comment("} legitimize_address"); 140 return Address(scratch); 141 } 142 } 143 return a; 144 } I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you. src/hotspot/share/opto/block.hpp line 468: > 466: > 467: // If necessary, hoist orphan node n into the end of block b. > 468: void maybe_hoist_into(Node* n, Block* b); Hmm. It is "if necessary" or "if possible"? I wonder if we could come up with a name that is a little longer and expresses this condition? src/hotspot/share/opto/lcm.cpp line 79: > 77: } > 78: > 79: void PhaseCFG::move_into(Node* n, Block* b) { Suggestion: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { src/hotspot/share/opto/lcm.cpp line 89: > 87: if (!out->is_MachProj()) { > 88: continue; > 89: } What about the `MachTemp`? Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name. src/hotspot/share/opto/lcm.cpp line 105: > 103: "need for recursive hoisting not expected"); > 104: move_into(n, b); > 105: } Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Suggestion to make it a bit more clear: Suggestion: // We want to ensure that n happens at b or before, i.e. at a block that dominates b. void PhaseCFG::ensure_node_is_at_block_or_before(Node* n, Block* b) { Block* current = get_block_for_node(n); if (current->dominates(b)) { return; // n already happens before b, do nothing. } // We only expect nodes without further inputs, like MachTemp or load Base. 
assert(n->req() == 0 || (n->req() == 1 && n->in(0) == (Node*)C->root()), "need for recursive hoisting not expected"); assert(b->dominates(current), "precondition: can only move n to b if b dominates n"); move_node_and_its_projections_to_block(n, b); } I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... Do you have a better idea? src/hotspot/share/opto/lcm.cpp line 356: > 354: if (mach->in(j)->is_MachTemp()) { > 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); > 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. Suggestion: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. // MachTemp have no inputs themselves and are only there to reserve a scratch // register for the GC barrier of the memory operation. That was what you told me in our offline meeting, I thought it was helpful context information. src/hotspot/share/opto/lcm.cpp line 428: > 426: maybe_hoist_into(val->in(i), block); > 427: } > 428: move_into(val, block); Suggestion: // Inputs of val may already be early enough, but if not move them together with val. ensure_node_is_at_block_or_before(val->in(i), block); } move_node_and_its_projections_to_block(val, block); src/hotspot/share/opto/lcm.cpp line 437: > 435: if (n == nullptr || !n->is_MachTemp()) { > 436: continue; > 437: } Do you want to check that all other nodes already dominate `block`? src/hotspot/share/opto/lcm.cpp line 439: > 437: } > 438: maybe_hoist_into(n, block); > 439: } It seems to me this is definitely new code, ensuring that we move the `MachTemp`. 
We did not do that before, at least not here. Correct? src/hotspot/share/opto/lcm.cpp line 441: > 439: map_node_to_block(n, block); > 440: } > 441: } This now happens in `move_into`, right? src/hotspot/share/opto/machnode.hpp line 391: > 389: > 390: // Whether this node is expanded during code emission into a sequence of > 391: // instructions and the first instruction can perform an implicit null check. You may want to put a warning / reasoning here, in case there are multiple loads. You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think? test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > 49: * @requires vm.gc.Z > 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z > 51: */ Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. Of course, you would have to probably add `applyIf` to the `@IR` rules. test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119: > 117: testLoad(o); > 118: } catch (NullPointerException e) { nullPointerException = true; } > 119: Asserts.assertTrue(nullPointerException); Suggestion: try { testLoad(o); throw new RuntimeException("Should have thrown NullPointerException"); } catch (NullPointerException e) { /* expected */} Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of? 
But totally optional, as your approach works anyway :) test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > 138: // G1 and ZGC stores cannot be currently used to implement implicit null > 139: // checks, because they expand into multiple memory access instructions that > 140: // are not necessarily located at the initial instruction start address. Very random idea, no idea if it is any good: Why not do the implicit null-check with a fake Load? No idea on the implications here. I suppose it would be extra code, but at least not branching code? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2824535603 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079357655 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079437197 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079476518 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079430920 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079473986 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079420601 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079480978 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079486097 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079509053 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079488019 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079491319 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079493683 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079500275 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079505342 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC 
reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:21:14 GMT, Emanuel Peter wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. 
>> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > src/hotspot/share/opto/block.hpp line 468: > >> 466: >> 467: // If necessary, hoist orphan node n into the end of block b. >> 468: void maybe_hoist_into(Node* n, Block* b); > > Hmm. It is "if necessary" or "if possible"? > I wonder if we could come up with a name that is a little longer and expresses this condition? Ah no, I'm starting to understand that it is rather a `if necessary`... > src/hotspot/share/opto/lcm.cpp line 428: > >> 426: maybe_hoist_into(val->in(i), block); >> 427: } >> 428: move_into(val, block); > > Suggestion: > > // Inputs of val may already be early enough, but if not move them together with val. > ensure_node_is_at_block_or_before(val->in(i), block); > } > move_node_and_its_projections_to_block(val, block); It's a little hard to see here: did you just refactor this code, or make any changes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079450181 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079507708 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:29:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/block.hpp line 468: >> >>> 466: >>> 467: // If necessary, hoist orphan node n into the end of block b. >>> 468: void maybe_hoist_into(Node* n, Block* b); >> >> Hmm. It is "if necessary" or "if possible"? >> I wonder if we could come up with a name that is a little longer and expresses this condition? > > Ah no, I'm starting to understand that it is rather a `if necessary`... 
See further comments at `maybe_hoist_into` and my suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079512983 From thartmann at openjdk.org Thu May 8 12:17:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 12:17:57 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <7XtX737NV9bjyQWKxZK0rjNzQ1ye2IpbsuWTtI8Rh1s=.7e6bb289-50a1-45e2-906a-44348848a281@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2862849381 From shade at openjdk.org Thu May 8 12:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:39:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:30:00 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/runtime/vmStructs.cpp line 1266: > >> 1264: declare_toplevel_type(CDSFileMapRegion) \ >> 1265: declare_toplevel_type(UpcallStub::FrameData) \ >> 1266: declare_toplevel_type(UnloadableMethodHandle) \ > > So are these left for the async profiler? Yes, see https://github.com/async-profiler/async-profiler/issues/1260 that is filed already. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079634276 From shade at openjdk.org Thu May 8 12:42:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:42:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:28:10 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > >> 33: #include "oops/weakHandle.inline.hpp" >> 34: >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > > This should initialize method in the ctor initializer list. Maybe, but the field is not `const`, so there seem to be no point? We also assign after assert checks `method` for us. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079637960 From shade at openjdk.org Thu May 8 12:50:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:50:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 19:54:04 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > >> 49: // Method holder class cannot be unloaded. >> 50: return nullptr; >> 51: } > > This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. Right? 
Wasteful to even go through all this dance for compiling JDK methods :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079651140 From aboldtch at openjdk.org Thu May 8 13:01:07 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:01:07 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References: Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
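[Editorial illustration, not part of the original message.] One way to get O(1) leftmost and rightmost lookup on top of a tree that does not provide it is to cache both extremes in a thin wrapper and fix them up on insert and remove. The sketch below is only an assumption about the general technique, not the code in the pull request: `ExtremesCachingTree` is a hypothetical name, and `std::set` merely stands in for an ordered tree whose nodes keep stable addresses.

```cpp
#include <iterator>
#include <set>

// Hypothetical wrapper that caches the smallest and largest element of an
// ordered tree so that both can be read in O(1).
template <typename K>
class ExtremesCachingTree {
public:
  void insert(const K& key) {
    auto result = _tree.insert(key);
    if (!result.second) {
      return; // Key already present, the extremes are unchanged.
    }
    const K* node = &*result.first;
    if (_leftmost == nullptr || key < *_leftmost) {
      _leftmost = node;
    }
    if (_rightmost == nullptr || *_rightmost < key) {
      _rightmost = node;
    }
  }

  void remove(const K& key) {
    auto it = _tree.find(key);
    if (it == _tree.end()) {
      return;
    }
    // If an extreme is removed, its in-order neighbour becomes the new extreme.
    if (&*it == _leftmost) {
      auto next = std::next(it);
      _leftmost = (next == _tree.end()) ? nullptr : &*next;
    }
    if (&*it == _rightmost) {
      _rightmost = (it == _tree.begin()) ? nullptr : &*std::prev(it);
    }
    _tree.erase(it);
  }

  // Both lookups only read a cached pointer; nullptr means the tree is empty.
  const K* leftmost() const { return _leftmost; }
  const K* rightmost() const { return _rightmost; }

private:
  std::set<K> _tree;
  const K* _leftmost = nullptr;
  const K* _rightmost = nullptr;
};
```

The lookup cost moves to insert and remove, which already pay for a tree operation, so the extra pointer comparisons are cheap by comparison.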
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Use private inheritance - Separate tree logic to own class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25112/files - new: https://git.openjdk.org/jdk/pull/25112/files/4bc5cf09..3c3e22bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00-01 Stats: 253 lines in 2 files changed: 122 ins; 93 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From aboldtch at openjdk.org Thu May 8 13:03:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:03:53 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. @stefank had some comments about having too much logic inlined, so I abstracted the extra tree logic into its own inner class. Currently re-running tests.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25112#issuecomment-2862969347 From shade at openjdk.org Thu May 8 14:33:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 14:33:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> On Wed, 7 May 2025 20:18:29 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/compiler/compileBroker.cpp line 1697: > >> 1695: JavaThread* thread = JavaThread::current(); >> 1696: >> 1697: methodHandle method(thread, task->method()); > > I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. Ah, that reminds me, thanks. I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079838894 From kvn at openjdk.org Thu May 8 15:14:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:14:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <40ZOuLCtxa6ytFKxGHY5mHY_SI_e1AxrXSUrpmNB9Lk=.17f141ca-5b1e-4ead-8416-86f5b7382598@github.com> On Tue, 6 May 2025 18:57:14 GMT, Roberto Castañeda Lozano wrote: > > Why the attribute is not set for `zLoadP` on x64? > > `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question? Somehow I missed this change. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2863416833 From kvn at openjdk.org Thu May 8 15:24:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:24:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). src/hotspot/share/opto/lcm.cpp line 95: > 93: } > 94: > 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079942627 From stefank at openjdk.org Thu May 8 16:06:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 May 2025 16:06:04 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 10:06:41 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> ``` >> bool update_allocation_history = used > 0.5 * capacity; >> ``` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected.
>> >> Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page-allocator instead. >> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Handle inc and dec in alloc/undo I like this change. I've added a few comments below. src/hotspot/share/gc/z/zTLABUsage.cpp line 32: > 30: _used_history() {} > 31: > 32: Suggestion: src/hotspot/share/gc/z/zTLABUsage.cpp line 39: > 37: void ZTLABUsage::decrease_used(size_t size) { > 38: precond(size <= _used); > 39: Atomic::sub(&_used, size, memory_order_relaxed); Suggestion: precond(size <= _used); Atomic::sub(&_used, size, memory_order_relaxed); src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > 41: > 42: void ZTLABUsage::reset() { > 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); Does this work instead?
Suggestion: const size_t current_used = Atomic::xchg(&_used, 0u); src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > 49: > 50: // Save the old values for logging > 51: const size_t old_used = used(); It's not immediately obvious what `_used` is compared to `used()` Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2825630207 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009139 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009572 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080010741 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080017958 From sviswanathan at openjdk.org Thu May 8 22:20:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 8 May 2025 22:20:52 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. 
[2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2826479403 From Monica.Beckwith at microsoft.com Thu May 8 22:47:25 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Thu, 8 May 2025 22:47:25 +0000 Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype Message-ID: Hi all, Thanks to everyone for the ongoing AHS discussions across 8236073, 8238686/87, and umbrella JDK-8353716. 
From the Microsoft side, we have been reviewing logs from a range of prod-like use cases across the broader MSFT environment, including first-party Java services (both Azure-hosted and non-Azure), as well as OSS-based deployments (Cassandra, Kafka, etc). We've also been benchmarking with various combinations (ReservePercent, GCTimeRatio, periodic GC, etc) and exploring early models to help gauge expected shrink/grow behavior under service conditions. These observations have shaped our perspective and contributions to upstream design discussions.

Here's where we currently stand:

------------------------------------------------------------------------
1. SoftMaxHeapSize semantics and placement
------------------------------------------------------------------------
We continue to support the current SoftMax proposal as a **soft upper bound** on heap usage, one that the GC controller respects, but may temporarily exceed if necessary. Our analysis of logs shows that an effective SoftMax, even when static, would help reduce RSS under light traffic without requiring aggressive full GCs. We also plan to evaluate the controller changes under PR #24211 once they're merged, and we'd like to keep the option of a `jcmd GC.set_soft_max` interface, consistent with ZGC and future container signals (e.g. memory.high).

------------------------------------------------------------------------
2. GCTimeRatio as a feedback driver
------------------------------------------------------------------------
We support the move to a higher default value for `GCTimeRatio` as it aligns well with throughput goals in our measured workloads, including SPECjbb2015, DBs, and Spring-based services. We plan to continue stepped testing across representative service patterns. We'd also support exposing an alias like `-XX:GCCPUPercent` to improve ergonomics for operators.

------------------------------------------------------------------------
3. Reserve floor and shrink control
------------------------------------------------------------------------
We strongly recommend retaining `G1ReservePercent` as a configurable minimum, particularly in low-latency scenarios or when allocation bursts are expected immediately after idle phases. We'd also be open to exploring future adaptive variants of the reserve floor as the AHS loop matures.

------------------------------------------------------------------------
4. Periodic GC fallback and field heuristics
------------------------------------------------------------------------
Until AHS-driven shrink behavior is well understood and widely adopted, we recommend retaining a periodic GC safety net, especially for services with extended idle phases. As AHS matures, we'll continue to evaluate whether this fallback remains necessary in production.

------------------------------------------------------------------------
5. Role of externally-supplied limits
------------------------------------------------------------------------
Internally, we've discussed how AHS should behave in managed container environments such as AKS. In most cases we expect the JVM to operate within cgroup-defined memory.max and possibly memory.high bounds. We don't currently envision supporting non-cgroup (custom/embedded) environments on day one. We also believe that memory.high or RSS-based constraints could eventually serve as complementary signals for guiding heap elasticity, especially for AKS customers. These use cases are still exploratory, but we hope they can be accommodated within the direction of AHS without adding undue complexity to the core loop.

------------------------------------------------------------------------
6. Design notes and alignment
------------------------------------------------------------------------
For reference, our current AHS evaluation and alignment write-up (including control flow diagrams and tuning strategy) is here:

    https://github.com/microsoft/openjdk-workstreams/tree/main/G1-AHS

We'll continue to update that as PRs land and more data becomes available. We welcome any feedback on the write-up or our alignment approach and would be happy to incorporate community input via PRs. We are also open to hosting the write-up within an OpenJDK project repo if that's deemed appropriate.

Thanks again to everyone driving this effort forward; happy to continue refining as the pieces come together.

Best regards,
Monica

From jbhateja at openjdk.org Fri May 9 05:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 05:31:57 GMT Subject: Integrated: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
> > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. This pull request has now been integrated. Changeset: 53ad4b2a Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/53ad4b2ad2664e5056c113543dfaa26647d6ce26 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Co-authored-by: Axel Boldt-Christmas Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24919 From sjohanss at openjdk.org Fri May 9 06:07:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 06:07:53 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. 
Right now, the default deployment in containers with resource limits in place (common for Kubernetes deployments), where a single process runs in the container, isn't well catered for if the application just uses the default configuration. Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've filed a CSR for this as well, for which I'm also looking for reviewers, and I intend to write a release note about this change, as it has some risk associated with it, although the escape hatch is pretty simple: set `-XX:MaxRAMPercentage=25.0` to go back to the old behaviour. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking into this, Severin. Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. Have you done any testing with containerized apps?
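To put the 25% and 75% figures in concrete terms, here is a rough sketch of the arithmetic, assuming the heap cap is simply the detected memory limit multiplied by `MaxRAMPercentage`. The function name `max_heap_bytes` is made up for illustration, and the real JVM ergonomics apply further adjustments that are not modelled here:

```cpp
#include <cstdint>

// Hypothetical helper, not JVM code: heap cap = memory limit * percentage / 100.
// Alignment and any other ergonomic limits are deliberately left out.
std::uint64_t max_heap_bytes(std::uint64_t memory_limit_bytes, double max_ram_percentage) {
  const double cap = static_cast<double>(memory_limit_bytes) * (max_ram_percentage / 100.0);
  return static_cast<std::uint64_t>(cap);
}
```

Under this simplified model, a container limited to 1 GiB gets a 256 MiB heap cap at 25% and a 768 MiB cap at 75%.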
------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865246427 From sjohanss at openjdk.org Fri May 9 07:52:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 07:52:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: <2uwu7EoW1H6F6v0FlZsop7jiQhePYWnXNzePf_4pQBc=.52f2dde4-dadc-4b07-af0b-8fd52f0765f0@github.com> On Thu, 8 May 2025 15:57:19 GMT, Stefan Karlsson wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Handle inc and dec in alloc/undo > > src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > >> 41: >> 42: void ZTLABUsage::reset() { >> 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); > > Does this work instead? > Suggestion: > > const size_t current_used = Atomic::xchg(&_used, 0u); No, `0ul` works on Linux, but Windows fails with that. > src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > >> 49: >> 50: // Save the old values for logging >> 51: const size_t old_used = used(); > > It's not immediately obvious what `_used` is compared to `used()` Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`. Talked a bit about this offline, will add some comments and rename `used()` and `capacity()` to `tlab_used()` and `tlab_capacity()` to make it a bit more clear that they are not directly connected and also better match the `ZHeap` interface. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081127733 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081130690 From sjohanss at openjdk.org Fri May 9 08:17:13 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 08:17:13 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. 
> > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead.
We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: StefanK review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/ba7cb673..2f5742fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03-04 Stats: 22 lines in 3 files changed: 4 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From stefank at openjdk.org Fri May 9 08:29:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 May 2025 08:29:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 08:17:13 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
>> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> `bool update_allocation_history = used > 0.5 * capacity;` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page-allocator instead.
>> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > StefanK review Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2827450137 From aboldtch at openjdk.org Fri May 9 09:31:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 9 May 2025 09:31:59 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 08:17:13 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs.
Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> `bool update_allocation_history = used > 0.5 * capacity;` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page-allocator instead. >> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > StefanK review lgtm. ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2827676721 From sgehwolf at openjdk.org Fri May 9 10:06:50 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 9 May 2025 10:06:50 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com> On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. Right now, the default deployment in containers with resource limits in place (common for Kubernetes deployments), where a single process runs in the container, isn't well catered for if the application just uses the default configuration. Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've filed a CSR for this as well, for which I'm also looking for reviewers, and I intend to write a release note about this change, as it has some risk associated with it, although the escape hatch is pretty simple: set `-XX:MaxRAMPercentage=25.0` to go back to the old behaviour. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking at this, Stefan. > Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments.
I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. Currently there is only the generic `is_containerized()` API which has been documented in the bug that fixed that: [JDK-8261242](https://bugs.openjdk.org/browse/JDK-8261242?focusedId=14685743&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14685743) So yes, this would update the RAM percentage for a) an unprivileged container (no limits), b) some other container tech which sets the cgroup CPU limit, for example. The JVM currently only looks at memory/cpu limits for privileged containers and takes that into consideration for `is_containerized()`. If there is consensus, we could add an API that returns true if only a memory limit is present. That doesn't exist yet, though. Happy to propose something in that direction. The infra is already there. > I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. It would return true for any non-privileged container. I can see that this might be a concern. > Have you done any testing with containerized apps? I have done some basic testing so far, but would be happy to do more. What specific testing would you be interested in?
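One way a check for "a memory limit is present" could look, sketched outside the JVM: on cgroup v2 the `memory.max` file holds either the literal `max` (no limit) or a byte count. The helper name `has_memory_limit` is hypothetical, it takes the file contents as a string rather than reading the file, and this is not a description of how HotSpot's container support is implemented:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true if the given contents of a cgroup v2
// memory.max file denote a finite limit, i.e. a plain decimal byte count.
// The literal "max" (no limit), empty input and malformed input all yield false.
bool has_memory_limit(const std::string& memory_max_contents) {
  const char* whitespace = " \t\r\n";
  const auto first = memory_max_contents.find_first_not_of(whitespace);
  if (first == std::string::npos) {
    return false;  // empty or whitespace-only input
  }
  const auto last = memory_max_contents.find_last_not_of(whitespace);
  const std::string value = memory_max_contents.substr(first, last - first + 1);
  if (value == "max") {
    return false;  // cgroup v2 spelling for "no limit"
  }
  for (char c : value) {
    if (!std::isdigit(static_cast<unsigned char>(c))) {
      return false;  // anything other than decimal digits is not a byte count
    }
  }
  return true;
}
```

Keeping the parsing separate from the file access makes the decision easy to exercise with sample inputs, without needing a particular container setup.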
------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865954385 From shade at openjdk.org Fri May 9 11:23:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 11:23:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> References: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> Message-ID: <_8y_DYl9Q4P1scTtA_J8ilWw_GP0kdSL37bAmYb4dEM=.ea34a76f-0236-459f-b99c-a8d6129c3a67@github.com> On Thu, 8 May 2025 14:29:56 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1697: >> >>> 1695: JavaThread* thread = JavaThread::current(); >>> 1696: >>> 1697: methodHandle method(thread, task->method()); >> >> I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. > > Ah, that reminds me, thanks. > > I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! > > Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: > > > methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); > > > ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. I understand now. There are TOCTOU-s under concurrent `block_unloading`. 
The most egregious one is this: `is_unloaded` checks in two steps: `!_weak_handle.is_empty() && _weak_handle.peek() == nullptr;`. So when `block_unloading` comes in concurrently and resets the weak handle to empty (since we have a strong handle now), it might be possible that the first predicate is still `true`, but evaluation of the second predicate calls `peek` on an empty `_weak_handle`, oops. We could technically claim that `UnloadableMethodHandle` is not thread-safe, but it does not solve current compiler uses, and it is very unsatisfactory for the utility class. I'll look into ways to make it resilient under concurrent updates. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2081467353 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v12] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch the weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - Move to oops - Improve get_method_blocker - Simplify a bit - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 12 more: https://git.openjdk.org/jdk/compare/ad07426f...1cdbed2b ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11 Stats: 393 lines in 11 files changed: 331 ins; 25 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops So... Following up on one forgotten `methodHandle` removal (https://github.com/openjdk/jdk/pull/24018#discussion_r2081467353) got me into a rabbit hole of making the new utility class thread-safe. Otherwise, there are TOCTOU issues when checking `(Weak)Handle` status, which gets us in trouble real quick. This normally happens in current tests when an external thread goes into `CompileBroker::wait_for_completion()` and a compiler thread starts moving the `UMH` state for compilation. Relying on un-synchronized `(Weak)Handle` state is not nice either. The answer to all these problems is to track the `UMH` state more accurately, and thus trust `WeakHandle` only sporadically. This is now done in the new commit. This also allows for more explicit state checks. And it allows clearly catching when we try to access `method()` after `release()` -- which, surprisingly, happens for `hot_method()`, which is not always re-initialized. Chasing this bug also made my head hurt a bit about double-negating `!is_unloaded` checks. It is technically a safety check, so I renamed the methods to reflect that: `is_safe`, `make_always_safe`. I will schedule weekend tests for this PR on various machines to see if more bugs fall out once I shake that particular tree.
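For illustration only, and not the code in this PR: a minimal sketch of the "track the state explicitly" idea, assuming a simplified handle with three states. `ToyUnloadableHandle`, its `State` enum, and `simulate_unload()` are hypothetical names; only `is_safe`, `block_unloading`, and `release` echo names mentioned in the thread, and their real signatures may differ. The point is that every check and every transition looks at one state under one lock, so there is no window between "is it still weak?" and "peek at it".

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the utility class discussed above; not HotSpot code.
enum class State { Weak, Strong, Released };

class ToyUnloadableHandle {
  mutable std::mutex _lock;        // guards both fields below
  State _state = State::Weak;      // explicit state instead of "is the weak handle empty?"
  bool  _referent_alive = true;    // models whether the weakly held referent still exists

public:
  // Check performed in one critical section: no second step that could
  // observe a handle that was concurrently switched from weak to strong.
  bool is_safe() const {
    std::lock_guard<std::mutex> guard(_lock);
    switch (_state) {
      case State::Strong:   return true;
      case State::Weak:     return _referent_alive;
      case State::Released: return false;
    }
    return false;
  }

  // Weak -> Strong, only while the referent is still alive. Idempotent.
  bool block_unloading() {
    std::lock_guard<std::mutex> guard(_lock);
    if (_state == State::Strong) return true;
    if (_state == State::Weak && _referent_alive) {
      _state = State::Strong;
      return true;
    }
    return false;
  }

  // After release(), any access is reported as unsafe.
  void release() {
    std::lock_guard<std::mutex> guard(_lock);
    _state = State::Released;
  }

  // Test helper modelling class unloading: it only succeeds while the
  // handle is not strong.
  void simulate_unload() {
    std::lock_guard<std::mutex> guard(_lock);
    if (_state != State::Strong) _referent_alive = false;
  }
};
```

In this toy model a handle that was made strong stays safe even if unloading is attempted, while a handle that was still weak at unloading time reports itself as unsafe and can no longer be made strong.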
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2867309949 From ayang at openjdk.org Sun May 11 16:42:50 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 11 May 2025 16:42:50 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx, G1 still uncommits the archive regions thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix, we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). > > Testing: gha, manual testing as below: > > Mainline: > > > [3.740s][info ][gc,init ] Heap Min Capacity: 150G > [3.740s][info ][gc,init ] Heap Initial Capacity: 150G > [3.740s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B > . > . > [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms > > With patch (No shrinking when -Xms == -Xmx): > > > [3.753s][info ][gc,init ] Heap Min Capacity: 150G > [3.753s][info ][gc,init ] Heap Initial Capacity: 150G > [3.753s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms > > With patch (Shrinking when -Xms != -Xmx): > > > [3.755s][info ][gc,init ] Heap Min Capacity: 153568M > [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M > [3.755s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions).
Total size: 33554432B (1 Regions) > . > . > [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25036#pullrequestreview-2831444743 From gli at openjdk.org Sun May 11 19:33:58 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 11 May 2025 19:33:58 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Fri, 2 May 2025 10:23:25 GMT, Albert Mingkun Yang wrote: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. 
> > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (recovering the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 I have reviewed about a third of the code so far, but I want to save my thoughts, so I am submitting them now. Sorry for the noise if it bothers you. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: > 341: if (is_gc_overhead_limit_reached()) { > 342: return nullptr; > 343: } It seems the parameter `gc_overhead_limit_was_exceeded` and the field `MemAllocator::Allocation::_overhead_limit_exceeded` are not used by all GCs now. Should we keep the parameter and set it to `true` under the condition `is_gc_overhead_limit_reached()`? For example: if (op.prologue_succeeded()) { assert(is_in_or_null(op.result()), "result not in heap"); if (is_gc_overhead_limit_reached()) { *gc_overhead_limit_was_exceeded = true; return nullptr; } return op.result(); } Or should we remove the parameter and the field in another PR? src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 825: > 823: // If MinHeapFreeRatio is at its default value; shrink cautiously. Otherwise, users expect prompt shrinking.
> 824: if (FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0) { > 825: if (desired_capacity < current_capacity) { I puzzled over the condition `MinHeapFreeRatio == 0` for a while and then found the following code in `parallelArguments.cpp`. Might it be better to use `UseAdaptiveSizePolicy && FLAG_IS_DEFAULT(MinHeapFreeRatio)` here instead of `FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0`? if (UseAdaptiveSizePolicy) { // We don't want to limit adaptive heap sizing's freedom to adjust the heap // unless the user actually sets these flags. if (FLAG_IS_DEFAULT(MinHeapFreeRatio)) { FLAG_SET_DEFAULT(MinHeapFreeRatio, 0); } if (FLAG_IS_DEFAULT(MaxHeapFreeRatio)) { FLAG_SET_DEFAULT(MaxHeapFreeRatio, 100); } } src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 862: > 860: resize_old_gen_after_full_gc(); > 861: young_gen()->resize_after_full_gc(); > 862: } `PSYoungGen` has its own methods `resize_after_full_gc` and `resize_after_young_gc`. I think this design is good. What about moving the method `resize_old_gen_after_full_gc` (and the related method `calculate_desired_old_gen_capacity`) to `PSOldGen` and renaming it to `resize_after_full_gc`? src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 141: > 139: // Invoked at gc-pause-end > 140: void gc_epilogue(bool full); > 141: It is strange that Parallel GC didn't have its own prologue and epilogue before. And currently, the concrete work categories (such as increasing the GC count) of the prologue and epilogue are not unified across the GCs. This seems to be an issue left over from history, so it needs more investigation in the future.
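On the `MinHeapFreeRatio` condition raised earlier in this review, a small model may help to see why the two predicates are expected to agree. This is a sketch under stated assumptions, not HotSpot code: `ToyFlag`, `apply_parallel_ergonomics`, `cautious_current`, and `cautious_suggested` are hypothetical names, the built-in default is assumed to be non-zero (40 is used as a placeholder), and `FLAG_SET_DEFAULT` is modelled as changing the value while leaving the flag marked as default.

```cpp
#include <cassert>

// Hypothetical model of a VM flag: its value plus whether it is still
// considered "default" (i.e. not set by the user).
struct ToyFlag {
  unsigned value;
  bool     is_default;
};

// Models the quoted snippet from parallelArguments.cpp: with adaptive sizing,
// an untouched flag gets the permissive value 0 but stays "default".
void apply_parallel_ergonomics(bool use_adaptive_size_policy, ToyFlag& min_free) {
  if (use_adaptive_size_policy && min_free.is_default) {
    min_free.value = 0;  // stands in for FLAG_SET_DEFAULT(MinHeapFreeRatio, 0)
  }
}

// Condition as written in the patch under review.
bool cautious_current(const ToyFlag& min_free) {
  return min_free.is_default && min_free.value == 0;
}

// Condition suggested in the review comment.
bool cautious_suggested(bool use_adaptive_size_policy, const ToyFlag& min_free) {
  return use_adaptive_size_policy && min_free.is_default;
}
```

Under these assumptions the two predicates give the same answer for every combination of "adaptive sizing on/off" and "flag set by user or not", including a user explicitly passing 0. They would diverge only if the built-in default itself were 0, which is the assumption to double-check against the real flag definition.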
src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 45: > 43: _avg_promoted(new AdaptivePaddedNoZeroDevAverage(AdaptiveSizePolicyWeight, PromotedPadding)), > 44: _space_alignment(space_alignment), > 45: _young_gen_size_increment_supplement(YoungGenerationSizeSupplement) {} There are typos in `gc_globals.hpp` (shown below): `YoungedGenerationSizeIncrement` and `YoungedGenerationSizeSupplement`. They should be fixed in another PR. product(uint, YoungGenerationSizeIncrement, 20, \ "Adaptive size percentage change in young generation") \ range(0, 100) \ \ product(uint, YoungGenerationSizeSupplement, 80, \ "Supplement to YoungedGenerationSizeIncrement used at startup") \ // <--- here range(0, 100) \ \ product(uintx, YoungGenerationSizeSupplementDecay, 8, \ "Decay factor to YoungedGenerationSizeSupplement") \ // <--- here range(1, max_uintx) \ src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1104: > 1102: heap->post_full_gc_dump(&_gc_timer); > 1103: > 1104: size_policy->record_gc_pause_end_instant(); What about moving this invocation into `major_collection_end`? Just like `record_gc_pause_start_instant` and `major_collection_begin`. src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 179: > 177: _gc_distance_timer.reset(); > 178: _gc_distance_timer.start(); > 179: } The method name `record_gc_pause_end_instant` is about `gc pause`, but the code is about `gc distance`. Would a clearer name be better? src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 184: > 182: _gc_distance_timer.stop(); > 183: _gc_distance_seconds_seq.add(_gc_distance_timer.seconds()); > 184: } The method name `record_gc_pause_start_instant` is about `gc pause`, but the code is about `gc distance`. Would a clearer name be better? ------------- Changes requested by gli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25000#pullrequestreview-2831414868 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083540517 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083573645 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083574866 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083578247 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083595212 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083596481 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083582694 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083581870 From iwalulya at openjdk.org Mon May 12 08:18:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 12 May 2025 08:18:58 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Sun, 11 May 2025 16:40:15 GMT, Albert Mingkun Yang wrote: >> Hi, >> >> Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx, G1 still uncommits the archive regions thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. >> >> This is a temporary fix, we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). >> >> Testing: gha, manual testing as below: >> >> Mainline: >> >> >> [3.740s][info ][gc,init ] Heap Min Capacity: 150G >> [3.740s][info ][gc,init ] Heap Initial Capacity: 150G >> [3.740s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B >> . >> . 
>> [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms >> >> With patch (No shrinking when -Xms == -Xmx): >> >> >> [3.753s][info ][gc,init ] Heap Min Capacity: 150G >> [3.753s][info ][gc,init ] Heap Initial Capacity: 150G >> [3.753s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms >> >> With patch (Shrinking when -Xms != -Xmx): >> >> >> [3.755s][info ][gc,init ] Heap Min Capacity: 153568M >> [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M >> [3.755s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions) >> . >> . >> [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25036#issuecomment-2871357566 From iwalulya at openjdk.org Mon May 12 08:18:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 12 May 2025 08:18:59 GMT Subject: Integrated: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx, G1 still uncommits the archive regions thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix, we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035).
> > Testing: gha, manual testing as below: > > Mainline: > > > [3.740s][info ][gc,init ] Heap Min Capacity: 150G > [3.740s][info ][gc,init ] Heap Initial Capacity: 150G > [3.740s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B > . > . > [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms > > With patch (No shrinking when -Xms == -Xmx): > > > [3.753s][info ][gc,init ] Heap Min Capacity: 150G > [3.753s][info ][gc,init ] Heap Initial Capacity: 150G > [3.753s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms > > With patch (Shrinking when -Xms != -Xmx): > > > [3.755s][info ][gc,init ] Heap Min Capacity: 153568M > [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M > [3.755s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions) > . > . > [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms This pull request has now been integrated.
Changeset: a3afc9f7 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/a3afc9f7ceba24ab607141426bb0a2693e6d37ca Stats: 16 lines in 1 file changed: 11 ins; 1 del; 4 mod 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/25036 From shade at openjdk.org Mon May 12 13:11:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 13:11:17 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v13] In-Reply-To: References: Message-ID: <2ydVKTAbomGLgJTwl-1jRBxgF4MRz0h-2CQmr9yHTxg=.0e094037-94b2-4627-92ef-01946fed014b@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - More thorough locking and redefinition escape hatch - Fix build failures: add more headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/1cdbed2b..ce737c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11-12 Stats: 114 lines in 7 files changed: 58 ins; 20 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon May 12 14:15:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:15:16 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v14] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading.
Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix release builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/ce737c5a..f239c221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon May 12 14:33:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:33:40 GMT Subject: RFR: 
8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v15] In-Reply-To: References: Message-ID: <2LlyHKO14TOr7qVXQbyjy4ZWrGo8fCo3muVoa6VlFzc=.50816f66-e90b-4bb6-b953-64f6a675d664@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/f239c221..33e545ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13-14 Stats: 26 lines in 3 files changed: 14 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From rcastanedalo at openjdk.org Mon May 12 14:48:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 May 2025 14:48:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Thanks for looking at this PR, Emanuel! > It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. 
Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity. > In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax): 0x00007514c47d6aa0: movq 0x10(%rsi), %rax ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe 0x00007514c47d6aa4: shrq $0xd, %rax ; uncolor, destroys the OOP loaded in %rax 0x00007514c47d6aa8: ja 0x36 ; jump to barrier stub (slow path) (...) 0x00007514c47d6abe: trigger uncommon trap (null_check) (...) barrier stub (slow path): 0x00007514c47d6ae4: movq 0x10(%rsi), %rax ; re-load OOP that was destroyed by uncoloring (...) ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*)) 0x00007514c47d6b09: jmp -0x5d ; go back to main code section Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload. > I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. 
But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? I am not sure I understand the question, could you perhaps re-formulate it using some example to make it more concrete? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2872870543 From eosterlund at openjdk.org Mon May 12 21:53:54 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 12 May 2025 21:53:54 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 13:01:07 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. >> >> The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. >> >> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. >> >> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Use private inheritance > - Separate tree logic to own class Marked as reviewed by eosterlund (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2834716108 From kdnilsen at openjdk.org Mon May 12 22:40:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:40:33 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table [v2] In-Reply-To: References: Message-ID: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> > Remove unneeded log messages related to processing of the remembered set write card table. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Use log_develop_debug message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24809/files - new: https://git.openjdk.org/jdk/pull/24809/files/c1f65632..9cb54f3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24809&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24809&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24809.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24809/head:pull/24809 PR: https://git.openjdk.org/jdk/pull/24809 From kdnilsen at openjdk.org Mon May 12 22:40:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:40:33 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. Replaced original messages with log_develop_debug() messages. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24809#issuecomment-2874353597 From kdnilsen at openjdk.org Mon May 12 22:43:26 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:43:26 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Message-ID: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Two changes: 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). ------------- Commit messages: - Fix white apce - available() returns Sentinel if under construction - Log full gc region transfers outside heaplock - Make ShenFreeSet::available() race free - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... 
and 29 more: https://git.openjdk.org/jdk/compare/92730945...6353f1f7 Changes: https://git.openjdk.org/jdk/pull/25165/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356667 Stats: 95 lines in 10 files changed: 72 ins; 9 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From wkemper at openjdk.org Mon May 12 23:09:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:09:53 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Fri, 9 May 2025 23:45:50 GMT, Kelvin Nilsen wrote: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). Minor nits. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: > 233: > 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. > 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { This name confuses me: `available_in_under_lock`. 
Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 242: > 240: } > 241: > 242: ShenandoahGenerationalHeap::TransferResult result;; Extra `;` here. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25165#pullrequestreview-2834801227 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085650563 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085648309 From wkemper at openjdk.org Mon May 12 23:10:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:10:52 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table [v2] In-Reply-To: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> References: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> Message-ID: <9ZAGrz5s4dOvL3oaBpQ014-qW272MewsPGOK9JordRc=.c21440fc-129e-4bfd-b5b4-3ad55459264d@github.com> On Mon, 12 May 2025 22:40:33 GMT, Kelvin Nilsen wrote: >> Remove unneeded log messages related to processing of the remembered set write card table. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Use log_develop_debug message Marked as reviewed by wkemper (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24809#pullrequestreview-2834806479 From kdnilsen at openjdk.org Mon May 12 23:16:58 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:16:58 GMT Subject: Integrated: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. This pull request has now been integrated. Changeset: c23469df Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/c23469df162498e30119f43bc3d1effa15574a42 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8355340: GenShen: Remove unneeded log messages related to remembered set write table Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24809 From kdnilsen at openjdk.org Mon May 12 23:17:51 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:17:51 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Mon, 12 May 2025 23:05:43 GMT, William Kemper wrote: >> Two changes: >> >> 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) >> 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). 
> > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: > >> 233: >> 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. >> 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { > > This name confuses me: `available_in_under_lock`. Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. Thanks for review and comments. I'll change the name. It follows a pattern that is admittedly very confusing... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085656619 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). 
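For readers following along, the two changes in the description above can be sketched roughly as below. This is a standalone illustration under stated assumptions, not the Shenandoah code: `FreeSetSketch`, `begin_rebuild()` and `finish_rebuild()` are hypothetical names, and a plain `std::mutex` stands in for the heap lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a free set; this is NOT ShenandoahFreeSet.
class FreeSetSketch {
public:
  // Value reported while a rebuild is in progress. A very large value cannot
  // be mistaken for "memory is running out", so it should not trigger a GC.
  static constexpr std::size_t rebuilding_sentinel = SIZE_MAX;

  FreeSetSketch(std::size_t capacity, std::size_t used)
    : _capacity(capacity), _used(used), _rebuilding(false) {}

  // Mark the accounting as unreliable until finish_rebuild() is called.
  void begin_rebuild() {
    std::lock_guard<std::mutex> guard(_lock);
    _rebuilding = true;
  }

  // Publish the new accounting and make available() meaningful again.
  void finish_rebuild(std::size_t capacity, std::size_t used) {
    std::lock_guard<std::mutex> guard(_lock);
    _capacity = capacity;
    _used = used;
    _rebuilding = false;
  }

  // Both fields are read under the same lock, so the difference is always
  // computed from a consistent (capacity, used) pair.
  std::size_t available() const {
    std::lock_guard<std::mutex> guard(_lock);
    if (_rebuilding) {
      return rebuilding_sentinel;
    }
    return _capacity - _used;
  }

private:
  mutable std::mutex _lock;  // stand-in for the heap lock (assumption)
  std::size_t _capacity;
  std::size_t _used;
  bool _rebuilding;
};
```

A polling caller, such as the control or regulator thread mentioned above, would simply treat the sentinel as "no pressure right now" and ask again on its next tick.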
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25165/files - new: https://git.openjdk.org/jdk/pull/25165/files/6353f1f7..ffe1113e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Mon, 12 May 2025 23:14:59 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: >> >>> 233: >>> 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. >>> 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { >> >> This name confuses me: `available_in_under_lock`. Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. > > Thanks for review and comments. I'll change the name. It follows a pattern that is admittedly very confusing... 
changing to available_in_not_locked() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085660231 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Mon, 12 May 2025 23:02:26 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer comments > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 242: > >> 240: } >> 241: >> 242: ShenandoahGenerationalHeap::TransferResult result;; > > Extra `;` here. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085660144 From wkemper at openjdk.org Mon May 12 23:31:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:31:54 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> Message-ID: On Mon, 12 May 2025 23:22:33 GMT, Kelvin Nilsen wrote: >> Two changes: >> >> 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) >> 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. 
In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer comments Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 234: > 232: } > 233: > 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. Sorry - can we change this comment too? This method does _not_ acquire the lock in release builds. Comment could mention that it acquires the lock only for the correctness of the assertion? ------------- PR Review: https://git.openjdk.org/jdk/pull/25165#pullrequestreview-2834876929 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085680099 From stefank at openjdk.org Tue May 13 04:31:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 13 May 2025 04:31:56 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 13:01:07 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. >> >> The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. >> >> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. >> >> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. 
> > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Use private inheritance > - Separate tree logic to own class Thanks for doing this cleanup. I have a few nits below. src/hotspot/share/gc/z/zMappedCache.cpp line 308: > 306: > 307: // Replace in size-class lists > 308: _tree.replace(old_node, new_node, cursor); This code was changed from: // Replace in tree _tree.replace(entry->node_addr(), cursor); // Replace in size-class lists to: // Replace in size-class lists _tree.replace(old_node, new_node, cursor); It seems like something went wrong with the comments. src/hotspot/share/gc/z/zMappedCache.cpp line 672: > 670: // use is_empty_error_reporter_safe and size_error_reporter_safe on the size > 671: // class lists. > 672: const size_t entry_count = _tree.size(); There doesn't seem to be an `Atomic::load` or `volatile` to make sure that we honor the comment about reading only once. src/hotspot/share/gc/z/zMappedCache.hpp line 32: > 30: #include "gc/z/zList.hpp" > 31: #include "utilities/globalDefinitions.hpp" > 32: #include "utilities/rbTree.hpp" Sort order. ------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2835230617 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085893256 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085896187 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085896390 From aboldtch at openjdk.org Tue May 13 06:02:27 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 May 2025 06:02:27 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic Message-ID: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handle memory uncommitting (returning physical memory to the OS). 
Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit.

Because this measurement is not as granular as before, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval.

The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address:
* Missing wait: the uncommitting is not actually spread out, but happens all at once.
* Reactivity: if the process starts using memory that was below the previous watermark, uncommitting should stop.
* Structure: the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit.
* Events: overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits.

An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user-facing APIs.
[JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. ------------- Commit messages: - Use milliseconds instead of seconds - Improve events and statistics - Handle timeout correctly - Cleanups - Remove test's TIMEOUT_FACTOR dependency - Improve remove from min - Maybe better? - This is more inline with what uncommit does - Speed up as well, but weirdly none-linear - The intent Changes: https://git.openjdk.org/jdk/pull/25198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356716 Stats: 333 lines in 9 files changed: 264 ins; 32 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From epeter at openjdk.org Tue May 13 06:10:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:10:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Mon, 12 May 2025 14:46:22 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Thanks for looking at this PR, Emanuel! > >> It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. 
> > Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity. > >> In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. > > This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax): > > > 0x00007514c47d6aa0: movq 0x10(%rsi), %rax ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe > 0x00007514c47d6aa4: shrq $0xd, %rax ; uncolor, destroys the OOP loaded in %rax > 0x00007514c47d6aa8: ja 0x36 ; jump to barrier stub (slow path) > > (...) > > 0x00007514c47d6abe: trigger uncommon trap (null_check) > > (...) > > barrier stub (slow path): > 0x00007514c47d6ae4: movq 0x10(%rsi), %rax ; re-load OOP that was destroyed by uncoloring > (...) ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*)) > 0x00007514c47d6b09: jmp -0x5d ; go back to main code section > > > Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload. > >> I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. 
But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer ... @robcasloz Thanks for the explanations! I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875161021 From epeter at openjdk.org Tue May 13 06:19:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:19:52 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). If I understand your statements above correctly: The first load and any subsequent loads are all from the **exact same** address. Hence, if any were null-pointer, the first one has to be a null-pointer. Assuming this is correct, it seems that this follows: Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. 
In your example code above, `0x10(%rsi)` is the address, and presumably `rsi` refers to the base of some object, and `0x10` is the offset to a field. The object that `rsi` points to can thus not be moved by the GC, correct? But the object that the field at offset `0x10` points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you? What guarantees that the object associated with `rsi` is not moved by the GC? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875176654 From sjohanss at openjdk.org Tue May 13 07:41:50 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 13 May 2025 07:41:50 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com> References: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com> Message-ID: On Fri, 9 May 2025 10:04:41 GMT, Severin Gehwolf wrote: > Thanks for looking at this, Stefan. > > > Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machines memory using containers, but what if containers are just used to divide other resources. One use-case that was raised was containerized applications on Linux. > > Currently there is only the generic `is_containerized()` API which has been documented in the bug that fixed that: [JDK-8261242](https://bugs.openjdk.org/browse/JDK-8261242?focusedId=14685743&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14685743) > > So yes, this would update the RAM percentage for a) unprivileged container (no limits), b) some other container tech which sets the cgroup CPU limit for example. 
The JVM currently only looks at memory/cpu limits for privileged containers and takes that into consideration for `is_containerized()`. If there is consensus, we could add an API that returns true if only a memory limit is present. That doesn't exist yet, though. Happy to propose something going into that direction. The infra is already there. > This could be a good direction, we at least need some way to avoid desktop Java apps using too much memory out of the box. There was some talk about using 75% when containerized, but also looking at the machine total, so that if 75% of the container is more than 25% of the machine we fall back to 25% of the machine. For example, for an 8g container on a 16g machine, we would constrain the heap to 4g (25% of machine) rather than 6g (75% of the container). This would of course not be optimal in all situations either, but it would be a fall back to the old defaults for limit-less containers and still in some cases provide a higher default heap for memory configured container deployments. > > I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. > > It would return true for any non-privileged container. I can see that this might be a concern. > Thanks for verifying this. > > Have you done any testing with containerized apps? > > I have done some basic testing so far, but would be happy to do more. What specific testing would you be interested in? I was mostly thinking about limitless containers (desktop apps) to see if we run into the problems of using way too much memory, but given your answer above I guess we would. 
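
The 75%/25% fallback floated above could be sketched roughly as follows. This is only an illustration under stated assumptions: `default_max_heap_bytes` is a hypothetical helper, not an existing HotSpot function, and a limit-less container is modeled by passing the machine total as the container limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a HotSpot API): picks a default max heap size.
// Uses 75% of the container limit, but never more than 25% of the machine total.
inline uint64_t default_max_heap_bytes(uint64_t container_limit_bytes,
                                       uint64_t machine_total_bytes) {
  // Divide before multiplying to avoid overflow for very large limits.
  const uint64_t container_share = container_limit_bytes / 4 * 3;  // 75% of container
  const uint64_t machine_share = machine_total_bytes / 4;          // 25% of machine
  return std::min(container_share, machine_share);
}
```

With an 8g container on a 16g machine this yields 4g, matching the example in the message, and a container without a memory limit ends up with the old 25% default.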
------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2875389061 From sjohanss at openjdk.org Tue May 13 07:46:59 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 13 May 2025 07:46:59 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 08:27:20 GMT, Stefan Karlsson wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> StefanK review > > Marked as reviewed by stefank (Reviewer). Thanks for the reviews @stefank and @xmas92 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24814#issuecomment-2875402257 From sjohanss at openjdk.org Tue May 13 07:47:00 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 13 May 2025 07:47:00 GMT Subject: Integrated: 8353184: ZGC: Simplify and correct tlab_used() tracking In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 07:58:35 GMT, Stefan Johansson wrote: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism to work as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount used memory for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but ZGC does not have any generation sizes there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. 
Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > bool update_allocation_history = used > 0.5 * capacity; > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimate of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned was previously reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... This pull request has now been integrated.
Changeset: 526f543a Author: Stefan Johansson URL: https://git.openjdk.org/jdk/commit/526f543adfeb90341b3b5b18916c1bb7ef725599 Stats: 227 lines in 12 files changed: 180 ins; 41 del; 6 mod 8353184: ZGC: Simplify and correct tlab_used() tracking Reviewed-by: stefank, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/24814 From rcastanedalo at openjdk.org Tue May 13 08:40:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 08:40:52 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> Message-ID: On Tue, 13 May 2025 06:16:53 GMT, Emanuel Peter wrote: > If I understand your statements above correctly: > The first load and any subsequent loads are all from the exact same address. Hence, if any were null-pointer, the first one has to be a null-pointer. Right. > Assuming this is correct, it seems that this follows: > Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. In your example code above, 0x10(%rsi) is the address, and presumably rsi refers to the base of some object, and 0x10 is the offset to a field. The object that rsi points to can thus not be moved by the GC, correct? But the object that the field at offset 0x10 points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you? What guarantees that the object associated with rsi is not moved by the GC? The inner workings of ZGC's guarantee that "root" addresses such as `%rsi` remain valid ("have a good color" in ZGC speak), but I am afraid I cannot offer a more detailed explanation. 
You may find more information in e.g. [1] (even though it is outdated by now as it describes non-generational ZGC), or perhaps some GC engineer may chime in on the discussion and offer more detail?

In any case, to convince ourselves of the correctness of this RFE without needing to dive deep into ZGC internals, maybe it is enough to ensure that we preserve the same behavior as in mainline (where `zLoadP` cannot be used for implicit null checks). Here is how the compiled code looks for the above example before and after this change:

    # Before the RFE (explicit null check):
      testq %rsi, %rsi         ; explicit null check on the base address
      je #uncommon_trap block
      movq 0x10(%rsi), %rax    ; main OOP load
      shrq $0xd, %rax          ; uncolor, destroys the OOP loaded in %rax
      ja #slow_barrier_path
    continue:
      (...)
    slow_barrier_path:
      movq 0x10(%rsi), %rax    ; re-load OOP that was destroyed by uncoloring
      (...)                    ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
      jmp #continue

    # After the RFE (implicit null check):
      movq 0x10(%rsi), %rax    ; main OOP load with implicit exception: dispatches to #uncommon_trap block
      shrq $0xd, %rax          ; uncolor, destroys the OOP loaded in %rax
      ja #slow_barrier_path
    continue:
      (...)
    slow_barrier_path:
      movq 0x10(%rsi), %rax    ; re-load OOP that was destroyed by uncoloring
      (...)                    ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
      jmp #continue

As you can see, both cases rely on the same assumptions about the validity of `%rsi` throughout the execution of the compiled code.

[1] Albert Mingkun Yang and Tobias Wrigstad. Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK. In ACM TOPLAS, 2022.
https://doi.org/10.1145/3538532 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875559501 From rcastanedalo at openjdk.org Tue May 13 08:55:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 08:55:55 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 06:08:38 GMT, Emanuel Peter wrote: > I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). 
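
As a reading aid for the `shrq`/`ja` pair in the before/after listing earlier in this thread, the following sketch models only the x86 flag semantics of that instruction pair. It does not model ZGC itself: `model_shr_then_ja` and `ShiftCheck` are hypothetical names, and which color bit ZGC arranges to be the last one shifted out is an assumption left outside the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Outcome of modeling `shrq $shift, %reg` followed by `ja slow_path`.
struct ShiftCheck {
  uint64_t shifted;      // register contents after the shift
  bool takes_slow_path;  // whether `ja` would branch
};

// Models x86 flag semantics only; `shift` must be in [1, 63].
// SHR sets CF to the last bit shifted out and ZF when the result is zero;
// JA branches when CF == 0 and ZF == 0.
inline ShiftCheck model_shr_then_ja(uint64_t value, unsigned shift) {
  assert(shift >= 1 && shift <= 63);
  const bool carry = ((value >> (shift - 1)) & 1u) != 0;
  const uint64_t shifted = value >> shift;
  const bool zero = (shifted == 0);
  return ShiftCheck{shifted, !carry && !zero};
}
```

Under this model a zero (null) value never branches to the slow path, and a non-zero value branches only when the last bit shifted out is clear.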
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875604527 From epeter at openjdk.org Tue May 13 09:59:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 09:59:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 08:53:08 GMT, Roberto Casta?eda Lozano wrote: >> @robcasloz Thanks for the explanations! >> I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. >> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load. > >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? > > Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). 
> > Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875820583 From shade at openjdk.org Tue May 13 13:47:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 13:47:09 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v16] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. 
This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 19 more: https://git.openjdk.org/jdk/compare/48d2acb3...59798bdb ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=15 Stats: 422 lines in 12 files changed: 379 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From jsikstro at openjdk.org Tue May 13 14:20:32 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 13 May 2025 14:20:32 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing Message-ID: Hello, The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed.
* And the largest change in terms of LOC, separate Metaspace and GC Heap in the periodic printing before/after GC invocation(s). The periodic printing is also recorded in a ring buffer, which is printed in vmError.cpp. Testing: * GHA, Oracle's tier 1-4 * Manual inspection of printed content ------------- Commit messages: - 8356848: Separate Metaspace and GC printing Changes: https://git.openjdk.org/jdk/pull/25214/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356848 Stats: 138 lines in 14 files changed: 52 ins; 32 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From cnorrbin at openjdk.org Tue May 13 14:20:48 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 13 May 2025 14:20:48 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v3] In-Reply-To: References: Message-ID: > Hi everyone, > > This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. > > For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsolete, as it no longer serves any purpose. > > The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead.
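
The offset calculation described in the last paragraph might look roughly like this. It is a minimal sketch, not the actual `jstat` change: `seconds_since_vm_start` is a hypothetical name, and it assumes that both the current time and the `sun.rt.createVmBeginTime` counter are available as milliseconds since the Unix epoch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed time since VM start, in seconds.
// Assumption: both arguments are wall-clock milliseconds since the Unix epoch,
// with create_vm_begin_ms read once from the sun.rt.createVmBeginTime counter.
inline double seconds_since_vm_start(int64_t now_ms, int64_t create_vm_begin_ms) {
  return static_cast<double>(now_ms - create_vm_begin_ms) / 1000.0;
}
```

Because the start time is read from an existing counter and the current time is taken at the point of printing, no periodic sampling task is needed to produce the value.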
Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - removed last traces of hrt.ticks - Merge branch 'master' into statsampler-removal - feedback fixes - removed the PerfDataSamplingInterval flag - calculate timestamp in jstat instead of sampling - StatSampler + sampling code removed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24872/files - new: https://git.openjdk.org/jdk/pull/24872/files/ed3670eb..fb53bd44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=01-02 Stats: 73116 lines in 2258 files changed: 50367 ins; 13675 del; 9074 mod Patch: https://git.openjdk.org/jdk/pull/24872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24872/head:pull/24872 PR: https://git.openjdk.org/jdk/pull/24872 From rcastanedalo at openjdk.org Tue May 13 16:03:43 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:03:43 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
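
To illustrate the aarch64 restriction, here is a simplified, stand-alone model of when a base-plus-offset load fits in a single instruction and therefore needs no initial `lea`. The function names are hypothetical, and the encoding rules (a scaled unsigned 12-bit immediate, or an unscaled signed 9-bit immediate) are stated from the AArch64 load encodings as an assumption; the authoritative check in HotSpot is `Address::offset_ok_for_immed`, which may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): an offset is encodable in a single load if it is
// either a non-negative multiple of the access size that fits a scaled unsigned
// 12-bit immediate, or any value that fits an unscaled signed 9-bit immediate.
inline bool offset_fits_single_load(int64_t offset, unsigned log2_size) {
  const int64_t mask = (int64_t{1} << log2_size) - 1;
  if (offset < 0 || (offset & mask) != 0) {
    return offset >= -256 && offset <= 255;  // unscaled signed 9-bit form
  }
  return (offset >> log2_size) < 4096;       // scaled unsigned 12-bit form
}

// Hypothetical predicate: a load is a null-check candidate only if its first
// emitted instruction is the memory access itself, i.e. no `lea` comes first.
inline bool is_null_check_candidate(int64_t offset, unsigned log2_size) {
  return offset_fits_single_load(offset, log2_size);
}
```

For 8-byte loads, typical small field offsets such as 16 qualify under this model, while an offset of 32768 would need address legitimization first and so would not be marked as a candidate.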
Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: - Generalize tests by removing requires annotation and adding local applyIf rules - Assert that we do not move control nodes - Extend comment about hoisting DecodeN inputs - Apply Emanuels suggestions to ensure_node_is_at_block_or_above - Rename auxiliary functions - Rename auxiliary functions - Clarify scope of move_into - Extend comment about MachTemp nodes - Extract and reuse legitimize_address test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/dc5aa4fc..6353f42b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00-01 Stats: 66 lines in 5 files changed: 21 ins; 19 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Tue May 13 16:06:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 16:06:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 09:51:44 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> -
Extract and reuse legitimize_address test
>
> src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130:
>
>> 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)),
>> 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first");
>> 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2);
>
> For context:
>
>     /* Sometimes we get misaligned loads and stores, usually from Unsafe
>        accesses, and these can exceed the offset range. */
>     Address legitimize_address(const Address &a, int size, Register scratch) {
>       if (a.getMode() == Address::base_plus_offset) {
>         if (! Address::offset_ok_for_immed(a.offset(), exact_log2(size))) {
>           block_comment("legitimize_address {");
>           lea(scratch, a);
>           block_comment("} legitimize_address");
>           return Address(scratch);
>         }
>       }
>       return a;
>     }
>
> I wonder if it might be worth creating a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you.

Thanks, done (commit 5c7da867).

> What about the `MachTemp`?

I did not include moving incoming MachTemp nodes so that I could reuse the function across `PhaseCFG::implicit_null_check` without risking behavioral changes. I extended the comment of `move_into` to clarify its scope (commit d6a749e4).

> Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name.

I changed the name according to your suggestion below, except using "above" instead of "before", which I find more natural when referring to the dominator tree (commits dbe46110 and bcf08f90).
> src/hotspot/share/opto/lcm.cpp line 356: > >> 354: if (mach->in(j)->is_MachTemp()) { >> 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); >> 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. > > Suggestion: > > // Ignore MachTemp inputs, they can be safely hoisted with the candidate. > // MachTemp have no inputs themselves and are only there to reserve a scratch > // register for the GC barrier of the memory operation. > > That was what you told me in our offline meeting, I thought it was helpful context information. Thanks, added a slightly generalized version (commit 446649a6). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087177975 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087179887 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087178324 From jsikstro at openjdk.org Tue May 13 16:14:06 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 13 May 2025 16:14:06 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v2] In-Reply-To: References: Message-ID: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. > > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. 
> > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And the largest change in terms of LOC, separate Metaspace and GC Heap in the periodic printing before/after GC invocation(s). The periodic printing is also recorded in a ring buffer, which is printed in vmError.cpp. > > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: HeapInfoTest should check that GC.heap_info actually runs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/4beb8c30..3a6c6a83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From rcastanedalo at openjdk.org Tue May 13 16:17:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 16:17:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:48:26 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional
commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/lcm.cpp line 79: > >> 77: } >> 78: >> 79: void PhaseCFG::move_into(Node* n, Block* b) { > > Suggestion: > > void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { Done. > Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Right. > I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... > > Do you have a better idea? Right, the comment comes from the original context from which the code is moved, and I guess it should be generalized to make more sense in its new context. I went with your suggestion (commit 793bbe7f). The intention of `ensure_node_is_at_block_or_above` becomes hopefully clear by looking at its callees. > That was what you told me in our offline meeting, I thought it was helpful context information. Thanks, added a slightly generalized version (commit 446649a6). > src/hotspot/share/opto/lcm.cpp line 437: > >> 435: if (n == nullptr || !n->is_MachTemp()) { >> 436: continue; >> 437: } > > Do you want to check that all other nodes already dominate `block`? 
This is guaranteed by the input domination test in https://github.com/openjdk/jdk/pull/25066/files#diff-6343a8024ec7abfc1bd5e377ba254ed868d97a99258b5af0aab12ecf8f961503R345-R369, so it feels a bit redundant. Let me know if you still think it would be useful to add the assertion. > src/hotspot/share/opto/lcm.cpp line 439: > >> 437: } >> 438: maybe_hoist_into(n, block); >> 439: } > > It seems to me this is definitely new code, ensuring that we move the `MachTemp`. We did not do that before, at least not here. Correct? That's right. > src/hotspot/share/opto/lcm.cpp line 441: > >> 439: map_node_to_block(n, block); >> 440: } >> 441: } > > This now happens in `move_into`, right? Yes. > src/hotspot/share/opto/machnode.hpp line 391: > >> 389: >> 390: // Whether this node is expanded during code emission into a sequence of >> 391: // instructions and the first instruction can perform an implicit null check. > > You may want to put a warning / reasoning here, in case there are multiple loads. > You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think? In light of our discussion above I am not sure this warning is needed, the key invariant IMO is that the very first instruction emitted should be able to implement the implicit null check. > test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > >> 49: * @requires vm.gc.Z >> 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z >> 51: */ > > Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. > Of course, you would have to probably add `applyIf` to the `@IR` rules. Sure, done (commit 6353f42b). 
> test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > >> 138: // G1 and ZGC stores cannot be currently used to implement implicit null >> 139: // checks, because they expand into multiple memory access instructions that >> 140: // are not necessarily located at the initial instruction start address. > > Very random idea, no idea if it is any good: > Why not do the implicit null-check with a fake Load? > No idea on the implications here. I suppose it would be extra code, but at least not branching code? Thanks, but given that doing something theoretically more efficient (addressing the limitation and using the stores for implicit null checking as described in https://github.com/openjdk/jdk/pull/25066#issuecomment-2872870543) did not show any benefit in practice I would not expect any benefit either from implementing the null checks with a synthetic load. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087184637 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087184187 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087195811 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187130 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087191174 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187408 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187699 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087188494 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087190800 From rcastanedalo at openjdk.org Tue May 13 16:17:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:17:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: 
<7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 15:22:33 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/lcm.cpp line 95: > >> 93: } >> 94: >> 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { > > Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. Thanks, done (commit b198fca8). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087192210 From rcastanedalo at openjdk.org Tue May 13 16:17:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:17:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 11:10:05 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/lcm.cpp line 428: >> >>> 426: maybe_hoist_into(val->in(i), block); >>> 427: } >>> 428: move_into(val, block); >> >> Suggestion: >> >> // Inputs of val may already be early enough, but if not move them together with val. >> ensure_node_is_at_block_or_before(val->in(i), block); >> } >> move_node_and_its_projections_to_block(val, block); > > It's a little hard to see here: did you just refactor this code, or make any changes?
I just refactored the code: I extracted and generalized the logic into the `ensure_node_is_at_block_or_above` and `move_node_and_its_projections_to_block` primitives so that it can be reused by the new logic (dealing with `MachTemp` inputs) and also by other existing logic (hoisting the memory candidate and its flag-killing projections). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087186663 From rcastanedalo at openjdk.org Tue May 13 16:23:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:23:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <8NRnIxRrMoiLw2RGUzMuiFjiC35mPs53Kp1IKOWLRuI=.44049e5d-48b2-4aac-abe1-27e7b76d8cc5@github.com> On Thu, 8 May 2025 11:04:26 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119: > >> 117: testLoad(o); >> 118: } catch (NullPointerException e) { nullPointerException = true; } >> 119: Asserts.assertTrue(nullPointerException); > > Suggestion: > > try { > testLoad(o); > throw new RuntimeException("Should have thrown NullPointerException"); > } catch (NullPointerException e) { /* expected */} > > Could be a shorter alternative. Up to you.
Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of? > But totally optional, as your approach works anyway :) I prefer the current version with `Asserts.assertTrue(nullPointerException)`, because it makes the test expectations more explicit (no need for an `/* expected */` comment or similar). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087205670 From qamai at openjdk.org Tue May 13 16:27:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 May 2025 16:27:00 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 16:03:43 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
> > Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: > > - Generalize tests by removing requires annotation and adding local applyIf rules > - Assert that we do not move control nodes > - Extend comment about hoisting DecodeN inputs > - Apply Emanuels suggestions to ensure_node_is_at_block_or_above > - Rename auxiliary functions > - Rename auxiliary functions > - Clarify scope of move_into > - Extend comment about MachTemp nodes > - Extract and reuse legitimize_address test src/hotspot/share/opto/output.cpp line 2020: > 2018: assert(access->barrier_data() == 0 || > 2019: access->is_late_expanded_null_check_candidate(), > 2020: "Implicit null checks on memory accesses with barriers are only supported on nodes explicitly marked as null-check candidates"); I assume this is why you want the SIGSEGV instruction to be the first one. Do you think it would be better to mark the whole region, so that any SIGSEGV from any instruction inside the region is mapped to this handler? Another way is to make the `MachNode` set the SIGSEGV point itself.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087211380 From rcastanedalo at openjdk.org Tue May 13 17:24:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:24:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 16:24:25 GMT, Quan Anh Mai wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/output.cpp line 2020: > >> 2018: assert(access->barrier_data() == 0 || >> 2019: access->is_late_expanded_null_check_candidate(), >> 2020: "Implicit null checks on memory accesses with barriers are only supported on nodes explicitly marked as null-check candidates"); > > I assume this is why you want the SIGSEGV instruction to be the first one. Do you think it would be better to mark the whole region, so that any SIGSEGV from any instruction inside the region is mapped to this handler? Another way is to make the `MachNode` set the SIGSEGV point itself. Thanks, both could be done, but they would require non-trivial changes to the exception table building logic for no apparent benefit.
I actually prototyped your second suggestion [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks) some time ago so that I could also use ZGC and G1 writes as implicit null checks, but the experiments did not show any performance benefit that could justify the additional complexity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087316762 From rcastanedalo at openjdk.org Tue May 13 17:40:40 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:40:40 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
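The aarch64 restriction described above (a load qualifies only when its address needs no preceding `lea`) can be illustrated with a minimal sketch. This is not HotSpot code: `offset_fits_load_immediate` and `is_null_check_candidate_sketch` are hypothetical names, and the check only models the two standard A64 immediate forms for loads (signed 9-bit unscaled offset, unsigned 12-bit offset scaled by the access size). The assumption is that an offset that fits either form lets the load itself be the first emitted instruction.

```cpp
#include <cstdint>

// Hypothetical helper, not part of HotSpot: can `offset` be encoded directly
// in an AArch64 load of (1 << size_log2) bytes, without a separate `lea`?
bool offset_fits_load_immediate(int64_t offset, unsigned size_log2) {
  // Unscaled form: signed 9-bit byte offset.
  if (offset >= -256 && offset <= 255) {
    return true;
  }
  // Scaled form: unsigned 12-bit offset counted in units of the access size.
  const int64_t size = int64_t{1} << size_log2;
  return offset >= 0 && offset % size == 0 && offset / size < 4096;
}

// Hypothetical predicate for an 8-byte object load: only a load whose first
// emitted instruction is the memory access is treated as a candidate.
bool is_null_check_candidate_sketch(int64_t field_offset) {
  return offset_fits_load_immediate(field_offset, 3);
}
```

Under these assumptions, typical small field offsets pass the check, which matches the observation that most aarch64 loads seen in practice can be marked as candidates.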
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Extend comments in zLoadP implementations to explain role of reload ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/6353f42b..20d960e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=01-02 Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Tue May 13 17:44:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:44:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 08:53:08 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz Thanks for the explanations! >> I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. >> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load.
> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? > > Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). > > Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). > @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. Thanks, I added comments to the zLoadP implementations (commit 20d960e6). > And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) Absolutely, after getting approval from the compiler side, I will request a formal review from the GC side. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2877434993 From rcastanedalo at openjdk.org Tue May 13 17:44:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:44:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 09:56:47 GMT, Emanuel Peter wrote: >>> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). >> >> Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). > > @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. 
And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) @eme64 @vnkozlov Thank you for your thorough comments and suggestions, I believe I have addressed all of them in the latest version. Please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2877438612 From kdnilsen at openjdk.org Tue May 13 17:55:07 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 May 2025 17:55:07 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v3] In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). 
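The two changes listed above can be sketched in isolation as follows. This is a minimal illustration, not the Shenandoah implementation: `FreeSetSketch`, `begin_rebuild`, `finish_rebuild`, and `kUnknownAvailable` are hypothetical names, and a plain `std::mutex` stands in for the heap lock. The point is only that both counters are read under one lock, and that a caller polling during a rebuild sees `SIZE_MAX` rather than a stale or inconsistent difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a free set. The names below are illustrative
// and are not taken from ShenandoahFreeSet.
class FreeSetSketch {
 public:
  // Sentinel meaning "no reliable value right now". Being the largest
  // size_t, it never looks like a shortage of free memory to a caller.
  static constexpr size_t kUnknownAvailable = SIZE_MAX;

  // Mark the accounting as unreliable while the free set is rebuilt.
  void begin_rebuild() {
    std::lock_guard<std::mutex> guard(_lock);
    _rebuilding = true;
  }

  // Publish a consistent (capacity, used) pair and end the rebuild.
  void finish_rebuild(size_t capacity, size_t used) {
    assert(used <= capacity);
    std::lock_guard<std::mutex> guard(_lock);
    _capacity = capacity;
    _used = used;
    _rebuilding = false;
  }

  // Read both counters under the same lock so the difference comes from a
  // matching pair, or report the sentinel while a rebuild is in progress.
  size_t available() const {
    std::lock_guard<std::mutex> guard(_lock);
    if (_rebuilding) {
      return kUnknownAvailable;
    }
    return _capacity - _used;
  }

 private:
  mutable std::mutex _lock;  // plays the role of the heap lock
  size_t _capacity = 0;
  size_t _used = 0;
  bool _rebuilding = false;
};
```

A polling thread that compares `available()` against a low-memory threshold would simply not trigger on the sentinel and would see a real value on its next poll, which is the behavior the description calls harmless.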
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25165/files - new: https://git.openjdk.org/jdk/pull/25165/files/ffe1113e..c4bb674d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From kdnilsen at openjdk.org Tue May 13 17:55:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 May 2025 17:55:10 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> Message-ID: On Mon, 12 May 2025 23:28:44 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer comments > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 234: > >> 232: } >> 233: >> 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. > > Sorry - can we change this comment too? This method does _not_ acquire the lock in release builds. Comment could mention that it acquires the lock only for the correctness of the assertion? Very good point. I've made this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2087358566 From kvn at openjdk.org Tue May 13 17:58:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 17:58:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 17:40:40 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extend comments in zLoadP implementations to explain role of reload Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2837672919 From kdnilsen at openjdk.org Tue May 13 18:04:13 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 May 2025 18:04:13 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v4] In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed).
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25165/files - new: https://git.openjdk.org/jdk/pull/25165/files/c4bb674d..13b53514 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From kdnilsen at openjdk.org Tue May 13 18:11:24 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 May 2025 18:11:24 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v5] In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Another white space problem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25165/files - new: https://git.openjdk.org/jdk/pull/25165/files/13b53514..10373305 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From wkemper at openjdk.org Tue May 13 18:11:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 May 2025 18:11:25 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v5] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: <9JQe7kdBr__vV--6YCxZ2VfRPQU5ruXVuae6J-TUIuo=.a65b9aee-f43c-4343-bc1f-4d065718ef28@github.com> On Tue, 13 May 2025 18:08:21 GMT, Kelvin Nilsen wrote: >> Two changes: >> >> 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) >> 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Another white space problem Thanks for the changes. Looks good to me! 
------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25165#pullrequestreview-2837697704 From kdnilsen at openjdk.org Tue May 13 18:11:25 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 13 May 2025 18:11:25 GMT Subject: Integrated: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Fri, 9 May 2025 23:45:50 GMT, Kelvin Nilsen wrote: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). This pull request has now been integrated. 
Changeset: e7ce661a Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/e7ce661adb01fba4bb690d51cc2858c822008654 Stats: 97 lines in 10 files changed: 74 ins; 9 del; 14 mod 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/25165 From dholmes at openjdk.org Wed May 14 00:54:59 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 14 May 2025 00:54:59 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v2] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 16:14:06 GMT, Joel Sikström wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed.
>> * And the largest change in terms of LOC, separate Metaspace and GC Heap in the periodic printing before/after GC invocation(s). The periodic printing is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > HeapInfoTest should check that GC.heap_info actually runs @jsikstro I think this is relevant to the serviceability folk. Also you are making a number of changes to the way tool commands behave so this definitely needs a CSR request, and also a Release Note. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2878290615 From aboldtch at openjdk.org Wed May 14 06:34:31 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 May 2025 06:34:31 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v2] In-Reply-To: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handles memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit.
> > Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. > > The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: > * Missing wait, the uncommitting is not actually spread out, but happens all at once. > * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. > * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. > * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. > > An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away.
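As a rough illustration of the watermark idea and the "missing wait" and "reactivity" points above, here is a minimal model. It assumes the watermark is the smallest size the unused cache reached during the current cycle; the class `UncommitWatermarkSketch` and all of its member names are hypothetical and do not reflect the actual ZUncommitter, ZPartition or ZMappedCache code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of a cache of unused mapped memory with a low watermark.
class UncommitWatermarkSketch {
 public:
  explicit UncommitWatermarkSketch(std::size_t cached_bytes)
      : _cached(cached_bytes), _min_in_cycle(cached_bytes) {}

  // The application takes memory out of the cache; this can lower the watermark.
  void remove_from_cache(std::size_t bytes) {
    _cached -= std::min(bytes, _cached);
    _min_in_cycle = std::min(_min_in_cycle, _cached);
  }

  // Memory is returned to the cache; the watermark is unaffected.
  void add_to_cache(std::size_t bytes) { _cached += bytes; }

  // Bytes that stayed unused for the whole cycle so far.
  std::size_t uncommittable() const { return _min_in_cycle; }

  // Uncommit at most `chunk` bytes and report how much was uncommitted.
  // The caller is expected to wait between calls so the work is spread out,
  // and to stop once this returns 0.
  std::size_t uncommit_chunk(std::size_t chunk) {
    assert(_min_in_cycle <= _cached);
    const std::size_t n = std::min(chunk, _min_in_cycle);
    _cached -= n;
    _min_in_cycle -= n;
    return n;
  }

  // Begin a new measurement interval (for example after the uncommit delay).
  void start_new_cycle() { _min_in_cycle = _cached; }

  std::size_t cached() const { return _cached; }

 private:
  std::size_t _cached;        // bytes currently in the unused cache
  std::size_t _min_in_cycle;  // lowest cache size seen in this cycle
};
```

In this model, if the application dips into the cache between chunks, the watermark drops and later `uncommit_chunk` calls return less or nothing, which is one way to read the "reactivity" requirement.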
Axel Boldt-Christmas has updated the pull request incrementally with seven additional commits since the last revision: - Wrong too - Less archaic spelling of complete - Cleanup and simplify - Move all uncommit logic to zUncommitter - Log time spent uncommitting - Split reset_uncommit_cycle and add headroom - Rename _min_last_uncommit_cycle to _min_size_watermark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25198/files - new: https://git.openjdk.org/jdk/pull/25198/files/82a87d8d..96ce895f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=00-01 Stats: 265 lines in 7 files changed: 117 ins; 110 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From jsikstro at openjdk.org Wed May 14 06:47:52 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 14 May 2025 06:47:52 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v2] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 00:51:56 GMT, David Holmes wrote: >> Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: >> >> HeapInfoTest should check that GC.heap_info actually runs > > @jsikstro I think this is relevant to the serviceability folk. Also you are making a number of changes to the way tool commands behave so this definitely needs a CSR request, and also a Release Note. @dholmes-ora Thank you for the input! I will take a look at creating a CSR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2878876071 From tschatzl at openjdk.org Wed May 14 08:21:13 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 May 2025 08:21:13 GMT Subject: RFR: 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" Message-ID: Hi all, please review this change that moves "clear_collection_set" work correctly under the "Serial Free Collection Set" phase. There is probably no particular impact on timings, but just for correctness. Testing: gha, gc/g1 tests Thanks, Thomas ------------- Commit messages: - 8355743 Changes: https://git.openjdk.org/jdk/pull/25222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355743 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25222/head:pull/25222 PR: https://git.openjdk.org/jdk/pull/25222 From aboldtch at openjdk.org Wed May 14 08:33:35 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 14 May 2025 08:33:35 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v3] In-Reply-To: References: Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Use atomic load of tree size in print_on - cache_replace comments - Sort order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25112/files - new: https://git.openjdk.org/jdk/pull/25112/files/3c3e22bf..0f406adf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=01-02 Stats: 14 lines in 2 files changed: 8 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From iwalulya at openjdk.org Wed May 14 08:37:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 14 May 2025 08:37:54 GMT Subject: RFR: 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" In-Reply-To: References: Message-ID: On Wed, 14 May 2025 08:15:20 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that moves "clear_collection_set" work correctly under the "Serial Free Collection Set" phase. > > There is probably no particular impact on timings, but just for correctness. > > Testing: gha, gc/g1 tests > > Thanks, > Thomas Trivial! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25222#pullrequestreview-2839285466 From shade at openjdk.org Wed May 14 09:42:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 May 2025 09:42:53 GMT Subject: RFR: 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" In-Reply-To: References: Message-ID: On Wed, 14 May 2025 08:15:20 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that moves "clear_collection_set" work correctly under the "Serial Free Collection Set" phase. 
> > There is probably no particular impact on timings, but just for correctness. > > Testing: gha, gc/g1 tests > > Thanks, > Thomas Makes sense. It does not look that the amount of work done in `clear_collection_set` is trivial, so I would mark this for additional backports to improve GC logging accuracy everywhere. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25222#pullrequestreview-2839489885 From ayang at openjdk.org Wed May 14 15:12:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 May 2025 15:12:52 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection I find it a bit odd that `G1CollectedHeap::do_full_collection` (the full-gc entry API) takes this additional argument for heap-resizing only, as I tend to think the heap-resizing part (and possibly some related logic) should *not* be considered part of full-GC. This seems to be a preexisting design: for example, at the end of `G1FullCollector::complete_collection`, the list of `heap->...` invocations suggests that this logic actually belongs to the heap. 
In other words, the heap should be notified upon full-GC completion and perform the necessary work accordingly, not have full-GC operate on the heap as it sees fit. Also, as written in the ticket: > It is not a major issue, as the shrinking does not actually uncommit the regions, just marks these regions for uncommitting concurrently, so the expansion is just undoing this marking. Since no OS-level commit/uncommit (the actual expensive operation) is happening, I'm not sure that fixing this issue is a net win if it comes at the cost of changing the full-GC entry API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2880609666 From iwalulya at openjdk.org Wed May 14 16:16:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 14 May 2025 16:16:54 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection I assumed the full-gc entry API is `do_full_collection(bool clear_all_soft_refs)`. Yes, we can probably structure the implementation differently, but I disagree that changing `bool do_full_collection(bool clear_all_soft_refs, bool do_maximal_compaction)` is such a big price to pay.
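For readers following the sizing side of this thread, the adjustment described in the PR summary (counting the pending allocation before deciding how much to shrink) can be sketched as below. This is a simplified model under stated assumptions: it ignores free-ratio tuning, assumes a power-of-two region size, and the helper names `align_up_pow2` and `shrink_bytes_after_full_gc` are hypothetical rather than G1HeapSizingPolicy's real functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Round value up to a multiple of alignment; alignment must be a power of two.
inline std::size_t align_up_pow2(std::size_t value, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical shrink computation after a full GC.
// pending_alloc is the size of the allocation request that triggered the GC;
// including it keeps the heap from shrinking below what is about to be needed.
inline std::size_t shrink_bytes_after_full_gc(std::size_t capacity,
                                              std::size_t used,
                                              std::size_t pending_alloc,
                                              std::size_t min_capacity,
                                              std::size_t region_size) {
  const std::size_t needed = align_up_pow2(used + pending_alloc, region_size);
  const std::size_t target = std::max(needed, min_capacity);
  return capacity > target ? capacity - target : 0;
}
```

With a region size of 4 units, a capacity of 24 and 10 units used, this model shrinks by 12 when nothing is pending but only by 8 when a 3-unit allocation is waiting, so the follow-up expansion mentioned in the description is avoided.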
------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2880811046 From tschatzl at openjdk.org Thu May 15 08:07:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 15 May 2025 08:07:00 GMT Subject: RFR: 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" In-Reply-To: References: Message-ID: <3lUi716T7xfJTSGHlv8M_5j3iAHSwN15Dw455wPcfr4=.b1719285-93d9-4e6c-af3b-043563383779@github.com> On Wed, 14 May 2025 09:39:59 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> please review this change that moves "clear_collection_set" work correctly under the "Serial Free Collection Set" phase. >> >> There is probably no particular impact on timings, but just for correctness. >> >> Testing: gha, gc/g1 tests >> >> Thanks, >> Thomas > > Makes sense. It does not look that the amount of work done in `clear_collection_set` is trivial, so I would mark this for additional backports to improve GC logging accuracy everywhere. Thanks @shipilev @walulyai for your reviews. > Makes sense. It does not look that the amount of work done in clear_collection_set is trivial, so I would mark this for additional backports to improve GC logging accuracy everywhere. My measurements showed that the actual time taken is very small, although there is some per remembered set group activity (and earlier per-region activity). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25222#issuecomment-2882936403 From tschatzl at openjdk.org Thu May 15 08:07:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 15 May 2025 08:07:01 GMT Subject: Integrated: 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" In-Reply-To: References: Message-ID: <8kqx-Y3eqfepha6jqK-f5aMnYNf1nF8U53g4SujmqXE=.78432687-8c14-4098-93ce-2367b1389294@github.com> On Wed, 14 May 2025 08:15:20 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that moves "clear_collection_set" work correctly under the "Serial Free Collection Set" phase. > > There is probably no particular impact on timings, but just for correctness. > > Testing: gha, gc/g1 tests > > Thanks, > Thomas This pull request has now been integrated. Changeset: b8d2bdb4 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/b8d2bdb46529f780b4c21d709ca38b489348ee10 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod 8355743: G1: Collection set clearing is not recorded as part of "Free Collection Set Time" Reviewed-by: iwalulya, shade ------------- PR: https://git.openjdk.org/jdk/pull/25222 From tschatzl at openjdk.org Thu May 15 08:18:47 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 15 May 2025 08:18:47 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v38] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. 
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>     // Filtering
>     if (region(@x.a) == region(y)) goto done; // same region check
>     if (y == null) goto done; // null value check
>     if (card(@x.a) == young_card) goto done; // write to young gen check
>     StoreLoad; // synchronize
>     if (card(@x.a) == dirty_card) goto done;
>
>     *card(@x.a) = dirty
>
>     // Card tracking
>     enqueue(card-address(@x.a)) into thread-local-dcq;
>     if (thread-local-dcq is not full) goto done;
>
>     call runtime to move thread-local-dcq into dcqs
>
>     done:
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables.
Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: remove sweep_epoch - Merge branch 'master' into card-table-as-dcq-merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review (part 2 - yield duration changes) - * ayang review (part 1) - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - ... and 44 more: https://git.openjdk.org/jdk/compare/5e50a584...1def83af ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=37 Stats: 7088 lines in 111 files changed: 2568 ins; 3599 del; 921 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From jsikstro at openjdk.org Thu May 15 08:24:06 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 15 May 2025 08:24:06 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v3] In-Reply-To: References: Message-ID: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. 
> > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And the largest change in terms of LOC, separate Metaspace and GC Heap in the periodic printing before/after GC invocation(s). The periodic printing is also recorded in a ring buffer, which is printed in vmError.cpp. 
> > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Flip order of periodic events in hs_err files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/3a6c6a83..711ce2a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=01-02 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From jsikstro at openjdk.org Thu May 15 08:26:51 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 15 May 2025 08:26:51 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v2] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 16:14:06 GMT, Joel Sikström wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed.
>> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And the largest change in terms of LOC, separate Metaspace and GC Heap in the periodic printing before/after GC invocation(s). The periodic printing is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > HeapInfoTest should check that GC.heap_info actually runs With feedback from @stefank, I've flipped the order of Metaspace Usage and GC Heap Usage in hs_err files so that GC Heap Usage comes first. Example output: GC Heap Usage History (70 events): Event: 0,896 {heap Before GC invocations=0 (full 0): ZHeap used 860M, capacity 860M, max capacity 9216M Cache 0M (0) } ... Metaspace Usage History (70 events): Event: 0,896 {metaspace Before GC invocations=0 (full 0): Metaspace used 18663K, committed 19008K, reserved 1114112K class space used 1601K, committed 1728K, reserved 1048576K } ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2882994637 From lkorinth at openjdk.org Thu May 15 09:18:38 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 15 May 2025 09:18:38 GMT Subject: RFR: 8356847: Problem list two test cases for JDK-8284234 Message-ID: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> These tests fail intermittently. Problem list until JDK-8284234 is solved.
I have verified (rebased on https://bugs.openjdk.org/browse/JDK-8356866) with: test=vmTestbase/gc/gctests/FinalizeTest04/FinalizeTest04.java make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" test=vmTestbase/gc/gctests/PhantomReference/phantom001/phantom001.java make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" ------------- Commit messages: - 8356847: Problem list two test cases for JDK-8284234 Changes: https://git.openjdk.org/jdk/pull/25209/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25209&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356847 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25209/head:pull/25209 PR: https://git.openjdk.org/jdk/pull/25209 From tschatzl at openjdk.org Thu May 15 09:46:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 15 May 2025 09:46:51 GMT Subject: RFR: 8356847: Problem list two test cases for JDK-8284234 In-Reply-To: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> References: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> Message-ID: On Tue, 13 May 2025 12:49:06 GMT, Leo Korinth wrote: > These tests fail intermittently. Problem list until JDK-8284234 is solved. > > I have verified (rebased on https://bugs.openjdk.org/browse/JDK-8356866) with: > > test=vmTestbase/gc/gctests/FinalizeTest04/FinalizeTest04.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" > test=vmTestbase/gc/gctests/PhantomReference/phantom001/phantom001.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25209#pullrequestreview-2843013002 From shade at openjdk.org Thu May 15 10:48:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 May 2025 10:48:09 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v17] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - Tracking UMH state more accurately - Rework for safer concurrency - ... and 20 more: https://git.openjdk.org/jdk/compare/5c73dfc2...4d33a4d5 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=16 Stats: 422 lines in 12 files changed: 379 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From jsikstro at openjdk.org Thu May 15 11:47:07 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 15 May 2025 11:47:07 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: Message-ID: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. 
> > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to the "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp.
> > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Update new order in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/711ce2a2..689a2230 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=02-03 Stats: 6 lines in 2 files changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From epeter at openjdk.org Thu May 15 13:28:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:28:58 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 17:40:40 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extend comments in zLoadP implementations to explain role of reload Two small nits/questions, but otherwise ready from my side :) src/hotspot/share/opto/lcm.cpp line 80: > 78: > 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { > 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); I usually check `n->is_CFG()`. What is the bottom type of an `IfNode`? 
`virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` Are you aware of that? src/hotspot/share/opto/lcm.cpp line 97: > 95: > 96: void PhaseCFG::ensure_node_is_at_block_or_above(Node* n, Block* b) { > 97: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); Same question here test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 40: > 38: different GC memory accesses. > 39: * @library /test/lib / > 40: * @run driver compiler.gcbarriers.TestImplicitNullChecks I suppose you could still have a special run with `ZGC` and one with `G1GC`. But not sure if that is worth it, or if we do that in higher tiers anyway? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2843684261 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091166322 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091166890 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091170584 From jsikstro at openjdk.org Thu May 15 14:07:08 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 15 May 2025 14:07:08 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge Message-ID: Hello, This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. 
The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: constexpr ZPageAge ZPageAgeLastPlusOne = static_cast(ZPageAgeCount); I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. Testing: * Currently running through Oracle's tier 1-4 * GHA ------------- Commit messages: - 8357053: ZGC: Improved utility for ZPageAge Changes: https://git.openjdk.org/jdk/pull/25251/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357053 Stats: 131 lines in 13 files changed: 81 ins; 7 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/25251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25251/head:pull/25251 PR: https://git.openjdk.org/jdk/pull/25251 From rcastanedalo at openjdk.org Thu May 15 14:26:40 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 May 2025 14:26:40 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v4] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check`
from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove comment that is only applicable to x64, not aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/20d960e6..a52b0730 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From lkorinth at openjdk.org Thu May 15 16:06:56 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 15 May 2025 16:06:56 GMT Subject: RFR: 8356847: Problem list two test cases for JDK-8284234 In-Reply-To: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> References: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> Message-ID: <6oLRdtT4evPDg0MtmzgmAKe6ITryT6q47Jmj4UDgS4c=.8cdce9f0-464a-4729-87cd-56914b021ebd@github.com> On Tue, 13 May 2025 12:49:06 GMT, Leo Korinth wrote: > These tests fail intermittently. Problem list until JDK-8284234 is solved. > > I have verified (rebased on https://bugs.openjdk.org/browse/JDK-8356866) with: > > test=vmTestbase/gc/gctests/FinalizeTest04/FinalizeTest04.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" > test=vmTestbase/gc/gctests/PhantomReference/phantom001/phantom001.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" Thanks Thomas!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25209#issuecomment-2884350410 From lkorinth at openjdk.org Thu May 15 16:06:57 2025 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 15 May 2025 16:06:57 GMT Subject: Integrated: 8356847: Problem list two test cases for JDK-8284234 In-Reply-To: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> References: <_DyYoWRJGXwUQdxw_qnJgSkUhgz7vDaW9R6lql212fs=.8412fcfd-9edb-4518-ac07-f66592a1f6d5@github.com> Message-ID: On Tue, 13 May 2025 12:49:06 GMT, Leo Korinth wrote: > These tests fail intermittently. Problem list until JDK-8284234 is solved. > > I have verified (rebased on https://bugs.openjdk.org/browse/JDK-8356866) with: > > test=vmTestbase/gc/gctests/FinalizeTest04/FinalizeTest04.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" > test=vmTestbase/gc/gctests/PhantomReference/phantom001/phantom001.java > make run-test TEST="${test}" JTREG="RETAIN=all;VERBOSE=all;OPTIONS=--verify-exclude" This pull request has now been integrated. Changeset: b3e856f9 Author: Leo Korinth URL: https://git.openjdk.org/jdk/commit/b3e856f9b37078969478809207b63fb6bc9c5f13 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8356847: Problem list two test cases for JDK-8284234 Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/25209 From duke at openjdk.org Thu May 15 21:32:50 2025 From: duke at openjdk.org (Hendrik Schick) Date: Thu, 15 May 2025 21:32:50 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:59:29 GMT, Joel Sikström wrote: > Hello, > > This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. > > Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done using static_cast.
This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. > > We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. > > Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: > > constexpr ZPageAge ZPageAgeLastPlusOne = static_cast(ZPageAgeCount); > > > I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. > > Testing: > * Currently running through Oracle's tier 1-4 > * GHA copyright-year update is missing on at least 2 files ------------- PR Comment: https://git.openjdk.org/jdk/pull/25251#issuecomment-2885097918 From rcastanedalo at openjdk.org Fri May 16 04:15:02 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 May 2025 04:15:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:25:21 GMT, Emanuel Peter wrote: > But not sure if that is worth it, or if we do that in higher tiers anyway?
We already run this test with G1 (default) on tier1 and with all non-default GCs (including ZGC) in Oracle's internal tier3. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092286950 From ayang at openjdk.org Fri May 16 05:39:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 16 May 2025 05:39:53 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Since all invocations of full-GC go through the following version, it is the de facto full-GC entry point: bool do_full_collection(bool clear_all_soft_refs, bool do_maximal_compaction, size_t allocation_word_size); The newly-add argument is at a different level of abstraction compared to the existing parameters, so this part feels like a small detour -- mostly caused by a preexisting design issue. If that design issue were resolved, we likely wouldn't need this extra argument at all. With that in mind, what do you think about addressing the underlying design problem first (or doing so alongside an improved `full_collection_resize_amount` policy)? 
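For illustration only, the shrink decision discussed in this thread can be sketched as below. This is a simplified sketch under stated assumptions, not the G1 implementation: `shrink_amount_after_full_gc`, `align_up_to` and every parameter name are hypothetical, and the real policy behind `full_collection_resize_amount` takes further inputs into account. The sketch only shows why the pending allocation matters: if it is ignored, the heap can be shrunk below what the triggering request needs and then has to expand again right away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Round `value` up to a multiple of `alignment` (assumed to be a power of two).
constexpr std::size_t align_up_to(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of bytes the heap may shrink by after a full GC.
// `pending_allocation_bytes` is the request that triggered the collection.
std::size_t shrink_amount_after_full_gc(std::size_t capacity_bytes,
                                        std::size_t used_bytes,
                                        std::size_t pending_allocation_bytes,
                                        std::size_t min_capacity_bytes,
                                        std::size_t region_bytes) {
  assert(region_bytes != 0 && (region_bytes & (region_bytes - 1)) == 0);
  // Space that must remain: live data plus the allocation about to happen,
  // rounded up to whole regions.
  const std::size_t needed =
      align_up_to(used_bytes + pending_allocation_bytes, region_bytes);
  const std::size_t target = std::max(needed, min_capacity_bytes);
  // Never report a negative shrink; expansion is a separate decision.
  return capacity_bytes > target ? capacity_bytes - target : 0;
}
```

With a pending allocation of zero this degenerates to a policy that looks only at live data, which is the situation the PR description calls out as causing a shrink followed by an immediate expansion.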
------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2885685487 From iwalulya at openjdk.org Fri May 16 06:03:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 16 May 2025 06:03:58 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 05:37:09 GMT, Albert Mingkun Yang wrote: > If that design issue were resolved, we likely wouldn't need this extra argument at all. With that in mind, what do you think about addressing the underlying design problem first (or doing so alongside an improved `full_collection_resize_amount` policy)? That sounds good to me. Do you mind creating the CR describing this design issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2885720345 From jsikstro at openjdk.org Fri May 16 07:11:38 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 16 May 2025 07:11:38 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: References: Message-ID: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> > Hello, > > This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. > > Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. > > We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. 
The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. > > Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: > > constexpr ZPageAge ZPageAgeLastPlusOne = static_cast(ZPageAgeCount); > > > I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. > > Testing: > * Oracle's tier 1-4 > * GHA Joel Sikström has updated the pull request incrementally with two additional commits since the last revision: - Copyright years - Simplify untype(ZPageAge age) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25251/files - new: https://git.openjdk.org/jdk/pull/25251/files/baee83e6..3e0af5e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=00-01 Stats: 8 lines in 7 files changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25251/head:pull/25251 PR: https://git.openjdk.org/jdk/pull/25251 From ayang at openjdk.org Fri May 16 07:24:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 16 May 2025 07:24:55 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc.
Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection It's alluded in my previous msg "This seems to be a preexisting design:..."; extracted out to be standalone: https://bugs.openjdk.org/browse/JDK-8357108 WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2885871932 From rcastanedalo at openjdk.org Fri May 16 07:44:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 07:44:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Replace control type with PhaseCFG::is_CFG test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/a52b0730..b92500a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Fri May 16 07:51:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 May 2025 07:51:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:23:21 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend comments in zLoadP implementations to explain role of reload > > src/hotspot/share/opto/lcm.cpp line 80: > >> 78: >> 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { >> 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); > > I usually check `n->is_CFG()`. > > What is the bottom type of an `IfNode`? > `virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` > Are you aware of that? Note that the analysis operates at the Mach level, where `Node::is_CFG()` is not complete anymore and `If` nodes have been replaced by their platform-dependent implementations.
I replaced the `n->bottom_type() != Type::CONTROL` test with `!PhaseCFG::is_CFG(n)`, which is analogous to `Node::is_CFG()` at the Mach level (and covers some additional nodes without control type that should not be moved anyway), see commit b92500a2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092524938 From epeter at openjdk.org Fri May 16 08:06:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 08:06:00 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:44:53 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2845835872 From rcastanedalo at openjdk.org Fri May 16 08:06:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 08:06:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:25:56 GMT, Emanuel Peter wrote: > Two small nits/questions, but otherwise ready from my side :) Thanks again for reviewing @eme64, I have addressed your questions now. And thanks also for your review @vnkozlov. @stefank @fisk @xmas92 @jsikstro may I get a review from the GC side? @RealFYang @TheRealMDoerr note that this PR also introduces implicit null check support for ZGC loads in RISC-V and PPC, but I cannot test it beyond GHA. May I ask you to test the changes on your respective platforms? (or let me know if you prefer to add the support in separate PRs). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2885960803 From epeter at openjdk.org Fri May 16 08:06:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 08:06:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:49:01 GMT, Roberto Casta?eda Lozano wrote: >> src/hotspot/share/opto/lcm.cpp line 80: >> >>> 78: >>> 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { >>> 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); >> >> I usually check `n->is_CFG()`. >> >> What is the bottom type of an `IfNode`? >> `virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` >> Are you aware of that? 
>
> Note that the analysis operates at the Mach level, where `Node::is_CFG()` is not complete anymore and `If` nodes have been replaced by their platform-dependent implementations. I replaced the `n->bottom_type() != Type::CONTROL` test with `!PhaseCFG::is_CFG(n)`, which is analogous to `Node::is_CFG()` at the Mach level (and covers some additional nodes without control type that should not be moved anyway), see commit b92500a2.

Ah, makes sense, did not know that. Thanks for the update!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092546157

From ayang at openjdk.org Fri May 16 08:36:22 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 16 May 2025 08:36:22 GMT
Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2]
In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com>
References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com>
Message-ID: <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com>

> This patch refines Parallel's sizing strategy to improve overall memory management and performance.
>
> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`.
>
> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`.
> > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/36583a8f..dc9eb4f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=00-01 Stats: 2851 lines in 90 files changed: 2471 ins; 182 del; 198 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From ayang at openjdk.org Fri May 16 08:36:25 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 16 May 2025 08:36:25 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Sun, 11 May 2025 14:24:43 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-size-policy >> - pgc-size-policy > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: > >> 341: if (is_gc_overhead_limit_reached()) { >> 342: return nullptr; >> 343: } > > It seems the parameter `gc_overhead_limit_was_exceeded` and the field `MemAllocator::Allocation::_overhead_limit_exceeded` are not used by all GCs now. Should we keep the parameter and set it as `true` under the condition `is_gc_overhead_limit_reached()`? 
For example:
>
>
>     if (op.prologue_succeeded()) {
>       assert(is_in_or_null(op.result()), "result not in heap");
>       if (is_gc_overhead_limit_reached()) {
>         *gc_overhead_limit_was_exceeded = true;
>         return nullptr;
>       }
>       return op.result();
>     }
>
>
> Or we should remove the parameter and the field in another PR.

Since this is not implemented by any other GCs, I think it's best to remove it in a follow-up PR.

> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 825:
>
>> 823:   // If MinHeapFreeRatio is at its default value; shrink cautiously. Otherwise, users expect prompt shrinking.
>> 824:   if (FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0) {
>> 825:     if (desired_capacity < current_capacity) {
>
> I was quite curious about the condition `MinHeapFreeRatio == 0`, and then I found the following code in `parallelArguments.cpp`. Might it be better to use `UseAdaptiveSizePolicy && FLAG_IS_DEFAULT(MinHeapFreeRatio)` here instead of `FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0`?
>
>
>     if (UseAdaptiveSizePolicy) {
>       // We don't want to limit adaptive heap sizing's freedom to adjust the heap
>       // unless the user actually sets these flags.
>       if (FLAG_IS_DEFAULT(MinHeapFreeRatio)) {
>         FLAG_SET_DEFAULT(MinHeapFreeRatio, 0);
>       }
>       if (FLAG_IS_DEFAULT(MaxHeapFreeRatio)) {
>         FLAG_SET_DEFAULT(MaxHeapFreeRatio, 100);
>       }
>     }

This method is invoked only when `UseAdaptiveSizePolicy == true`. Removed `MinHeapFreeRatio == 0`.

> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 862:
>
>> 860:     resize_old_gen_after_full_gc();
>> 861:     young_gen()->resize_after_full_gc();
>> 862:   }
>
> The `PSYoungGen` has its methods `resize_after_full_gc` and `resize_after_young_gc`. I think this design is good. What about moving the method `resize_old_gen_after_full_gc` (and the related method `calculate_desired_old_gen_capacity`) to `PSOldGen` and renaming it as `resize_after_full_gc`?

`resize_old_gen_after_full_gc` contains some logic around `MinHeapFreeRatio` that makes it unsuitable to be placed inside old-gen, IMO. Given that there is ongoing work/discussion on removing/limiting MinHeapFreeRatio in https://bugs.openjdk.org/browse/JDK-8353716 in G1, I think we don't need to optimize for the structure too much, as it will probably be changed soon.

> src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 45:
>
>> 43:   _avg_promoted(new AdaptivePaddedNoZeroDevAverage(AdaptiveSizePolicyWeight, PromotedPadding)),
>> 44:   _space_alignment(space_alignment),
>> 45:   _young_gen_size_increment_supplement(YoungGenerationSizeSupplement) {}
>
> Typos in `gc_globals.hpp` (shown below): `YoungedGenerationSizeIncrement` and `YoungedGenerationSizeSupplement`. It should be fixed in another PR.
>
>   product(uint, YoungGenerationSizeIncrement, 20, \
>           "Adaptive size percentage change in young generation") \
>           range(0, 100) \
>   \
>   product(uint, YoungGenerationSizeSupplement, 80, \
>           "Supplement to YoungedGenerationSizeIncrement used at startup") \ // <--- here
>           range(0, 100) \
>   \
>   product(uintx, YoungGenerationSizeSupplementDecay, 8, \
>           "Decay factor to YoungedGenerationSizeSupplement") \ // <--- here
>           range(1, max_uintx) \

Filed: https://bugs.openjdk.org/browse/JDK-8357109

> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1104:
>
>> 1102:   heap->post_full_gc_dump(&_gc_timer);
>> 1103:
>> 1104:   size_policy->record_gc_pause_end_instant();
>
> What about moving this invocation into `major_collection_end`? Just like the `record_gc_pause_start_instant` and `major_collection_begin`.

This method should be called at the end of gc-pause to better reflect the actual mutator-running/paused time. OTOH, we also do adaptive resizing using gc-pause-time, so there is a circular dependency. Therefore, I invoke `major_collection_end` before adaptive-resizing as a compromise.
This issue is more evident for young-gc, as young-gc is usually much shorter; see the comment next to `size_policy->minor_collection_end`.

> src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 179:
>
>> 177:     _gc_distance_timer.reset();
>> 178:     _gc_distance_timer.start();
>> 179:   }
>
> The method name `record_gc_pause_end_instant` is about `gc pause`, but the code is about `gc distance`. Do we need a clearer name?

In stw-gc, when a gc-pause ends, mutators start running, so the distance between two consecutive gc-pauses starts to tick. This looks quite clear to me, but I am of course biased. What names do you suggest?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092493546
PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092594791
PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092589932
PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092502996
PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092520272
PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2092526857

From aboldtch at openjdk.org Fri May 16 08:40:56 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 16 May 2025 08:40:56 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote:

>> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
>> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test The GC changes looks good. Only took a cursory look of the ADLC and C2 changes, but nothing stands out. Only had a small comment about `legitimize_address_requires_lea`. 

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 141:

> 139:   Address legitimize_address(const Address &a, int size, Register scratch) {
> 140:     if (a.getMode() == Address::base_plus_offset) {
> 141:       if (legitimize_address_requires_lea(a, size)) {

It is a little strange that `legitimize_address_requires_lea` is only the second condition and not

    return a.getMode() == Address::base_plus_offset &&
           !Address::offset_ok_for_immed(a.offset(), exact_log2(size));

And have the check in `legitimize_address` simply be `if (legitimize_address_requires_lea(a, size))`

I guess we never end up calling `legitimize_address_requires_lea` with a literal address, where it would assert in `a.offset()`. But requiring the Address parameter of legitimize_address_requires_lea to be in a specific mode as a precondition seems weird to me.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2845912788
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092596572

From mdoerr at openjdk.org Fri May 16 09:36:12 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 16 May 2025 09:36:12 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote:

>> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
>>
>> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test Thanks for implementing it and thanks for the ping. 
It basically works on PPC64, but one IR rule is failing: Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "static java.lang.Object compiler.gcbarriers.TestImplicitNullChecks.testLoadVolatile(compiler.gcbarriers.TestImplicitNullChecks$OuterWithVolatileField)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={FINAL_CODE}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#NULL_CHECK#_", "1"}, applyIfPlatformOr={}, applyIfPlatform={"aarch64", "false"}, failOn={}, applyIfOr={"UseZGC", "true", "UseG1GC", "true"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "Final Code": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(NullCheck.*)+(\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! This is probably because PPC64 uses a membar_volatile before volatile load, so the graph looks differently: 33 Prolog === [[ ]] [2380000000033] 9 MachProj === 10 [[ 8 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) R3 11 MachProj === 10 [[ 8 26 ]] #5 Oop:compiler/gcbarriers/TestImplicitNullChecks$OuterWithVolatileField * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 12 MachProj === 10 [[ 4 17 ]] #1/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 13 MachProj === 10 [[ 4 21 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) R1 14 MachProj === 10 [[ 4 2 17 ]] #3 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 15 MachProj === 10 [[ 4 17 ]] #4 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 0 Con === 10 [[ ]] #top 8 zeroCheckP_reg_imm0 === 9 11 [[ 7 22 ]] P=0.000001, C=-1.000000 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) BB#002: 31 Region === 31 22 [[ 31 21 26 ]] 21 membar_volatile === 31 0 13 0 0 [[ 20 23 ]] !jvms: 
TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 20 MachProj === 21 [[ 19 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 23 MachProj === 21 [[ 19 26 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R15 26 loadN_ac === 31 23 11 [[ 25 19 ]] #12/0x000000000000000c Volatile!narrowoop: java/lang/Object * 19 unnecessary_membar_acquire === 20 0 23 0 0 |26 0 [[ 18 24 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 18 MachProj === 19 [[ 17 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 24 MachProj === 19 [[ 17 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R3 25 decodeN_unscaled === _ 26 [[ 17 ]] java/lang/Object * Oop:java/lang/Object * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 34 Epilog === [[ ]] [2380000000034] 17 Ret === 18 12 24 14 15 25 [[ 1 ]] BB#003: 30 Region === 30 7 [[ 30 4 ]] R3 16 loadConI16 === 1 [[ 4 ]] #-10/0xfffffff6 6 ConP === 10 [[ 4 ]] #null 4 CallStaticJavaDirect === 30 12 13 14 15 16 0 6 [[ 5 3 32 ]] Static wrapper for: uncommon_trap(reason='null_check' action='maybe_recompile') # void ( int ) C=0.000100 TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) reexecute !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 5 MachProj === 4 [[ ]] #10005/fat 3 MachProj === 4 [[ 2 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R14 32 MachProj === 4 [[ ]] #6/fat 2 ShouldNotReachHere === 3 0 0 14 0 [[ 1 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2886188487 From zgu at openjdk.org Fri May 16 12:56:51 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 16 May 2025 12:56:51 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap In-Reply-To: References: Message-ID: On Mon, 5 May 2025 10:36:11 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary retrying logic because an gc-operation will run-to-completion, guaranteeing the increment of corresponding counters. > > Test: tier1-3 LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25032#pullrequestreview-2846546163 From aph at openjdk.org Fri May 16 13:00:57 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 16 May 2025 13:00:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:44:53 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131: > 129: !MacroAssembler::legitimize_address_requires_lea(ref_addr, size), > 130: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 131: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); I just saw this. 
I think it might be simpler and better to handle this case in the segfault handler.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092999625

From ayang at openjdk.org Fri May 16 13:07:14 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 16 May 2025 13:07:14 GMT
Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap [v2]
In-Reply-To:
References:
Message-ID:

> Simply removing the unnecessary retry logic, because a GC operation runs to completion, guaranteeing that the corresponding counters are incremented.
>
> Test: tier1-3

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'master' into remove-systemgc-loop
 - remove-systemgc-loop

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25032/files
  - new: https://git.openjdk.org/jdk/pull/25032/files/958d1315..8a04d3f2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25032&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25032&range=00-01

  Stats: 40645 lines in 1443 files changed: 23144 ins; 9777 del; 7724 mod
  Patch: https://git.openjdk.org/jdk/pull/25032.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25032/head:pull/25032

PR: https://git.openjdk.org/jdk/pull/25032

From rcastanedalo at openjdk.org Fri May 16 15:02:06 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 16 May 2025 15:02:06 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Fri, 16 May 2025 12:57:54 GMT, Andrew Haley wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional
commit since the last revision:
>>
>>   Replace control type with PhaseCFG::is_CFG test
>
> src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131:
>
>> 129:            !MacroAssembler::legitimize_address_requires_lea(ref_addr, size),
>> 130:            "an instruction that can be used for implicit null checking should emit the candidate memory access first");
>> 131:     ref_addr = __ legitimize_address(ref_addr, size, rscratch2);
>
> I just saw this. I think it might be simpler and better to handle this case in the segfault handler.

OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare.
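
For readers following the aarch64 side of this discussion, the condition under which a load remains a null-check candidate (its address needs no initial `lea`) can be sketched roughly as below. This is only a minimal model under stated assumptions: it assumes the usual AArch64 load/store immediate forms (an unsigned 12-bit offset scaled by the access size, or an unscaled signed 9-bit offset), and `Mode`, `Addr`, `offset_fits_immediate`, `requires_lea` and `check_examples` are hypothetical names for illustration, not HotSpot code. The mode check is folded into the predicate, following the shape suggested in the review of `legitimize_address_requires_lea`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not the HotSpot types.
enum class Mode { base_plus_offset, literal };

struct Addr {
  Mode mode;
  int64_t offset;  // only meaningful in base_plus_offset mode
};

// Assumed encodings: a scaled unsigned 12-bit immediate for aligned,
// non-negative offsets, otherwise an unscaled signed 9-bit immediate.
constexpr bool offset_fits_immediate(int64_t offset, unsigned shift) {
  const int64_t mask = (int64_t{1} << shift) - 1;
  if (offset < 0 || (offset & mask) != 0) {
    return offset >= -256 && offset <= 255;
  }
  return (offset >> shift) <= 4095;
}

// The mode check is part of the predicate, so callers need no precondition
// and a literal address never reaches the offset test.
constexpr bool requires_lea(const Addr& a, unsigned log2_size) {
  return a.mode == Mode::base_plus_offset &&
         !offset_fits_immediate(a.offset, log2_size);
}

// Spot checks for an 8-byte access (log2_size == 3).
inline void check_examples() {
  assert(!requires_lea(Addr{Mode::base_plus_offset, 16}, 3));     // small aligned offset
  assert(!requires_lea(Addr{Mode::base_plus_offset, 32760}, 3));  // 4095 * 8, largest scaled offset
  assert(requires_lea(Addr{Mode::base_plus_offset, 32768}, 3));   // one step past the scaled range
  assert(!requires_lea(Addr{Mode::base_plus_offset, -8}, 3));     // fits the signed 9-bit form
  assert(requires_lea(Addr{Mode::base_plus_offset, -300}, 3));    // outside both forms
}
```

In this model, a load whose address makes `requires_lea` return false could emit the memory access as its first instruction, which is the property the assert in `z_aarch64.ad` quoted above is guarding.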
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2093219601 From wkemper at openjdk.org Fri May 16 17:34:28 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 May 2025 17:34:28 GMT Subject: RFR: 8354078: Shenandoah: Make the generational mode be non-experimental (implementation) Message-ID: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> Testing: % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version openjdk 25 2025-09-16 OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental) OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode) ------------- Commit messages: - Make generational mode non-experimental Changes: https://git.openjdk.org/jdk/pull/25270/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25270&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354078 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25270/head:pull/25270 PR: https://git.openjdk.org/jdk/pull/25270 From ysr at openjdk.org Fri May 16 23:44:52 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna)
Date: Fri, 16 May 2025 23:44:52 GMT
Subject: RFR: 8354078: Shenandoah: Make the generational mode be non-experimental (implementation)
In-Reply-To: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com>
References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com>
Message-ID:

On Fri, 16 May 2025 17:30:11 GMT, William Kemper wrote:

> Testing:
>
> % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version
> openjdk 25 2025-09-16
> OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental)
> OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode)

Break out the production bubbly! Reviewed!

-------------

Marked as reviewed by ysr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25270#pullrequestreview-2847770147

From ysr at openjdk.org Fri May 16 23:50:50 2025
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 16 May 2025 23:50:50 GMT
Subject: RFR: 8354078: Shenandoah: Make the generational mode be non-experimental (implementation)
In-Reply-To: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com>
References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com>
Message-ID:

On Fri, 16 May 2025 17:30:11 GMT, William Kemper wrote:

> Testing:
>
> % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version
> openjdk 25 2025-09-16
> OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental)
> OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode)

Just noticed that gc/shenandoah/options/TestModeUnlock.java needs to be taught that "generational" is no longer experimental. It's failing in the GHA tests.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25270#issuecomment-2887854018 From manc at openjdk.org Sat May 17 05:15:15 2025 From: manc at openjdk.org (Man Cao) Date: Sat, 17 May 2025 05:15:15 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v9] In-Reply-To: References: Message-ID: <1JUF8X8c_xjYPO_RdMYkHSYCXoeqHnTrThwtlL6Fz28=.e9ce9f8f-d90d-46fb-800f-f42c2266161c@github.com> > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge branch 'master' into JDK-8236073-softmaxheap - Use Atomic::load for flag - Fix test failure on macos-aarch64 by using power-of-two sizes. - Address comments and try fixing test failure on macos-aarch64 - Revise test summary - Add two tests - Merge branch 'master' into JDK-8236073-softmaxheap - Update copyright year. 
- G1: Use SoftMaxHeapSize to guide GC heuristics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/c60ade41..fc87a0bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=07-08 Stats: 365086 lines in 4371 files changed: 126146 ins; 220004 del; 18936 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From aph at openjdk.org Sat May 17 09:06:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 17 May 2025 09:06:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 14:59:18 GMT, Roberto Casta?eda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131: >> >>> 129: !MacroAssembler::legitimize_address_requires_lea(ref_addr, size), >>> 130: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); >>> 131: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); >> >> I just saw this. I think it might be simpler and better to handle this case in the segfault handler. > > OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. 
But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2094064464 From zgu at openjdk.org Sun May 18 00:28:56 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sun, 18 May 2025 00:28:56 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> Message-ID: <-k-DamMcH1pZ4vSAkWjhlM5PD777oPKlkrX0JK2SsSk=.de6913a9-3e2f-4cc5-bc53-e251c82ed78d@github.com> On Fri, 16 May 2025 08:36:22 GMT, Albert Mingkun Yang wrote: >> This patch refines Parallel's sizing strategy to improve overall memory management and performance. >> >> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. >> >> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. 
This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. >> >> GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. >> >> ## Performance evaluation >> >> - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). >> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). >> - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. >> >> PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into pgc-size-policy > - pgc-size-policy src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 51: > 49: #include "oops/oop.inline.hpp" > 50: #include "runtime/cpuTimeCounters.hpp" > 51: #include "runtime/globals_extension.hpp" Don't see why it is needed. 
src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 404: > 402: collect_at_safepoint(!should_run_young_gc); > 403: > 404: if (is_gc_overhead_limit_reached()) { Maybe want to adopt current algorithm, start to clear soft references when approaching gc overhead limit? Running a full gc and clearing all soft references without retrying allocation and throws OOM, seems a bit harsh. People still use soft references for caches, reclaim soft references could potentially free large amount of memory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2087170321 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2087828140 From gli at openjdk.org Sun May 18 12:20:51 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 12:20:51 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Fri, 16 May 2025 07:27:56 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: >> >>> 341: if (is_gc_overhead_limit_reached()) { >>> 342: return nullptr; >>> 343: } >> >> It seems the parameter `gc_overhead_limit_was_exceeded` and the field `MemAllocator::Allocation::_overhead_limit_exceeded` are not used by all GCs now. Should we keep the parameter and set it as `true` under the condition `is_gc_overhead_limit_reached()`? For example: >> >> >> if (op.prologue_succeeded()) { >> assert(is_in_or_null(op.result()), "result not in heap"); >> if (is_gc_overhead_limit_reached()) { >> *gc_overhead_limit_was_exceeded = true; >> return nullptr; >> } >> return op.result(); >> } >> >> >> Or we should remove the parameter and the field in another PR. > > Since this is not implemented by any other GCs, I think it's best to remove it in a follow-up PR. 
Filed https://bugs.openjdk.org/browse/JDK-8357188 >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 862: >> >>> 860: resize_old_gen_after_full_gc(); >>> 861: young_gen()->resize_after_full_gc(); >>> 862: } >> >> The `PSYoungGen` has its methods `resize_after_full_gc` and `resize_after_young_gc`. I think such design is good. What about moving the method `resize_old_gen_after_full_gc` (and the related method `calculate_desired_old_gen_capacity`) to `PSOldGen` and renaming it as `resize_after_full_gc`? > > `resize_old_gen_after_full_gc` contains some logic around `MinHeapFreeRatio` that makes it unsuitable to be placed inside old-gen, IMO. Given there is on-going work/discussion on removing/limiting MinHeapFreeRatio in https://bugs.openjdk.org/browse/JDK-8353716 in G1, I think we don't need to optimize for the structure too much, as it will probably be changed soon. OK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094507209 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094507227 From gli at openjdk.org Sun May 18 12:45:58 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 12:45:58 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Fri, 16 May 2025 07:50:13 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 179: >> >>> 177: _gc_distance_timer.reset(); >>> 178: _gc_distance_timer.start(); >>> 179: } >> >> The method name `record_gc_pause_end_instant` is about `gc pause`, but the code is about `gc distance`. May we need a clearer name? > > In stw-gc, when a gc-pause ends, mutators start running, so the distance btw two consecutive gc-pauses start to tick. This looks quite clear to me, but I am ofc biased. What names do you suggest? I don't have a better name now. 
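For readers following the naming question, here is a minimal sketch of the "gc distance" idea described above: the clock starts when a stop-the-world pause ends and is read when the next pause begins. This is only an illustration under stated assumptions; the class `GcDistanceTracker` and the method `record_gc_pause_start_instant` are hypothetical names, time points are passed in explicitly so the behaviour is deterministic, and none of this is the `AdaptiveSizePolicy` code from the PR.

```cpp
#include <chrono>

// Hypothetical helper for illustration only; not the HotSpot implementation.
class GcDistanceTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // A stop-the-world pause has just ended: mutators resume, so the
  // distance to the next pause starts ticking from this instant.
  void record_gc_pause_end_instant(Clock::time_point now) {
    _last_pause_end = now;
    _has_pause_end = true;
  }

  // The next pause is starting: return the mutator time (in seconds)
  // elapsed since the previous pause ended, or 0.0 if there was none.
  double record_gc_pause_start_instant(Clock::time_point now) {
    if (!_has_pause_end) {
      return 0.0;
    }
    _last_distance_seconds =
        std::chrono::duration<double>(now - _last_pause_end).count();
    return _last_distance_seconds;
  }

  double last_distance_seconds() const { return _last_distance_seconds; }

 private:
  Clock::time_point _last_pause_end{};
  bool _has_pause_end = false;
  double _last_distance_seconds = 0.0;
};
```

In this sketch the "pause end" method only restarts the measurement, which is the same shape as the quoted `reset()`/`start()` pair; the value that matters is read at the start of the following pause.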
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094513461 From ayang at openjdk.org Sun May 18 15:20:50 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 18 May 2025 15:20:50 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). 
> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/dc9eb4f1..a8d14931 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=01-02 Stats: 50640 lines in 1539 files changed: 24057 ins; 18482 del; 8101 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From ayang at openjdk.org Sun May 18 15:25:00 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 18 May 2025 15:25:00 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: <-k-DamMcH1pZ4vSAkWjhlM5PD777oPKlkrX0JK2SsSk=.de6913a9-3e2f-4cc5-bc53-e251c82ed78d@github.com> References: 
<9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> <-k-DamMcH1pZ4vSAkWjhlM5PD777oPKlkrX0JK2SsSk=.de6913a9-3e2f-4cc5-bc53-e251c82ed78d@github.com> Message-ID: On Tue, 13 May 2025 15:59:19 GMT, Zhengyu Gu wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-size-policy >> - review >> - Merge branch 'master' into pgc-size-policy >> - pgc-size-policy > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 51: > >> 49: #include "oops/oop.inline.hpp" >> 50: #include "runtime/cpuTimeCounters.hpp" >> 51: #include "runtime/globals_extension.hpp" > > Don't see why it is needed. It's needed for `FLAG_IS_DEFAULT`; got a build error without this include. > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 404: > >> 402: collect_at_safepoint(!should_run_young_gc); >> 403: >> 404: if (is_gc_overhead_limit_reached()) { > > Maybe want to adopt current algorithm, start to clear soft references when approaching gc overhead limit? > Running a full gc and clearing all soft references without retrying allocation and throws OOM, seems a bit harsh. > > People still use soft references for caches, reclaim soft references could potentially free large amount of memory. Revised a bit; the limitation of what we have on master is that it doesn't detect gc-overhead for young-gcs. If many young-gcs are run, gc-overhead checking should kick in as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094553207 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094554006 From gli at openjdk.org Sun May 18 18:08:56 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 18:08:56 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Sun, 18 May 2025 15:20:50 GMT, Albert Mingkun Yang wrote: >> This patch refines Parallel's sizing strategy to improve overall memory management and performance. >> >> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. >> >> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. >> >> GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. >> >> ## Performance evaluation >> >> - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). 
>> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). >> - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. >> >> PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into pgc-size-policy > - review > - Merge branch 'master' into pgc-size-policy > - pgc-size-policy Some more comments. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: > 341: if (_gc_overhead_counter >= GCOverheadLimitThreshold) { > 342: return nullptr; > 343: } Returning `nullptr` means the `OutOfMemoryError` will be thrown later. Would it be good to add an `error`-level log here? src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 363: > 361: } > 362: > 363: bool ParallelScavengeHeap::check_gc_overhead_limit() { In main-line code, the method `check_gc_overhead_limit` is invoked by `PSScavenge::invoke` and `PSParallelCompact::invoke_no_policy` so that we can do the check after every GC. But now you only use `check_gc_overhead_limit` in `ParallelScavengeHeap::satisfy_failed_allocation`. I doubt it can check the gc overhead limit accurately.
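To make the counter-based check under discussion concrete, here is a minimal sketch, assuming the limit is considered reached once a number of consecutive GCs each spend too large a share of time in GC while leaving too little free space. The class `GcOverheadTracker`, its parameters, and the exact condition are assumptions for illustration; only the shape `counter >= threshold` is taken from the quoted diff, and the real `check_gc_overhead_limit` may differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for illustration only; not the code in the PR.
class GcOverheadTracker {
 public:
  GcOverheadTracker(double time_limit_fraction,
                    double min_free_fraction,
                    unsigned threshold)
      : _time_limit_fraction(time_limit_fraction),
        _min_free_fraction(min_free_fraction),
        _threshold(threshold) {}

  // Called after every GC, young or full, so that a long run of
  // expensive young GCs is counted as well.
  void record_gc(double gc_seconds, double mutator_seconds,
                 std::size_t free_bytes, std::size_t capacity_bytes) {
    const double total = gc_seconds + mutator_seconds;
    const bool too_much_gc_time =
        total > 0.0 && (gc_seconds / total) > _time_limit_fraction;
    const bool too_little_free =
        capacity_bytes > 0 &&
        (static_cast<double>(free_bytes) /
         static_cast<double>(capacity_bytes)) < _min_free_fraction;

    if (too_much_gc_time && too_little_free) {
      ++_counter;    // another consecutive "bad" GC
    } else {
      _counter = 0;  // any healthy GC resets the streak
    }
  }

  // Mirrors the shape of `_gc_overhead_counter >= GCOverheadLimitThreshold`.
  bool limit_reached() const { return _counter >= _threshold; }

 private:
  double _time_limit_fraction;
  double _min_free_fraction;
  unsigned _threshold;
  unsigned _counter = 0;
};
```

The accuracy concern raised above corresponds to where `record_gc` is called: if it only runs on the failed-allocation path, GCs triggered elsewhere never update the streak.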
src/hotspot/share/gc/parallel/psYoungGen.cpp line 268: > 266: size_t original_committed_size = virtual_space()->committed_size(); > 267: > 268: while (true) { The `while` statement only runs once. Maybe we can find a better way to handle such complex conditional flow. src/hotspot/share/gc/parallel/psYoungGen.cpp line 268: > 266: size_t original_committed_size = virtual_space()->committed_size(); > 267: > 268: while (true) { The `while` statement only runs once. May we find a better way to refactor the code? src/hotspot/share/gc/parallel/psYoungGen.cpp line 334: > 332: assert(from_space()->capacity_in_bytes() == to_space()->capacity_in_bytes(), "inv"); > 333: const size_t current_survivor_size = from_space()->capacity_in_bytes(); > 334: assert(max_gen_size() > 2 * current_survivor_size, "inv"); Should this assertion be changed to `assert(max_gen_size() > current_eden_size + 2 * current_survivor_size, "inv");` ? src/hotspot/share/gc/parallel/psYoungGen.cpp line 379: > 377: // We usually resize young-gen only after a successful young-gc. However, in emergency state, we wanna expand young-gen to its max-capacity. > 378: // Young-gen should be empty normally after a full-gc. > 379: if (eden_space()->is_empty() && to_space()->is_empty()) { Why don't you test the `from space` here? And actually, if the `eden space` is empty, the `from space` and `to space` are empty too, because the objects are firstly moved to `eden space`. See the method `PSParallelCompact::summary_phase` for more information. So here, you only need to test whether the `eden space` is empty. src/hotspot/share/gc/parallel/psYoungGen.cpp line 487: > 485: > 486: void PSYoungGen::resize_spaces(size_t requested_eden_size, > 487: size_t requested_survivor_size) { You remove some `trace` level logs in this method. Please confirm whether it is your intent? src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 48: > 46: // Default: 100ms. 
> 47: static constexpr double MinGCDistanceSecond = 0.100; > 48: static_assert(MinGCDistanceSecond >= 0.001, "inv"); `MinGCDistanceSecond` is just a constant, so the static assertion seems unnecessary. ------------- Changes requested by gli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25000#pullrequestreview-2848994453 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094590507 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094582368 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094559809 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094559817 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094560936 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094567980 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094571784 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094574692 From gli at openjdk.org Sun May 18 18:08:56 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 18:08:56 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Sun, 18 May 2025 18:01:44 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-size-policy >> - review >> - Merge branch 'master' into pgc-size-policy >> - pgc-size-policy > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: > >> 341: if (_gc_overhead_counter >= GCOverheadLimitThreshold) { >> 342: return nullptr; >> 343: } > > Returning `nullptr` means the `OutOfMemoryError` will be thrown later. Would it be good to add an `error`-level log here? Also note: we can't tell whether the `OutOfMemoryError` was thrown because the gc overhead limit was exceeded. As I pointed out before, the field `MemAllocator::Allocation::_overhead_limit_exceeded` is not used now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094591384 From gli at openjdk.org Sun May 18 18:08:59 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 18:08:59 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> Message-ID: On Fri, 16 May 2025 08:36:22 GMT, Albert Mingkun Yang wrote: >> This patch refines Parallel's sizing strategy to improve overall memory management and performance. >> >> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`.
>> >> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. >> >> GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. >> >> ## Performance evaluation >> >> - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). >> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). >> - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. >> >> PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into pgc-size-policy > - pgc-size-policy src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 232: > 230: // Major times are too long, so we want less promotion. 
> 231: incr_tenuring_threshold = true; > 232: } You keep the condition `minor_cost > major_cost * _threshold_tolerance_percent` of the previous code. But it will be strange when we only read the new code (in the future). What about removing the condition `minor_cost > major_cost * _threshold_tolerance_percent` and moving the comment `we prefer young-gc over full-gc` to another place? src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 254: > 252: // survived is an underestimate > 253: _survived_bytes.add(survived + promoted); > 254: } The parameter `is_survivor_overflow` seems unnecessary. When `is_survivor_overflow` is `false`, the `promoted` is `0`. What about using `_survived_bytes.add(survived + promoted)` only and removing parameter `is_survivor_overflow` (and the related conditional code). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094533246 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094536061 From gli at openjdk.org Sun May 18 18:23:53 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 18 May 2025 18:23:53 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: <-yIbScHlkeICVqjP2oPh4DEhknv41XfABxOf6Np7C7I=.17f1ef14-93af-4d4e-8350-2ad7f2e441a1@github.com> On Sun, 18 May 2025 15:20:50 GMT, Albert Mingkun Yang wrote: >> This patch refines Parallel's sizing strategy to improve overall memory management and performance. >> >> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. 
Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. >> >> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. >> >> GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. >> >> ## Performance evaluation >> >> - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). >> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). >> - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. >> >> PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into pgc-size-policy > - review > - Merge branch 'master' into pgc-size-policy > - pgc-size-policy src/hotspot/share/gc/parallel/psGCAdaptivePolicyCounters.hpp line 168: > 166: gc_overhead_limit_exceeded_counter()->set_value( > 167: (jlong) ps_size_policy()->gc_overhead_limit_exceeded()); > 168: } The field `GCPolicyCounters::_gc_overhead_limit_exceeded_counter` and the related methods are not used now. It would be good to remove them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094594014 From fyang at openjdk.org Mon May 19 03:43:00 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 May 2025 03:43:00 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <9hPoS71U0nWv19FLVE7E5Q1SUTRziIPqhoATcrPAe0E=.0397e761-048e-436e-bbd6-dbfb418fa76f@github.com> On Fri, 16 May 2025 08:02:16 GMT, Roberto Castañeda Lozano wrote: >> Two small nits/questions, but otherwise ready from my side :) > >> Two small nits/questions, but otherwise ready from my side :) > > Thanks again for reviewing @eme64, I have addressed your questions now. And thanks also for your review @vnkozlov. > > @stefank @fisk @xmas92 @jsikstro may I get a review from the GC side? > > @RealFYang @TheRealMDoerr note that this PR also introduces implicit null check support for ZGC loads in RISC-V and PPC, but I cannot test it beyond GHA. May I ask you to test the changes on your respective platforms? (or let me know if you prefer to add the support in separate PRs). @robcasloz : Hi, thanks for the ping! I ran tier1-3 tests on the linux-riscv64 platform and the results are good. The new test `test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java` also passes when running with G1 and ZGC using a fastdebug build.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2889518370 From ayang at openjdk.org Mon May 19 05:19:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 05:19:56 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap [v2] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 13:07:14 GMT, Albert Mingkun Yang wrote: >> Simply removing unnecessary retry logic, because a gc-operation runs to completion, guaranteeing that the corresponding counters are incremented. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into remove-systemgc-loop > - remove-systemgc-loop Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25032#issuecomment-2889648367 From ayang at openjdk.org Mon May 19 05:19:57 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 05:19:57 GMT Subject: Integrated: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap In-Reply-To: References: Message-ID: On Mon, 5 May 2025 10:36:11 GMT, Albert Mingkun Yang wrote: > Simply removing unnecessary retry logic, because a gc-operation runs to completion, guaranteeing that the corresponding counters are incremented. > > Test: tier1-3 This pull request has now been integrated.
Changeset: 969708bd Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/969708bd8f0de49f641eab3881cb15712aa34f1e Stats: 33 lines in 2 files changed: 0 ins; 26 del; 7 mod 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap Reviewed-by: tschatzl, zgu ------------- PR: https://git.openjdk.org/jdk/pull/25032 From ayang at openjdk.org Mon May 19 06:10:42 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 06:10:42 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v4] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. 
> > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/a8d14931..e39ece09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=02-03 Stats: 69 lines in 10 files changed: 46 ins; 15 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From ayang at openjdk.org Mon May 19 06:10:42 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 06:10:42 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Sun, 18 May 2025 18:06:15 GMT, Guoxiong Li wrote: >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: >> >>> 341: if (_gc_overhead_counter >= GCOverheadLimitThreshold) { >>> 342: return nullptr; >>> 343: } >> >> Returning `nullptr` means the `OutOfMemoryError` will be thrown later. Is it good to add a `error` level log here? > > And notice: we can't identify whether the `OutOfMemoryError` is because of `gc overhead limit exceeded`. > > As I pointed out before: `the field MemAllocator::Allocation::_overhead_limit_exceeded are not used now`. The one inside the safepoint will print sth `log_info(gc)("GCOverheadLimitThreshold ...`. There can be multiple concurrent mutators reaching here; printing here is undesirable. I don't think throwing OOM, from gc's perspective, is an "error". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094899794 From ayang at openjdk.org Mon May 19 06:10:43 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 06:10:43 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Sun, 18 May 2025 17:21:40 GMT, Guoxiong Li wrote: > so that we can do the check after all the GCs Well, not really. In the old impl, `GCOverheadChecker::check_gc_overhead_limit` calls `set_gc_overhead_limit_exceeded` only for full-gc. > But now you only use check_gc_overhead_limit in ParallelScavengeHeap::satisfy_failed_allocation. I doubt whether it can check the gc overhead limit accurately. I believe so. In the old impl, we don't check gc-overhead for explicit gcs. Only allocation-failure caused gcs are interesting, which all go through `satisfy_failed_allocation`. // Ignore explicit GC's. Exiting here does not set the flag and // does not reset the count. if (GCCause::is_user_requested_gc(gc_cause) || GCCause::is_serviceability_requested_gc(gc_cause)) { return; } > src/hotspot/share/gc/parallel/psYoungGen.cpp line 268: > >> 266: size_t original_committed_size = virtual_space()->committed_size(); >> 267: >> 268: while (true) { > > The `while` statement only runs once. May we find a better way to refactor the code? I don't see an easy way to restructure the code while keeping all the relevant logic in the current context. I added some comments; check if it makes the flow easier to follow.
> src/hotspot/share/gc/parallel/psYoungGen.cpp line 334: > >> 332: assert(from_space()->capacity_in_bytes() == to_space()->capacity_in_bytes(), "inv"); >> 333: const size_t current_survivor_size = from_space()->capacity_in_bytes(); >> 334: assert(max_gen_size() > 2 * current_survivor_size, "inv"); > > Should this assertion be changed to `assert(max_gen_size() > current_eden_size + 2 * current_survivor_size, "inv");` ? Revised; needs to be `>=` though. > src/hotspot/share/gc/parallel/psYoungGen.cpp line 379: > >> 377: // We usually resize young-gen only after a successful young-gc. However, in emergency state, we wanna expand young-gen to its max-capacity. >> 378: // Young-gen should be empty normally after a full-gc. >> 379: if (eden_space()->is_empty() && to_space()->is_empty()) { > > Why don't you test the `from space` here? And actually, if the `eden space` is empty, the `from space` and `to space` are empty too, because the objects are firstly moved to `eden space`. See the method `PSParallelCompact::summary_phase` for more information. So here, you only need to test whether the `eden space` is empty. Added checking for `from_space`. If all live-objs don't fit in old-gen, leftovers will be kept in its own space. // Summarize the remaining spaces in the young gen. The initial target space // is the old gen. If a space does not fit entirely into the target, then the // remainder is compacted into the space itself and that space becomes the new // target. > src/hotspot/share/gc/parallel/psYoungGen.cpp line 487: > >> 485: >> 486: void PSYoungGen::resize_spaces(size_t requested_eden_size, >> 487: size_t requested_survivor_size) { > > You remove some `trace` level logs in this method. Please confirm whether it is your intent? Yes; when testing/developing, I don't find them to be very useful. > src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 48: > >> 46: // Default: 100ms. 
>> 47: static constexpr double MinGCDistanceSecond = 0.100; >> 48: static_assert(MinGCDistanceSecond >= 0.001, "inv"); > > The `MinGCDistanceSecond` is just a constant; the static assertion seems unnecessary? It's more to convey the intent that this number has a lower bound, in case future changes want to lower it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094893944 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094867831 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094870461 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094882156 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094882750 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094883781 From ayang at openjdk.org Mon May 19 06:10:45 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 06:10:45 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> Message-ID: On Sun, 18 May 2025 13:53:53 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-size-policy >> - pgc-size-policy > > src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 232: > >> 230: // Major times are too long, so we want less promotion. >> 231: incr_tenuring_threshold = true; >> 232: } > > You keep the condition `minor_cost > major_cost * _threshold_tolerance_percent` of the previous code.
But it will be strange when we only read the new code (in the future). What about removing the condition `minor_cost > major_cost * _threshold_tolerance_percent` and moving the comment `we prefer young-gc over full-gc` to another place? I keep it this way because I find the structure to be more symmetric, but I don't have a strong opinion on this. If you prefer, I can remove the empty if-branch. (The resulting asm should be identical.) > When is_survivor_overflow is false, the promoted is 0 That's not true; objs that live long enough will be promoted as well, even when the survivor-space has plenty of free-space. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094855103 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2094856261 From ayang at openjdk.org Mon May 19 06:55:30 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 06:55:30 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc Message-ID: Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because full-gc always run-to-completion. 
Test: tier1-3 ------------- Commit messages: - g1-remove-full-gc-loop Changes: https://git.openjdk.org/jdk/pull/25296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357218 Stats: 24 lines in 3 files changed: 0 ins; 16 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25296/head:pull/25296 PR: https://git.openjdk.org/jdk/pull/25296 From kbarrett at openjdk.org Mon May 19 08:14:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 19 May 2025 08:14:54 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:49:57 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because full-gc always run-to-completion. > > Test: tier1-3 Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1875: > 1873: // Request is trivially finished. > 1874: if (!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()) { > 1875: return op.gc_succeeded(); Complete removal of this part isn't correct. The premise of this change, that "full-gc always run-to-completion" is not correct. A `_g1_periodic_collection` may be cancelled: https://github.com/openjdk/jdk/blob/50a7c61d28b9885ff48f4fcd8bfd460b507bbcef/src/hotspot/share/gc/g1/g1VMOperations.cpp#L39-L48 Such a collection is not an explicit full-gc, so that part of the test may be true, when this function returns false because the gc did not succeed. 
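To make the point about the quoted condition concrete, here is a minimal standalone sketch. It is not HotSpot code: `FullGcAttempt`, `takes_early_return`, `early_return_value` and `check_model` are hypothetical names that only model the two booleans read by the quoted lines 1874-1875, under the assumption stated above that a non-explicit (e.g. periodic) collection can end without succeeding.

```cpp
#include <cassert>

// Hypothetical stand-in for the two facts the quoted snippet reads.
struct FullGcAttempt {
  bool explicit_full_gc;  // models GCCause::is_explicit_full_gc(cause)
  bool gc_succeeded;      // models op.gc_succeeded()
};

// Models the guard: !GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()
inline bool takes_early_return(FullGcAttempt a) {
  return !a.explicit_full_gc || a.gc_succeeded;
}

// Models the value returned on that path: op.gc_succeeded()
inline bool early_return_value(FullGcAttempt a) {
  return a.gc_succeeded;
}

// Walks through the combinations discussed in the review.
inline void check_model() {
  // Cancelled periodic collection: not explicit, did not succeed.
  // The guard holds (first disjunct), yet the returned value is false.
  FullGcAttempt cancelled_periodic{false, false};
  assert(takes_early_return(cancelled_periodic));
  assert(!early_return_value(cancelled_periodic));

  // Explicit full GC that succeeded: guard holds, returns true.
  FullGcAttempt explicit_ok{true, true};
  assert(takes_early_return(explicit_ok));
  assert(early_return_value(explicit_ok));

  // Explicit full GC that did not succeed: the only combination
  // that does not take the early return.
  FullGcAttempt explicit_failed{true, false};
  assert(!takes_early_return(explicit_failed));
}
```

In this model the guard being true does not imply the function returns true; the cancelled periodic case takes the early return and still yields `false`.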
------------- PR Review: https://git.openjdk.org/jdk/pull/25296#pullrequestreview-2849834613 PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095115204 From ayang at openjdk.org Mon May 19 08:27:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 08:27:55 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:12:26 GMT, Kim Barrett wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1875: > >> 1873: // Request is trivially finished. >> 1874: if (!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()) { >> 1875: return op.gc_succeeded(); > > Complete removal of this part isn't correct. The premise of this change, that "full-gc always > run-to-completion" is not correct. A `_g1_periodic_collection` may be cancelled: > https://github.com/openjdk/jdk/blob/50a7c61d28b9885ff48f4fcd8bfd460b507bbcef/src/hotspot/share/gc/g1/g1VMOperations.cpp#L39-L48 > Such a collection is not an explicit full-gc, so that part of the test may be true, when this function > returns false because the gc did not succeed. Updated the description. The overall result of `!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()` is always true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095139330 From gli at openjdk.org Mon May 19 11:08:56 2025 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 19 May 2025 11:08:56 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Mon, 19 May 2025 06:04:56 GMT, Albert Mingkun Yang wrote: > I don't think throwing OOM, from gc's perspective, is an "error". 
Nevermind; I just obey the statement in methods `MemAllocator::Allocation::check_out_of_memory` and `Universe::out_of_memory_error_java_heap`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2095458632 From gli at openjdk.org Mon May 19 11:08:59 2025 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 19 May 2025 11:08:59 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v2] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> Message-ID: On Mon, 19 May 2025 05:20:36 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 232: >> >>> 230: // Major times are too long, so we want less promotion. >>> 231: incr_tenuring_threshold = true; >>> 232: } >> >> You keep the condition `minor_cost > major_cost * _threshold_tolerance_percent` of the previous code. But it will be strange when we only read the new code (in the future). What about removing the condition `minor_cost > major_cost * _threshold_tolerance_percent` and moving the comment `we prefer young-gc over full-gc` to another place? > > I keep it this way because I find the structure to be more symmetric, but I don't have a strong opinion on this. If you prefer, I can remove the empty if-branch. (The resulting asm should be identical.) I prefer removing it; waiting for opinions by other reviewers. >> src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 254: >> >>> 252: // survived is an underestimate >>> 253: _survived_bytes.add(survived + promoted); >>> 254: } >> >> The parameter `is_survivor_overflow` seems unnecessary. When `is_survivor_overflow` is `false`, the `promoted` is `0`. What about using `_survived_bytes.add(survived + promoted)` only and removing parameter `is_survivor_overflow` (and the related conditional code). 
> >> When is_survivor_overflow is false, the promoted is 0 > > That's not true; objs that live long enough will be promoted as well, even when the survivor-space has plenty of free-space. Ohh, you are right, I forgot it at that time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2095457633 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2095457808 From gli at openjdk.org Mon May 19 11:09:00 2025 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 19 May 2025 11:09:00 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v3] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: <0l1GXiRVXTfUaPsDPyirWY0RnyyjxO95GfqnED2O1nw=.6f9d7504-3708-48f0-9e28-689772339276@github.com> On Mon, 19 May 2025 05:49:31 GMT, Albert Mingkun Yang wrote: > If all live-objs don't fit in old-gen, leftovers will be kept in its own space. Thanks for clarifying. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2095458158 From tschatzl at openjdk.org Mon May 19 12:02:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 19 May 2025 12:02:52 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:49:57 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. > > Test: tier1-3 src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1864: > 1862: > 1863: bool G1CollectedHeap::try_collect_fullgc(GCCause::Cause cause, > 1864: const G1GCCounters& counters_before) { There does not seem to be a reason to keep this helper method. It's the same complexity as the attempt to do a young collection in `try_collect()` now. Or move the attempt to do a young collection to a helper method for consistency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095184194 From tschatzl at openjdk.org Mon May 19 12:02:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 19 May 2025 12:02:53 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:46:48 GMT, Thomas Schatzl wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1864: > >> 1862: >> 1863: bool G1CollectedHeap::try_collect_fullgc(GCCause::Cause cause, >> 1864: const G1GCCounters& counters_before) { > > There does not seem to be a reason to keep this helper method. It's the same complexity as the attempt to do a young collection in `try_collect()` now. > > Or move the attempt to do a young collection to a helper method for consistency. Fwiw, there is `G1CollectedHeap::do_collection_pause()`that might fit exactly already. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095520960 From tschatzl at openjdk.org Mon May 19 12:02:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 19 May 2025 12:02:54 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:25:02 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1875: >> >>> 1873: // Request is trivially finished. >>> 1874: if (!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()) { >>> 1875: return op.gc_succeeded(); >> >> Complete removal of this part isn't correct. The premise of this change, that "full-gc always >> run-to-completion" is not correct. 
A `_g1_periodic_collection` may be cancelled: >> https://github.com/openjdk/jdk/blob/50a7c61d28b9885ff48f4fcd8bfd460b507bbcef/src/hotspot/share/gc/g1/g1VMOperations.cpp#L39-L48 >> Such a collection is not an explicit full-gc, so that part of the test may be true, when this function >> returns false because the gc did not succeed. > > Updated the description. The overall result of `!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()` is always true. The pre-existing code seems to be buggy: if the operation had been skipped, it should have returned false given the one caller that reads the boolean result (periodic collection invocation) to print a log line. I.e. maybe something like: `return op.prologue_succeeded() && op.gc_succeeded();` Similar to `G1CollectedHeap::do_collection_pause()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095521025 From tschatzl at openjdk.org Mon May 19 12:08:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 19 May 2025 12:08:55 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:25:02 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1875: >> >>> 1873: // Request is trivially finished. >>> 1874: if (!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()) { >>> 1875: return op.gc_succeeded(); >> >> Complete removal of this part isn't correct. The premise of this change, that "full-gc always >> run-to-completion" is not correct. A `_g1_periodic_collection` may be cancelled: >> https://github.com/openjdk/jdk/blob/50a7c61d28b9885ff48f4fcd8bfd460b507bbcef/src/hotspot/share/gc/g1/g1VMOperations.cpp#L39-L48 >> Such a collection is not an explicit full-gc, so that part of the test may be true, when this function >> returns false because the gc did not succeed. > > Updated the description. 
The overall result of `!GCCause::is_explicit_full_gc(cause) || op.gc_succeeded()` is always true. (Somehow comments seem to have disappeared, probably my fault. anyway:) This method returns `false` if the VM op has not been executed (skipped). The new code always returns `true`, which is wrong afaict. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095561063 From rcastanedalo at openjdk.org Mon May 19 12:53:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 May 2025 12:53:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: > I think you only have to mark both the lea and the memory access with an exception table entry. Could you elaborate a bit more on this part of your suggestion? My understanding is that [C2's `PhaseOutput`](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.hpp#L72) (the component responsible for populating the [implicit null exception table](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.cpp#L3451)) can at most add one entry per Mach node (in this case `zLoadP`), where [the entry key is the address of the first emitted machine instruction](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.cpp#L1611-L1614). Therefore if we want to mark both the lea and the memory access as you suggest, we would need to extend `C2_MacroAssembler` to express which instructions we want to mark and extend C2's `PhaseOutput` to add entries for each of the marked instructions. Is there a simpler way I have missed to achieve this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2095643031 From mark.reinhold at oracle.com Mon May 19 13:33:37 2025 From: mark.reinhold at oracle.com (Mark Reinhold) Date: Mon, 19 May 2025 13:33:37 +0000 Subject: New candidate JEP: 521: Generational Shenandoah Message-ID: <20250519133336.A2D04815FAE@eggemoggin.niobe.net> https://openjdk.org/jeps/521 Summary: Change the generational mode of the Shenandoah garbage collector from an experimental feature to a product feature. - Mark From ayang at openjdk.org Mon May 19 13:54:34 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 13:54:34 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v2] In-Reply-To: References: Message-ID: > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. > > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25296/files - new: https://git.openjdk.org/jdk/pull/25296/files/233a5d1b..722aac36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=00-01 Stats: 20 lines in 2 files changed: 4 ins; 15 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25296/head:pull/25296 PR: https://git.openjdk.org/jdk/pull/25296 From ayang at openjdk.org Mon May 19 13:58:08 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 13:58:08 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v3] In-Reply-To: References: Message-ID: > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. 
> > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25296/files - new: https://git.openjdk.org/jdk/pull/25296/files/722aac36..666cc5a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=01-02 Stats: 10 lines in 3 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25296/head:pull/25296 PR: https://git.openjdk.org/jdk/pull/25296 From ayang at openjdk.org Mon May 19 14:07:37 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 19 May 2025 14:07:37 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: References: Message-ID: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. > > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: - review - Revert "review" This reverts commit 666cc5a1cbc72453a09fdf4e9319bf140365859e. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25296/files - new: https://git.openjdk.org/jdk/pull/25296/files/666cc5a1..93c6e998 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=02-03 Stats: 8 lines in 3 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25296/head:pull/25296 PR: https://git.openjdk.org/jdk/pull/25296 From kbarrett at openjdk.org Mon May 19 14:36:03 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 19 May 2025 14:36:03 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> Message-ID: <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> On Mon, 19 May 2025 14:07:37 GMT, Albert Mingkun Yang wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Revert "review" > > This reverts commit 666cc5a1cbc72453a09fdf4e9319bf140365859e. Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1885: > 1883: cause); > 1884: VMThread::execute(&op); > 1885: return op.prologue_succeeded(); They might now be the same value, but `gc_succeeded()` seems like a better semantic fit here. Looking around a little, it's not obvious why `prologue_succeeded()` is ever the right name for a client-accessible test for what seems like gc-completion. But that's a separate issue. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25296#pullrequestreview-2850997116 PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2095857029 From rkennke at openjdk.org Mon May 19 15:30:53 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 19 May 2025 15:30:53 GMT Subject: RFR: 8354078: Shenandoah: Make the generational mode be non-experimental (implementation) In-Reply-To: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> Message-ID: On Fri, 16 May 2025 17:30:11 GMT, William Kemper wrote: > Testing: > > % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version > openjdk 25 2025-09-16 > OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental) > OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode) Many tests specify `-XX:+UseExperimentalVMOptions` to be able to use generational mode. Those `-XX:+UseExperimentalVMOptions` should be removed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25270#issuecomment-2891447874 From wkemper at openjdk.org Mon May 19 17:30:35 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 May 2025 17:30:35 GMT Subject: RFR: 8354078: Shenandoah: Make the generational mode be non-experimental (implementation) [v2] In-Reply-To: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> Message-ID: > Testing: > > % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version > openjdk 25 2025-09-16 > OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental) > OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode) William Kemper has updated the pull request incrementally with one additional commit since the last revision: Update test asserting that generational mode is experimental ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25270/files - new: https://git.openjdk.org/jdk/pull/25270/files/719126e6..198fb3ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25270&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25270&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25270/head:pull/25270 PR: https://git.openjdk.org/jdk/pull/25270 From ysr at openjdk.org Mon May 19 23:20:53 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 19 May 2025 23:20:53 GMT Subject: RFR: 8354078: Implement JEP 521: Generational Shenandoah [v2] In-Reply-To: References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> Message-ID: On Mon, 19 May 2025 17:30:35 GMT, William Kemper wrote: >> Testing: >> >> % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version >> openjdk 25 2025-09-16 >> OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental) >> OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode) > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Update test asserting that generational mode is experimental Looks good, thanks! ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25270#pullrequestreview-2852164466 From ayang at openjdk.org Tue May 20 05:07:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 05:07:52 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> Message-ID: On Mon, 19 May 2025 14:33:12 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - Revert "review" >> >> This reverts commit 666cc5a1cbc72453a09fdf4e9319bf140365859e. 
> > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1885: > >> 1883: cause); >> 1884: VMThread::execute(&op); >> 1885: return op.prologue_succeeded(); > > They might now be the same value, but `gc_succeeded()` seems like a better semantic fit here. > Looking around a little, it's not obvious why `prologue_succeeded()` is ever the right name for a > client-accessible test for what seems like gc-completion. But that's a separate issue. How about introducing a local var to better reflect the semantics? // GCs always run-to-completion once prologue succeeds. bool is_gc_succeeded = op.prologue_succeeded(); return is_gc_succeeded; This way we don't need to maintain a "redundant" field in `VM_G1CollectFull`. (Thomas also asked whether the same field in `VM_G1CollectForAllocation` is actually needed -- seems that `_gc_succeeded == prologue_succeeded()` holds as well, so we can potential remove that one as well.) If you dislike the suggestion, I will add back the removed field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2096930311 From aboldtch at openjdk.org Tue May 20 06:51:37 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 20 May 2025 06:51:37 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v4] In-Reply-To: References: Message-ID: <3Z6vsQuZdv_yxZa2IFMWx4T5eBuJ56dD53HmR0r-jLQ=.299ce304-d769-4bd6-b3a1-af8892320227@github.com> > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. 
> > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356455 - Use atomic load of tree size in print_on - cache_replace comments - Sort order - Use private inheritance - Separate tree logic to own class - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25112/files - new: https://git.openjdk.org/jdk/pull/25112/files/0f406adf..a14736db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=02-03 Stats: 41426 lines in 1351 files changed: 19072 ins; 15706 del; 6648 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From aboldtch at openjdk.org Tue May 20 06:55:21 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 20 May 2025 06:55:21 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v3] In-Reply-To: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: <1g6Mnw-J8whB4uoR6oC35lOo8Bmk2LWlFQ7yYLTNlRk=.e0cb68fb-9bb2-4e2b-b909-b8fa68138739@github.com> > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handle memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. 
[JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. > > Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. > > The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: > * Missing wait, the uncommitting is not actually spread out, but happens all at once. > * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. > * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. > * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. > > An alternative discussed has been to do uncommitting based on GC triggers rather than a periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. 
[JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356716 - Wrong too - Less archaic spelling of complete - Cleanup and simplify - Move all uncommit logic to zUncommitter - Log time spent uncommitting - Split reset_uncommit_cycle and add headroom - Rename _min_last_uncommit_cycle to _min_size_watermark - Use milliseconds instead of seconds - Improve events and statistics - ... and 8 more: https://git.openjdk.org/jdk/compare/5fe68235...43c0795a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25198/files - new: https://git.openjdk.org/jdk/pull/25198/files/96ce895f..43c0795a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=01-02 Stats: 41426 lines in 1351 files changed: 19072 ins; 15706 del; 6648 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From tschatzl at openjdk.org Tue May 20 07:06:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 20 May 2025 07:06:52 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> Message-ID: On Tue, 20 May 2025 05:05:15 GMT, Albert Mingkun Yang wrote: >> 
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1885: >> >>> 1883: cause); >>> 1884: VMThread::execute(&op); >>> 1885: return op.prologue_succeeded(); >> >> They might now be the same value, but `gc_succeeded()` seems like a better semantic fit here. >> Looking around a little, it's not obvious why `prologue_succeeded()` is ever the right name for a >> client-accessible test for what seems like gc-completion. But that's a separate issue. > > How about introducing a local var to better reflect the semantics? > > > // GCs always run-to-completion once prologue succeeds. > bool is_gc_succeeded = op.prologue_succeeded(); > return is_gc_succeeded; > > > This way we don't need to maintain a "redundant" field in `VM_G1CollectFull`. > > (Thomas also asked whether the same field in `VM_G1CollectForAllocation` is actually needed -- seems that `_gc_succeeded == prologue_succeeded()` holds as well, so we can potential remove that one as well.) > > If you dislike the suggestion, I will add back the removed field. The field does not need to be re-added in the full gc VM op, just add an accessor `gc_succeeded()` that returns the internal `prologue_succeeded()`. I will file a bug for removing the `_gc_succeeded` field in `VM_G1CollectForAllocation`, because it is always true if `doit` ran. 
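The accessor suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: `VMGCOperationSketch`, `FullCollectSketch`, `doit_prologue(bool)` and `try_collect_full_sketch` are hypothetical stand-ins, not the actual HotSpot `VM_GC_Operation` or `VM_G1CollectFull` code, and the "another GC intervened" flag is a simplification of whatever the real prologue checks.

```cpp
// Hypothetical stand-ins for illustration only; not the HotSpot classes.
class VMGCOperationSketch {
  bool _prologue_succeeded = false;

public:
  virtual ~VMGCOperationSketch() = default;

  // Pretend prologue: it fails when another GC already ran in the meantime.
  bool doit_prologue(bool another_gc_intervened) {
    _prologue_succeeded = !another_gc_intervened;
    return _prologue_succeeded;
  }

  bool prologue_succeeded() const { return _prologue_succeeded; }
};

class FullCollectSketch : public VMGCOperationSketch {
public:
  // No separate _gc_succeeded field: in this sketch a full GC is assumed to
  // run to completion once its prologue succeeded, so the accessor forwards.
  bool gc_succeeded() const { return prologue_succeeded(); }
};

// Caller shape without a retry loop: execute once, report gc_succeeded().
bool try_collect_full_sketch(bool another_gc_intervened) {
  FullCollectSketch op;
  op.doit_prologue(another_gc_intervened);
  return op.gc_succeeded();
}
```

The point of the design is that callers ask the semantic question (did the GC succeed) while the operation remains free to answer it from the prologue state.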
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2097142894 From kbarrett at openjdk.org Tue May 20 07:11:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 May 2025 07:11:53 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> Message-ID: On Mon, 19 May 2025 14:07:37 GMT, Albert Mingkun Yang wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Revert "review" > > This reverts commit 666cc5a1cbc72453a09fdf4e9319bf140365859e. Changes requested by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25296#pullrequestreview-2852944143 From kbarrett at openjdk.org Tue May 20 07:11:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 May 2025 07:11:54 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> Message-ID: On Tue, 20 May 2025 07:04:34 GMT, Thomas Schatzl wrote: >> How about introducing a local var to better reflect the semantics? >> >> >> // GCs always run-to-completion once prologue succeeds. >> bool is_gc_succeeded = op.prologue_succeeded(); >> return is_gc_succeeded; >> >> >> This way we don't need to maintain a "redundant" field in `VM_G1CollectFull`. 
>> >> (Thomas also asked whether the same field in `VM_G1CollectForAllocation` is actually needed -- seems that `_gc_succeeded == prologue_succeeded()` holds as well, so we can potential remove that one as well.) >> >> If you dislike the suggestion, I will add back the removed field. > > The field does not need to be re-added in the full gc VM op, just add an accessor `gc_succeeded()` that returns the internal `prologue_succeeded()`. > > I will file a bug for removing the `_gc_succeeded` field in `VM_G1CollectForAllocation`, because it is always true if `doit` ran. I think the question that clients care about is whether the gc succeeded. Whether the prologue succeeded or not might be used as part of the implementation of that. That is, I think gc_succeeded ought to be part of the public API for GC operations, and prologue_succeeded shouldn't be, with some gc_succeeded operations being implemented as just a call to prologue_succeeded. That probably shouldn't be part of this change though. Instead, I'd prefer this change maintain the status quo in this area. Or do what @tschatzl suggested, which I saw as I was about to post this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2097153112 From ayang at openjdk.org Tue May 20 07:20:30 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 07:20:30 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v5] In-Reply-To: References: Message-ID: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. 
> > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25296/files - new: https://git.openjdk.org/jdk/pull/25296/files/93c6e998..f5768f82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25296&range=03-04 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25296/head:pull/25296 PR: https://git.openjdk.org/jdk/pull/25296 From ayang at openjdk.org Tue May 20 07:20:30 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 07:20:30 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> Message-ID: On Tue, 20 May 2025 07:09:25 GMT, Kim Barrett wrote: >> The field does not need to be re-added in the full gc VM op, just add an accessor `gc_succeeded()` that returns the internal `prologue_succeeded()`. >> >> I will file a bug for removing the `_gc_succeeded` field in `VM_G1CollectForAllocation`, because it is always true if `doit` ran. > > I think the question that clients care about is whether the gc succeeded. > Whether the prologue succeeded or not might be used as part of the > implementation of that. That is, I think gc_succeeded ought to be part of the > public API for GC operations, and prologue_succeeded shouldn't be, with some > gc_succeeded operations being implemented as just a call to > prologue_succeeded. > > That probably shouldn't be part of this change though. Instead, I'd prefer > this change maintain the status quo in this area. 
Or do what @tschatzl > suggested, which I saw as I was about to post this. Added back the API. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2097167881 From tschatzl at openjdk.org Tue May 20 07:27:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 20 May 2025 07:27:56 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v4] In-Reply-To: References: <-DkbQ4L8XcW7TP2Bwe5lTZkLIbioLAqe9UcTYAW6rTw=.407663e4-7446-4ad1-a9b0-0809e39ca40b@github.com> <1w2C_DUFFCelx2jQQVePFu_isA_vs2UXwjyuCcx-3VA=.a314fc91-4259-4f06-98d3-c4ddc05400b8@github.com> Message-ID: On Tue, 20 May 2025 07:17:16 GMT, Albert Mingkun Yang wrote: >> I think the question that clients care about is whether the gc succeeded. >> Whether the prologue succeeded or not might be used as part of the >> implementation of that. That is, I think gc_succeeded ought to be part of the >> public API for GC operations, and prologue_succeeded shouldn't be, with some >> gc_succeeded operations being implemented as just a call to >> prologue_succeeded. >> >> That probably shouldn't be part of this change though. Instead, I'd prefer >> this change maintain the status quo in this area. Or do what @tschatzl >> suggested, which I saw as I was about to post this. > > Added back the API. > I think the question that clients care about is whether the gc succeeded. Whether the prologue succeeded or not might be used as part of the implementation of that. That is, I think gc_succeeded ought to be part of the public API for GC operations, and prologue_succeeded shouldn't be, with some gc_succeeded operations being implemented as just a call to prologue_succeeded. Agree. Filed JDK-8357307. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25296#discussion_r2097178221 From tschatzl at openjdk.org Tue May 20 07:27:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 20 May 2025 07:27:55 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v5] In-Reply-To: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> References: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> Message-ID: On Tue, 20 May 2025 07:20:30 GMT, Albert Mingkun Yang wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25296#pullrequestreview-2852993569 From kbarrett at openjdk.org Tue May 20 08:50:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 May 2025 08:50:59 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v5] In-Reply-To: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> References: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> Message-ID: <2f6oaEYGmphs2_g84XXZk5eLSxXOQfc3piZ4-Cjk0MM=.85a600d3-2370-48c4-8f8d-cc1f4bd3b2ca@github.com> On Tue, 20 May 2025 07:20:30 GMT, Albert Mingkun Yang wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25296#pullrequestreview-2853310828 From stefank at openjdk.org Tue May 20 12:30:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 20 May 2025 12:30:55 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v4] In-Reply-To: <3Z6vsQuZdv_yxZa2IFMWx4T5eBuJ56dD53HmR0r-jLQ=.299ce304-d769-4bd6-b3a1-af8892320227@github.com> References: <3Z6vsQuZdv_yxZa2IFMWx4T5eBuJ56dD53HmR0r-jLQ=.299ce304-d769-4bd6-b3a1-af8892320227@github.com> Message-ID: On Tue, 20 May 2025 06:51:37 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. >> >> The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. >> >> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. >> >> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356455 > - Use atomic load of tree size in print_on > - cache_replace comments > - Sort order > - Use private inheritance > - Separate tree logic to own class > - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Marked as reviewed by stefank (Reviewer). 
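As an illustration of the quoted remark that O(1) left-most and right-most lookup is easy to layer on top of a tree, here is a minimal sketch. It assumes nothing about the real `IntrusiveRBTree` API: `std::set<int>` merely plays the role of the tree, and `CachedExtremesTree` and its members are hypothetical names.

```cpp
#include <iterator>
#include <set>

// Hypothetical wrapper: std::set<int> stands in for the shared tree.
class CachedExtremesTree {
  std::set<int> _tree;
  std::set<int>::const_iterator _left_most;
  std::set<int>::const_iterator _right_most;

public:
  CachedExtremesTree() : _left_most(_tree.cend()), _right_most(_tree.cend()) {}
  CachedExtremesTree(const CachedExtremesTree&) = delete;
  CachedExtremesTree& operator=(const CachedExtremesTree&) = delete;

  void insert(int key) {
    auto res = _tree.insert(key);
    if (!res.second) {
      return; // Key already present, cached extremes are unchanged.
    }
    if (_left_most == _tree.cend() || key < *_left_most) {
      _left_most = res.first;
    }
    if (_right_most == _tree.cend() || key > *_right_most) {
      _right_most = res.first;
    }
  }

  void remove(int key) {
    auto it = _tree.find(key);
    if (it == _tree.cend()) {
      return;
    }
    // Move a cached extreme to its neighbour before the node goes away.
    if (it == _left_most) {
      _left_most = std::next(it); // Successor, or cend() if this was the last node.
    }
    if (it == _right_most) {
      _right_most = (it == _tree.cbegin()) ? _tree.cend() : std::prev(it);
    }
    _tree.erase(it);
  }

  // O(1) lookups; nullptr when the tree is empty.
  const int* left_most() const {
    return _left_most == _tree.cend() ? nullptr : &*_left_most;
  }
  const int* right_most() const {
    return _right_most == _tree.cend() ? nullptr : &*_right_most;
  }
};
```

Only insert and remove touch the cached positions, so lookups never have to walk the tree.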
------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2853979533 From ayang at openjdk.org Tue May 20 13:21:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 13:21:03 GMT Subject: RFR: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc [v5] In-Reply-To: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> References: <5Fjc4no1wVBTCKFvLPROl168Dt4BMLqNqBBvk4NZ2U0=.c5d0f057-b2a9-4109-818e-27ccbd6088fb@github.com> Message-ID: On Tue, 20 May 2025 07:20:30 GMT, Albert Mingkun Yang wrote: >> Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25296#issuecomment-2894371797 From ayang at openjdk.org Tue May 20 13:21:04 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 13:21:04 GMT Subject: Integrated: 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:49:57 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary loop in "caller" of `VM_G1CollectFull`, because an explicit full-gc always run-to-completion. > > Test: tier1-3 This pull request has now been integrated. 
Changeset: e6750a5b Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/e6750a5bb0580733820a0885d40704e050baf97d Stats: 39 lines in 4 files changed: 4 ins; 30 del; 5 mod 8357218: G1: Remove loop in G1CollectedHeap::try_collect_fullgc Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/25296 From iwalulya at openjdk.org Tue May 20 15:34:29 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 20 May 2025 15:34:29 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v4] In-Reply-To: References: Message-ID: <92Vf3UjIhUOJYwfUnIyulaxKQdSBiLy_caiNo_9zc5I=.71b5f327-8231-4bcd-a83c-85ca98c7911c@github.com> > Hi, > > Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. > > Testing: Tier 1-3 Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount - use align_up_to_region_byte_size - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount - Thomas Review - nit - refactor full collection ------------- Changes: https://git.openjdk.org/jdk/pull/24944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=03 Stats: 42 lines in 8 files changed: 19 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24944/head:pull/24944 PR: https://git.openjdk.org/jdk/pull/24944 From tschatzl at openjdk.org Tue May 20 16:55:03 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 20 May 2025 16:55:03 GMT Subject: RFR: 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages Message-ID: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> Hi all, please review this small update to the parallel reference processing option in the manpage. Testing: compilation/manpage building? 
Thomas ------------- Commit messages: - 8357018 Changes: https://git.openjdk.org/jdk/pull/25323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25323&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357018 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25323/head:pull/25323 PR: https://git.openjdk.org/jdk/pull/25323 From ayang at openjdk.org Tue May 20 18:26:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 May 2025 18:26:55 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v4] In-Reply-To: <92Vf3UjIhUOJYwfUnIyulaxKQdSBiLy_caiNo_9zc5I=.71b5f327-8231-4bcd-a83c-85ca98c7911c@github.com> References: <92Vf3UjIhUOJYwfUnIyulaxKQdSBiLy_caiNo_9zc5I=.71b5f327-8231-4bcd-a83c-85ca98c7911c@github.com> Message-ID: On Tue, 20 May 2025 15:34:29 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Marked as reviewed by ayang (Reviewer). 
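The idea in the quoted description of 8355756 (count the pending allocation when deciding how far to shrink after a full GC) can be sketched as below. This is a simplified model and not the code in `G1HeapSizingPolicy::full_collection_resize_amount`: the region size, the `headroom` parameter and both function names are assumptions made for the example.

```cpp
#include <cstddef>

// Hypothetical region size for the sketch (1 MiB).
constexpr std::size_t kRegionBytes = std::size_t(1) << 20;

// Round a byte count up to a whole number of regions.
constexpr std::size_t align_up_to_region(std::size_t bytes) {
  return (bytes + kRegionBytes - 1) / kRegionBytes * kRegionBytes;
}

// Bytes to shrink after a full GC. Including pending_allocation in the target
// avoids shrinking below what the triggering request needs right away, which
// would otherwise force an immediate expansion.
constexpr std::size_t shrink_bytes_after_full_gc(std::size_t capacity,
                                                 std::size_t used_after_gc,
                                                 std::size_t pending_allocation,
                                                 std::size_t headroom) {
  return capacity > align_up_to_region(used_after_gc + pending_allocation + headroom)
             ? capacity - align_up_to_region(used_after_gc + pending_allocation + headroom)
             : 0;
}
```

With a 64 MiB heap, 10 MiB live and 2 MiB of headroom, the sketch shrinks by 52 MiB when nothing is pending but only by 32 MiB when a 20 MiB allocation is waiting.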
------------- PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2855191453 From kbarrett at openjdk.org Tue May 20 18:49:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 May 2025 18:49:54 GMT Subject: RFR: 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages In-Reply-To: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> References: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> Message-ID: On Tue, 20 May 2025 09:17:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small update to the parallel reference processing option in the manpage. > > Testing: compilation/manpage building? > > Thomas Marked as reviewed by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25323#pullrequestreview-2855257657 From Monica.Beckwith at microsoft.com Wed May 21 04:09:52 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Wed, 21 May 2025 04:09:52 +0000 Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype In-Reply-To: References: Message-ID: Hello everyone, I have created a new sub-task: https://bugs.openjdk.org/browse/JDK-8357296 under the umbrella bug JDK-8353716 for G1 AHS. This sub-task implements the coordination mechanism that integrates all other AHS components. The key aspects of this implementation are: 1. Integration Points: - Coordinates SoftMax policy (JDK-8236073) with heap sizing decisions - Applies GCTimeRatio targets (JDK-8247843) in control loop - Triggers shrink decisions (JDK-8238687) based on GC metrics - Schedules memory uncommit (JDK-8238686) when appropriate 2. Key Files Modified: g1HeapSizingPolicy.{cpp,hpp} - Core control loop implementation g1_globals.hpp - AHS configuration framework 3. 
Runtime Controls:

Primary controls:
GCTimeRatio - GC vs application time ratio
GCCpuOverheadTarget - Direct CPU target percentage
SoftMaxHeapSize - Soft maximum heap size limit

Tuning parameters:
G1AHSDampingFactor - Control loop response damping
G1UncommitDelay - Memory uncommit delay time

Component Dependencies:

                      [JDK-8353716]
                      AHS Umbrella
                            |
                            v
                      [JDK-8357296]
                   Core Implementation
                            |
+-------------------+------------------+-----------------+
|                   |                  |                 |
v                   v                  v                 v
[JDK-8238687]       [JDK-8236073]      [JDK-8247843]     [JDK-8238686]
Shrink              SoftMax            GCTimeRatio       Uncommit

For more details, please see: https://github.com/microsoft/openjdk-workstreams/blob/main/G1-AHS/README.md

I will submit a PR soon. Comments and suggestions are welcome either now or during the PR review.

Thanks,
Monica

________________________________
From: Monica Beckwith
Sent: Thursday, May 8, 2025 5:47 PM
To: hotspot-gc-dev at openjdk.org ; Ivan Walulya ; Thomas Schatzl ; Man Cao
Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype

Hi all,

Thanks to everyone for the ongoing AHS discussions across 8236073, 8238686/87, and umbrella JDK-8353716. From the Microsoft side, we have been reviewing logs from a range of prod-like use cases across the broader MSFT environment, including first-party Java services (both Azure-hosted and non-Azure), as well as OSS-based deployments (Cassandra, Kafka, etc). We've also been benchmarking with various combinations (ReservePercent, GCTimeRatio, periodic GC, etc) and exploring early models to help gauge expected shrink/grow behavior under service conditions. These observations have shaped our perspective and contributions to upstream design discussions.

Here's where we currently stand:

------------------------------------------------------------------------
1.
SoftMaxHeapSize semantics and placement
------------------------------------------------------------------------
We continue to support the current SoftMax proposal as a **soft upper bound** on heap usage, one that the GC controller respects, but may temporarily exceed if necessary. Our analysis of logs shows that an effective SoftMax, even when static, would help reduce RSS under light traffic without requiring aggressive full GCs. We also plan to evaluate the controller changes under PR #24211 once they're merged, and we'd like to keep the option of a `jcmd GC.set_soft_max` interface, consistent with ZGC and future container signals (e.g. memory.high).

------------------------------------------------------------------------
2. GCTimeRatio as a feedback driver
------------------------------------------------------------------------
We support the move to a higher default value for `GCTimeRatio` as it aligns well with throughput goals in our measured workloads, including SPECjbb2015, DBs, and Spring-based services. We plan to continue stepped testing across representative service patterns. We'd also support exposing an alias like `-XX:GCCPUPercent` to improve ergonomics for operators.

------------------------------------------------------------------------
3. Reserve floor and shrink control
------------------------------------------------------------------------
We strongly recommend retaining `G1ReservePercent` as a configurable minimum, particularly in low-latency scenarios or when allocation bursts are expected immediately after idle phases. We'd also be open to exploring future adaptive variants of the reserve floor as the AHS loop matures.

------------------------------------------------------------------------
4.
Periodic GC fallback and field heuristics
------------------------------------------------------------------------
Until AHS-driven shrink behavior is well understood and widely adopted, we recommend retaining a periodic GC safety net, especially for services with extended idle phases. As AHS matures, we'll continue to evaluate whether this fallback remains necessary in production.

------------------------------------------------------------------------
5. Role of externally-supplied limits
------------------------------------------------------------------------
Internally, we've discussed how AHS should behave in managed container environments such as AKS. In most cases we expect the JVM to operate within cgroup-defined memory.max and possibly memory.high bounds. We don't currently envision supporting non-cgroup (custom/embedded) environments on day one. We also believe that memory.high or RSS-based constraints could eventually serve as complementary signals for guiding heap elasticity, especially for AKS customers. These use cases are still exploratory, but we hope they can be accommodated within the direction of AHS without adding undue complexity to the core loop.

------------------------------------------------------------------------
6. Design notes and alignment
------------------------------------------------------------------------
For reference, our current AHS evaluation and alignment write-up (including control flow diagrams and tuning strategy) is here: https://github.com/microsoft/openjdk-workstreams/tree/main/G1-AHS

We'll continue to update that as PRs land and more data becomes available. We welcome any feedback on the write-up or our alignment approach and would be happy to incorporate community input via PRs. We are also open to hosting the write-up within an OpenJDK project repo if that's deemed appropriate.

Thanks again to everyone driving this effort forward; happy to continue refining as the pieces come together.
Best regards,
Monica

From kirk at kodewerk.com Wed May 21 04:24:16 2025
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 20 May 2025 21:24:16 -0700
Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To:
References:
Message-ID:

Hi,

Couple of questions which may have been answered but...

> On May 20, 2025, at 9:09 PM, Monica Beckwith wrote:
>
> Hello everyone,
>
> I have created a new sub-task: https://bugs.openjdk.org/browse/JDK-8357296 under the umbrella bug JDK-8353716 for G1 AHS.
>
> This sub-task implements the coordination mechanism that integrates all other AHS components.
>
> The key aspects of this implementation are:
>
> 1. Integration Points:
>
> - Coordinates SoftMax policy (JDK-8236073) with heap sizing decisions
> - Applies GCTimeRatio targets (JDK-8247843) in control loop
> - Triggers shrink decisions (JDK-8238687) based on GC metrics

Which GC metrics are used to drive sizing and what are the expected behaviours and reactions?

> - Schedules memory uncommit (JDK-8238686) when appropriate
>
> 2. Key Files Modified:
>   g1HeapSizingPolicy.{cpp,hpp} - Core control loop implementation
>   g1_globals.hpp               - AHS configuration framework
>
> 3. Runtime Controls:
>
> Primary controls:
>   GCTimeRatio           - GC vs application time ratio
>   GCCpuOverheadTarget   - Direct CPU target percentage

How do you anticipate that GCTimeRatio and GCCpuOverheadTarget would work together?
>   SoftMaxHeapSize       - Soft maximum heap size limit
>
> Tuning parameters:
>   G1AHSDampingFactor    - Control loop response damping
>   G1UncommitDelay       - Memory uncommit delay time
>
> Component Dependencies:
>
>                         [JDK-8353716]
>                         AHS Umbrella
>                              |
>                              v
>                         [JDK-8357296]
>                      Core Implementation
>                              |
>       +----------------+-----+----------+----------------+
>       |                |                |                |
>       v                v                v                v
> [JDK-8238687]    [JDK-8236073]    [JDK-8247843]    [JDK-8238686]
>    Shrink           SoftMax        GCTimeRatio       Uncommit
>
> For more details, please see:
> https://github.com/microsoft/openjdk-workstreams/blob/main/G1-AHS/README.md
>
> I will submit a PR soon. Comments and suggestions are welcome either now or during the PR review.
>
> Thanks,
> Monica
>
> From: Monica Beckwith
> Sent: Thursday, May 8, 2025 5:47 PM
> To: hotspot-gc-dev at openjdk.org; Ivan Walulya; Thomas Schatzl; Man Cao
> Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype
>
> Hi all,
>
> Thanks to everyone for the ongoing AHS discussions across 8236073, 8238686/87, and umbrella JDK-8353716.
>
> From the Microsoft side, we have been reviewing logs from a range of prod-like use cases across the broader MSFT environment, including first-party Java services (both Azure-hosted and non-Azure), as well as OSS-based deployments (Cassandra, Kafka, etc). We've also been benchmarking with various combinations (ReservePercent, GCTimeRatio, periodic GC, etc) and exploring early models to help gauge expected shrink/grow behavior under service conditions. These observations have shaped our perspective and contributions to upstream design discussions.
>
> Here's where we currently stand:
>
> ------------------------------------------------------------------------
> 1.
SoftMaxHeapSize semantics and placement
> ------------------------------------------------------------------------
>
> We continue to support the current SoftMax proposal as a **soft upper bound** on heap usage, one that the GC controller respects but may temporarily exceed if necessary. Our analysis of logs shows that an effective SoftMax, even when static, would help reduce RSS under light traffic without requiring aggressive full GCs.
>
> We also plan to evaluate the controller changes under PR #24211 once they're merged, and we'd like to keep the option of a `jcmd GC.set_soft_max` interface, consistent with ZGC and future container signals (e.g. memory.high).
>
> ------------------------------------------------------------------------
> 2. GCTimeRatio as a feedback driver
> ------------------------------------------------------------------------
>
> We support the move to a higher default value for `GCTimeRatio` as it aligns well with throughput goals in our measured workloads, including SPECjbb2015, DBs, and Spring-based services. We plan to continue stepped testing across representative service patterns. We'd also support exposing an alias like `-XX:GCCPUPercent` to improve ergonomics for operators.
>
> ------------------------------------------------------------------------
> 3. Reserve floor and shrink control
> ------------------------------------------------------------------------
>
> We strongly recommend retaining `G1ReservePercent` as a configurable minimum, particularly in low-latency scenarios or when allocation bursts are expected immediately after idle phases. We'd also be open to exploring future adaptive variants of the reserve floor as the AHS loop matures.
>
> ------------------------------------------------------------------------
> 4.
Periodic GC fallback and field heuristics
> ------------------------------------------------------------------------
>
> Until AHS-driven shrink behavior is well understood and widely adopted, we recommend retaining a periodic GC safety net, especially for services with extended idle phases. As AHS matures, we'll continue to evaluate whether this fallback remains necessary in production.
>
> ------------------------------------------------------------------------
> 5. Role of externally-supplied limits
> ------------------------------------------------------------------------
>
> Internally, we've discussed how AHS should behave in managed container environments such as AKS. In most cases we expect the JVM to operate within cgroup-defined memory.max and possibly memory.high bounds.
>
> We don't currently envision supporting non-cgroup (custom/embedded) environments on day one. We also believe that memory.high or RSS-based constraints could eventually serve as complementary signals for guiding heap elasticity, especially for AKS customers.
>
> These use cases are still exploratory, but we hope they can be accommodated within the direction of AHS without adding undue complexity to the core loop.
>
> ------------------------------------------------------------------------
> 6. Design notes and alignment
> ------------------------------------------------------------------------
>
> For reference, our current AHS evaluation and alignment write-up (including control flow diagrams and tuning strategy) is here:
>
> https://github.com/microsoft/openjdk-workstreams/tree/main/G1-AHS
>
> We'll continue to update that as PRs land and more data becomes available. We welcome any feedback on the write-up or our alignment approach and would be happy to incorporate community input via PRs. We are also open to hosting the write-up within an OpenJDK project repo if that's deemed appropriate.
>
> Thanks again to everyone driving this effort forward; happy to continue refining as the pieces come together.
>
> Best regards,
> Monica

From aboldtch at openjdk.org Wed May 21 06:02:59 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 21 May 2025 06:02:59 GMT
Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v4]
In-Reply-To: <3Z6vsQuZdv_yxZa2IFMWx4T5eBuJ56dD53HmR0r-jLQ=.299ce304-d769-4bd6-b3a1-af8892320227@github.com>
References: <3Z6vsQuZdv_yxZa2IFMWx4T5eBuJ56dD53HmR0r-jLQ=.299ce304-d769-4bd6-b3a1-af8892320227@github.com>
Message-ID: <8XYO5rI5BeobNdnjvgI6XSK-33UWoV6oj0GglYSGT18=.25b7f6d3-2cc4-4d76-bfc4-997dd4fe413b@github.com>

On Tue, 20 May 2025 06:51:37 GMT, Axel Boldt-Christmas wrote:

>> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead.
>>
>> The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements and IntrusiveRBTree does not, is trivial to implement on top of the tree.
>>
>> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes.
>>
>> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
>
> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356455 > - Use atomic load of tree size in print_on > - cache_replace comments > - Sort order > - Use private inheritance > - Separate tree logic to own class > - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25112#issuecomment-2896697560 From aboldtch at openjdk.org Wed May 21 06:03:00 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 21 May 2025 06:03:00 GMT Subject: Integrated: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. This pull request has now been integrated. 
Changeset: 50e873f0 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/50e873f0e88d6643586907dea5731d739b7826dc Stats: 2215 lines in 5 files changed: 139 ins; 2032 del; 44 mod 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Reviewed-by: stefank, eosterlund, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/25112 From tschatzl at openjdk.org Wed May 21 07:17:26 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 21 May 2025 07:17:26 GMT Subject: RFR: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true Message-ID: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Hi all, please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. Testing: tier1-3, gha Thanks, Thomas ------------- Commit messages: - * remove comment - * fix try-concurrent-... 
- * some minor refactoring - 8357306 Changes: https://git.openjdk.org/jdk/pull/25320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25320&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357306 Stats: 57 lines in 4 files changed: 2 ins; 24 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/25320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25320/head:pull/25320 PR: https://git.openjdk.org/jdk/pull/25320 From iwalulya at openjdk.org Wed May 21 08:10:12 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 21 May 2025 08:10:12 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v4] In-Reply-To: <92Vf3UjIhUOJYwfUnIyulaxKQdSBiLy_caiNo_9zc5I=.71b5f327-8231-4bcd-a83c-85ca98c7911c@github.com> References: <92Vf3UjIhUOJYwfUnIyulaxKQdSBiLy_caiNo_9zc5I=.71b5f327-8231-4bcd-a83c-85ca98c7911c@github.com> Message-ID: On Tue, 20 May 2025 15:34:29 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Thanks for the reviews! 
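
As a rough illustration of the idea in the quoted description (count the allocation that triggered the full GC before deciding how far to shrink), here is a minimal sketch. It is not the code from PR 24944: the function names, the `max_free_percent` parameter and the local rounding helper are all hypothetical, and the real logic lives in `G1HeapSizingPolicy::full_collection_resize_amount`.

```cpp
#include <cstddef>

// Hypothetical helper: round a byte count up to a whole number of regions.
std::size_t round_up_to_region(std::size_t bytes, std::size_t region_bytes) {
  return (bytes + region_bytes - 1) / region_bytes * region_bytes;
}

// Hypothetical sketch: how many bytes to shrink after a full GC.
// The pending allocation is added to the live data first, so the heap is not
// shrunk below what the allocation that triggered the GC will need.
// Assumes max_free_percent < 100.
std::size_t shrink_bytes_after_full_gc(std::size_t capacity,
                                       std::size_t used_after_gc,
                                       std::size_t pending_alloc,
                                       unsigned max_free_percent,
                                       std::size_t region_bytes) {
  std::size_t needed = used_after_gc + round_up_to_region(pending_alloc, region_bytes);
  // Largest capacity that still keeps the free share at or below max_free_percent.
  double max_desired = static_cast<double>(needed) / (1.0 - max_free_percent / 100.0);
  std::size_t target = round_up_to_region(static_cast<std::size_t>(max_desired), region_bytes);
  return capacity > target ? capacity - target : 0;
}
```

With a 100 MiB heap, 10 MiB live and a 50% free limit, ignoring a pending 40 MiB allocation would shrink the heap to 20 MiB and force an immediate expansion; counting it keeps the heap at 100 MiB.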
------------- PR Comment: https://git.openjdk.org/jdk/pull/24944#issuecomment-2896999587 From iwalulya at openjdk.org Wed May 21 08:10:14 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 21 May 2025 08:10:14 GMT Subject: Integrated: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 09:02:43 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. > > Testing: Tier 1-3 This pull request has now been integrated. Changeset: 91194517 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/91194517c75a96fe7bcc2dcf5e9c42af9cf5975a Stats: 42 lines in 8 files changed: 19 ins; 0 del; 23 mod 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24944 From shade at openjdk.org Wed May 21 09:39:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 May 2025 09:39:13 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v18] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. 
Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - ... 
and 22 more: https://git.openjdk.org/jdk/compare/a0cdf36b...51390bc2 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=17 Stats: 427 lines in 12 files changed: 384 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From stefank at openjdk.org Wed May 21 09:55:02 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 21 May 2025 09:55:02 GMT Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase Message-ID: Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots. One part of this is requires us to visit all old pages. To parallelize that part we have a class that distribute indices to the page table to the GC worker threads (See ZIndexDistributor). While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeable worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. 
So this is most likely mainly relevant for micro benchmarks and small workloads.

The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR.

4 GB MaxHeapSize
                   Original    Patch
Default threads
  mac:             0.27812     0.0507
  win:             0.9485      0.10452
  linux-x64:       0.53858     0.092
  linux-x64 NUMA:  0.89974     0.15452
  linux-aarch64:   0.32574     0.15832

4 threads
  mac:             0.19112     0.04916
  win:             0.83346     0.08796
  linux-x64:       0.57692     0.09526
  linux-x64 NUMA:  1.23684     0.17008
  linux-aarch64:   0.334       0.21918

1 thread
  mac:             0.19678     0.0589
  win:             1.96496     0.09928
  linux-x64:       1.00788     0.1381
  linux-x64 NUMA:  2.77312     0.21134
  linux-aarch64:   0.63696     0.31286

The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap sizes (we don't do that anymore for JDK 25). It shows a quite significant difference, but it will also likely be in the noise when running larger workloads.

16 TB MaxHeapSize
                   Original    Patch
Default threads
  mac:             11.4903     0.11098
  win:             54.3666     0.37164
  linux-x64:       18.0898     0.21094
  linux-x64 NUMA:  26.9786     0.46134
  linux-aarch64:   20.7151     0.32846

4 threads
  mac:             6.4035      0.10096
  win:             89.5496     0.32178
  linux-x64:       27.883      0.2053
  linux-x64 NUMA:  35.5636     0.30928
  linux-aarch64:   15.4857     0.32004

1 thread
  mac:             21.2717     0.1275
  win:             307.155     0.3361
  linux-x64:       62.5843     0.2309
  linux-x64 NUMA:  92.0048     0.3798
  linux-aarch64:   61.0375     0.42458

This change removes the last usage of ZIndexDistributor. I don't know if we want to remove it, or leave it in case we need it for any of our upcoming features.

I've run this through tier1-7.
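
To make the difference between the two iteration strategies concrete, here is a toy sketch. It is not ZGC code: `ToyPageTable`, `register_old` and both visit functions are hypothetical names, and the "found old" record is modelled as a plain list of slot indices, which is only an assumption about the general shape of the optimization. It shows why the cost of the tracked walk depends on the number of old pages rather than on the page table size.

```cpp
#include <cstddef>
#include <vector>

// Toy page table: slot i is true when it holds an old-generation page.
struct ToyPageTable {
  std::vector<bool> is_old;            // one entry per page-table slot
  std::vector<std::size_t> found_old;  // slots recorded when an old page was registered

  explicit ToyPageTable(std::size_t slots) : is_old(slots, false) {}

  // Record an old page once, both in the table and in the side list.
  void register_old(std::size_t index) {
    if (!is_old[index]) {
      is_old[index] = true;
      found_old.push_back(index);
    }
  }

  // Baseline: inspect every slot of the table. Returns the slots inspected.
  template <typename Visitor>
  std::size_t visit_by_full_scan(Visitor visit) const {
    std::size_t inspected = 0;
    for (std::size_t i = 0; i < is_old.size(); ++i) {
      ++inspected;
      if (is_old[i]) {
        visit(i);
      }
    }
    return inspected;
  }

  // Tracked: inspect only the slots already known to hold old pages.
  template <typename Visitor>
  std::size_t visit_by_found_old(Visitor visit) const {
    std::size_t inspected = 0;
    for (std::size_t i : found_old) {
      ++inspected;
      visit(i);
    }
    return inspected;
  }
};
```

With a table of about a million slots and two old pages, the full scan inspects every slot while the tracked walk inspects two, which matches the direction of the numbers above for a mostly empty old generation and a large page table.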
------------- Commit messages: - 8357443: ZGC: Optimize old page iteration in remap remembered phase Changes: https://git.openjdk.org/jdk/pull/25345/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25345&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357443 Stats: 107 lines in 4 files changed: 49 ins; 14 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/25345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25345/head:pull/25345 PR: https://git.openjdk.org/jdk/pull/25345 From tschatzl at openjdk.org Wed May 21 10:08:29 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 21 May 2025 10:08:29 GMT Subject: RFR: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true [v2] In-Reply-To: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> References: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Message-ID: > Hi all, > > please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. > > Testing: tier1-3, gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into 8357306-remove-gc-succeeded-in-g1-vm-operations - * remove comment - * fix try-concurrent-... - * some minor refactoring - 8357306 Hi all, please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. 
Testing: tier1-3, gha Thanks, Thomas ------------- Changes: https://git.openjdk.org/jdk/pull/25320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25320&range=01 Stats: 55 lines in 4 files changed: 2 ins; 22 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/25320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25320/head:pull/25320 PR: https://git.openjdk.org/jdk/pull/25320 From iwalulya at openjdk.org Wed May 21 10:13:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 21 May 2025 10:13:54 GMT Subject: RFR: 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages In-Reply-To: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> References: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> Message-ID: On Tue, 20 May 2025 09:17:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small update to the parallel reference processing option in the manpage. > > Testing: compilation/manpage building? > > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25323#pullrequestreview-2857121725 From thomas.schatzl at oracle.com Wed May 21 10:39:11 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 21 May 2025 12:39:11 +0200 Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype In-Reply-To: References: Message-ID: Hi Monica, thanks for providing more input on this issue. On 21.05.25 06:09, Monica Beckwith wrote: > Hello everyone, > > I have created a new sub-task: https://bugs.openjdk.org/browse/ > JDK-8357296 under the > umbrella bug JDK-8353716 for G1 AHS. > > This sub-task implements the coordination mechanism that integrates all > other AHS components. Not sure if "implementation" means "code for some API to fill in" here because the suggested component seems fairly abstract in nature. 
Obviously I haven't seen that code, but that makes me even more nervous :)

[I tend to be a more bottom-up guy that first prototypes the pieces, each providing some value when integrated if possible (with the end goal in mind), and think about how to coordinate the currently implemented changes as needed. Iterate often instead of some in-advance top-down architecture. But I have no idea what the contents of that implementation are]

>
> The key aspects of this implementation are:
>
> 1. Integration Points:
>
> - Coordinates SoftMax policy (JDK-8236073) with heap sizing decisions
> - Applies GCTimeRatio targets (JDK-8247843) in control loop
> - Triggers shrink decisions (JDK-8238687) based on GC metrics
> - Schedules memory uncommit (JDK-8238686) when appropriate
>
> 2. Key Files Modified:
>   g1HeapSizingPolicy.{cpp,hpp} - Core control loop implementation
>   g1_globals.hpp               - AHS configuration framework
>
> 3. Runtime Controls:
>
> Primary controls:
>   GCTimeRatio           - GC vs application time ratio
>   GCCpuOverheadTarget   - Direct CPU target percentage

To answer Kirk's question in the other thread:

> How do you anticipate that GCTimeRatio and GCCpuOverheadTarget would
> work together?

Since they are effectively the same thing, my _current_ initial idea would be that specifying both makes one override the other, with an appropriate warning. Done.

Since GCCpuOverheadTarget is only a usability issue we might delay its introduction a bit; it does not seem urgent. There is probably more to be discussed with people maintaining the other GCs too.

>   SoftMaxHeapSize       - Soft maximum heap size limit
>
> Tuning parameters:
>   G1AHSDampingFactor    - Control loop response damping

Not exactly sure what this means and how this interacts with other ways, known to be needed, to help with issues found during testing. We tend to be very wary of adding new product options.
One problem with this single option is that it will be a tradeoff: what has come up multiple times before is control over

- extent of heap change (may also depend on situation, e.g. startup boost)
- response time of heap change

for both heap expansion and shrinking (e.g. https://bugs.openjdk.org/browse/JDK-8349939, https://bugs.openjdk.org/browse/JDK-8349978; abstracting a bit from them).

Recent internal discussion of some test results for JDK-8238687 strongly suggested that this control should be dependent on type of heap change. Even then, as diagnostic options.

What I wanted to say here is that, _if_ that flag's purpose would be as described above, it seems to be insufficient.

There is also the problem of integrating this proposal with the current prototype for JDK-8238687. One could only use that as additional factor for determining the (iirc currently internal magic constants) above, but seems unnecessary at first thought.

>   G1UncommitDelay       - Memory uncommit delay time

The purpose of this parameter is not completely clear to me; what delay is supposed to be controlled here? Would be nice to explain this a bit more as I could not find anything in that repo. The referenced CR seems to be about something different too.

Probably you are envisioning something like ZUncommitDelay here, where the GC tracks region usage/access and uncommits regions that have not been used for a "long" time.

I think that is one way to attack the issue that JDK-8238687 has: since heap size evaluation is on a gc basis, if there are no GCs since the application is idle, memory will be kept committed unnecessarily for a long time. Probably I mentioned this issue once or twice somewhere.

I tried to sum up the problem and provide some background about other similar issues/mechanisms in the new "JDK-8357445: G1: Time-based heap size re-evaluation".

> [...]
> I will submit a PR soon. Comments and suggestions are welcome either now
> or during the PR review.
> It would be great if we main contributors could meet in a video call to discuss findings in the recent months, current state, results and thoughts in the next time, it seems. There is also no need to submit a formal PR, before that it would be as fine to just initially send a patch (e.g. create a branch on github and send a link) here for initial discussion like Ivan provided recently. Thanks, Thomas From shade at openjdk.org Wed May 21 11:10:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 May 2025 11:10:15 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. 
This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/51390bc2..f6bbc8d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=17-18 Stats: 20 lines in 2 files changed: 12 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From aboldtch at openjdk.org Wed May 21 11:49:52 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 21 May 2025 11:49:52 GMT Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase In-Reply-To: References: Message-ID: On Wed, 21 May 2025 09:49:52 GMT, Stefan Karlsson wrote: > Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots. > > One part of this is requires us to visit all old pages. To parallelize that part we have a class that distribute indices to the page table to the GC worker threads (See ZIndexDistributor). 
> > While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeable worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered > > While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. > > This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So, most likely mostly important for micro benchmarks and small workloads. > > The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR. > > > 4 GB MaxHeapSize > Original Patch > Default threads > > mac: 0.27812 0.0507 > win: 0.9485 0.10452 > linux-x64: 0.53858 0.092 > linux-x64 NUMA: 0.89974 0.15452 > linux-aarch64: 0.32574 0.15832 > > 4 threads > > mac: 0.19112 0.04916 > win: 0.83346 0.08796 > linux-x64: 0.57692 0.09526 > linux-x64 NUMA: 1.23684 0.17008 > linux-aarch64: 0.334 0.21918 > > 1 thread: > > mac: 0.19678 0.0589 > win: 1.96496 0.09928 > linux-x64: 1.00788 0.1381 > linux-x64 NUMA: 2.77312 0.21134 > linux-aarch64: 0.63696 0.31286 > > > The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size (we don't do that anymore for JDK 25). It shows a quite significant difference, bu... lgtm. Very nice to use information we are already tracking rather than walking everything. > This change removes the last usage of ZIndexDistributor. 
I don't know if we want to remove it, or leave it in case we need it for any of our upcoming features. It is probably nice to at least keep our page table iterators in the code base, so you do not have to go dig them up / do something ad hoc if you ever want to check something. Whether they need ZIndexDistributor or not is another question.

src/hotspot/share/gc/z/zGeneration.cpp line 952:

> 950:
> 951: ZRemembered* ZGenerationYoung::remembered() {
> 952: return &_remembered;

Suggestion:

return &_remembered;

src/hotspot/share/gc/z/zRemembered.cpp line 405:

> 403:
> 404: // This iterator uses the "found old" optimization.
> 405: bool ZRemsetTableIterator::next(ZRemsetTableEntry* entry_addr) {

Suggestion:

bool ZRemsetTableIterator::next(ZRemsetTableEntry* entry_addr) {

src/hotspot/share/gc/z/zRemembered.cpp line 475:

> 473: _remembered(remembered),
> 474: _mark(mark),
> 475: _remset_table_iterator(remembered, true /* previous */) {

Suggestion:

_remset_table_iterator(remembered, true /* previous */) {

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25345#pullrequestreview-2857188465
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2099943312
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2100063807
PR Review Comment: https://git.openjdk.org/jdk/pull/25345#discussion_r2100064432

From stefank at openjdk.org Wed May 21 12:45:09 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 21 May 2025 12:45:09 GMT
Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase [v2]
In-Reply-To:
References:
Message-ID:

> Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots.
>
> One part of this requires us to visit all old pages.
To parallelize that part we have a class that distribute indices to the page table to the GC worker threads (See ZIndexDistributor). > > While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeable worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered > > While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. > > This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So, most likely mostly important for micro benchmarks and small workloads. > > The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR. > > > 4 GB MaxHeapSize > Original Patch > Default threads > > mac: 0.27812 0.0507 > win: 0.9485 0.10452 > linux-x64: 0.53858 0.092 > linux-x64 NUMA: 0.89974 0.15452 > linux-aarch64: 0.32574 0.15832 > > 4 threads > > mac: 0.19112 0.04916 > win: 0.83346 0.08796 > linux-x64: 0.57692 0.09526 > linux-x64 NUMA: 1.23684 0.17008 > linux-aarch64: 0.334 0.21918 > > 1 thread: > > mac: 0.19678 0.0589 > win: 1.96496 0.09928 > linux-x64: 1.00788 0.1381 > linux-x64 NUMA: 2.77312 0.21134 > linux-aarch64: 0.63696 0.31286 > > > The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size (we don't do that anymore for JDK 25). It shows a quite significant difference, bu... 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Axel Boldt-Christmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25345/files - new: https://git.openjdk.org/jdk/pull/25345/files/8097f3fb..b8d04be7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25345&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25345&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25345/head:pull/25345 PR: https://git.openjdk.org/jdk/pull/25345 From aboldtch at openjdk.org Wed May 21 12:45:10 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 21 May 2025 12:45:10 GMT Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase [v2] In-Reply-To: References: Message-ID: <-SAyBRWRAqyDKVA6ImuRZ3AR9HS1XrOeqWoscMiDR3g=.7d3d3fed-beb1-47f0-8d71-733c0438cb6b@github.com> On Wed, 21 May 2025 12:42:32 GMT, Stefan Karlsson wrote: >> Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots. >> >> One part of this is requires us to visit all old pages. To parallelize that part we have a class that distribute indices to the page table to the GC worker threads (See ZIndexDistributor). >> >> While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeable worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. 
See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered >> >> While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. >> >> This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So, most likely mostly important for micro benchmarks and small workloads. >> >> The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR. >> >> >> 4 GB MaxHeapSize >> Original Patch >> Default threads >> >> mac: 0.27812 0.0507 >> win: 0.9485 0.10452 >> linux-x64: 0.53858 0.092 >> linux-x64 NUMA: 0.89974 0.15452 >> linux-aarch64: 0.32574 0.15832 >> >> 4 threads >> >> mac: 0.19112 0.04916 >> win: 0.83346 0.08796 >> linux-x64: 0.57692 0.09526 >> linux-x64 NUMA: 1.23684 0.17008 >> linux-aarch64: 0.334 0.21918 >> >> 1 thread: >> >> mac: 0.19678 0.0589 >> win: 1.96496 0.09928 >> linux-x64: 1.00788 0.1381 >> linux-x64 NUMA: 2.77312 0.21134 >> linux-aarch64: 0.63696 0.31286 >> >> >> The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Axel Boldt-Christmas Marked as reviewed by aboldtch (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25345#pullrequestreview-2857566293 From tschatzl at openjdk.org Wed May 21 12:56:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 21 May 2025 12:56:56 GMT Subject: RFR: 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages In-Reply-To: References: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> Message-ID: On Tue, 20 May 2025 18:46:52 GMT, Kim Barrett wrote: >> Hi all, >> >> please review this small update to the parallel reference processing option in the manpage. >> >> Testing: compilation/manpage building? >> >> Thomas > > Marked as reviewed by kbarrett (Reviewer). Thanks @kimbarrett @walulyai for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25323#issuecomment-2897861913 From tschatzl at openjdk.org Wed May 21 12:56:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 21 May 2025 12:56:57 GMT Subject: Integrated: 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages In-Reply-To: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> References: <7wtVnHeMGJhAjmqOxi3d6HuUzcHqDaX0xgoRyg2uojU=.58faa8c9-7cde-47c1-a902-a423803cdd35@github.com> Message-ID: On Tue, 20 May 2025 09:17:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small update to the parallel reference processing option in the manpage. > > Testing: compilation/manpage building? > > Thomas This pull request has now been integrated. 
Changeset: a175767c Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/a175767ccfb3dbcc04d1ba97f9fb2f57dc5ab5cf Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod 8357018: Guidance for ParallelRefProcEnabled is wrong in the man pages Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/25323 From stefank at openjdk.org Wed May 21 13:02:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 21 May 2025 13:02:51 GMT Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 12:45:09 GMT, Stefan Karlsson wrote: >> Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots. >> >> One part of this is requires us to visit all old pages. To parallelize that part we have a class that distribute indices to the page table to the GC worker threads (See ZIndexDistributor). >> >> While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeable worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered >> >> While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. >> >> This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. 
So, most likely mostly important for micro benchmarks and small workloads. >> >> The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR. >> >> >> 4 GB MaxHeapSize >> Original Patch >> Default threads >> >> mac: 0.27812 0.0507 >> win: 0.9485 0.10452 >> linux-x64: 0.53858 0.092 >> linux-x64 NUMA: 0.89974 0.15452 >> linux-aarch64: 0.32574 0.15832 >> >> 4 threads >> >> mac: 0.19112 0.04916 >> win: 0.83346 0.08796 >> linux-x64: 0.57692 0.09526 >> linux-x64 NUMA: 1.23684 0.17008 >> linux-aarch64: 0.334 0.21918 >> >> 1 thread: >> >> mac: 0.19678 0.0589 >> win: 1.96496 0.09928 >> linux-x64: 1.00788 0.1381 >> linux-x64 NUMA: 2.77312 0.21134 >> linux-aarch64: 0.63696 0.31286 >> >> >> The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Axel Boldt-Christmas Thanks for reviewing! ------------- PR Review: https://git.openjdk.org/jdk/pull/25345#pullrequestreview-2857635087 From jsjolen at openjdk.org Wed May 21 13:50:56 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 21 May 2025 13:50:56 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 14:20:48 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler` ? an always-on periodic task that runs every 50ms my default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). 
For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsolete, as it no longer serves any purpose.
>>
>> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead.
>
> Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - removed last traces of hrt.ticks
> - Merge branch 'master' into statsampler-removal
> - feedback fixes
> - removed the PerfDataSamplingInterval flag
> - calculate timestamp in jstat instead of sampling
> - StatSampler + sampling code removed

All of this looks good to me. It seems like there's no user-visible change, except the removal of the global variable. Is that correct?

-------------

Marked as reviewed by jsjolen (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24872#pullrequestreview-2857814078

From jbhateja at openjdk.org Wed May 21 14:09:07 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 21 May 2025 14:09:07 GMT
Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill
Message-ID: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com>

Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. These instructions operate on a pair of registers, resulting in smaller save/restore JIT code. On the flip side, they have hard alignment and balancing constraints, as they operate at a 16-byte aligned stack address.
ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill Changes: https://git.openjdk.org/jdk/pull/25351/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357267 Stats: 63 lines in 1 file changed: 46 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From kdnilsen at openjdk.org Wed May 21 16:08:05 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 21 May 2025 16:08:05 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old Message-ID: Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). 
This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. 
![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c)

The command line for these comparisons follows:

  ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \
    -XX:+UnlockExperimentalVMOptions \
    -XX:-ShenandoahPacing \
    -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$s -Xmx$s \
    -XX:ShenandoahMinimumOldTimeMs=25 \
    -XX:ShenandoahFullGCThreshold=1024 \
    -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
    -Xlog:"gc*=info,ergo" \
    -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
    -XX:+UnlockDiagnosticVMOptions \
    -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
    -dInitializationDelay=45s \
    -dDictionarySize=3000000 \
    -dNumCustomers=300000 \
    -dNumProducts=60000 \
    -dCustomerThreads=500 \
    -dCustomerPeriod=7s \
    -dCustomerThinkTime=1s \
    -dKeywordSearchCount=4 \
    -dServerThreads=5 \
    -dServerPeriod=5s \
    -dProductNameLength=10 \
    -dBrowsingHistoryQueueCount=5 \
    -dSalesTransactionQueueCount=5 \
    -dProductDescriptionLength=32 \
    -dProductReplacementPeriod=25s \
    -dProductReplacementCount=10 \
    -dCustomerReplacementPeriod=30s \
    -dCustomerReplacementCount=1000 \
    -dBrowsingExpiration=1m \
    -dPhasedUpdates=true \
    -dPhasedUpdateInterval=90s \
    -dSimulationDuration=25m \
    -dResponseTimeMeasurements=100000 \
    >$t.genshen.share-reserves.$r-evac-ratio.$s.out 2>$t.genshen.share-reserves.$r-evac-ratio.$s.err
  gzip $t.genshen.share-reserves.$r-evac-ratio.$s.out $t.genshen.share-reserves.$r-evac-ratio.$s.err

We have tested this patch through our performance pipeline.
Both aarch64 and x86 show similar results, a small increase in concurrent evacuation on the graphchi benchmark, with slight improvements of other metrics on a number of other test workloads:

Genshen aarch64
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control: 1.895ms (+/-392.33us) 306
  Test: 2.205ms (+/-401.72us) 124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control: 513.923ms (+/-225.22ms) 338
  Test: 385.169ms (+/-231.25ms) 38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control: 1.022s (+/-446.83ms) 333
  Test: 794.476ms (+/-440.83ms) 35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control: 0.113ms (+/- 0.01ms) 285
  Test: 0.090ms (+/- 0.02ms) 158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control: 0.106ms (+/- 0.01ms) 474
  Test: 0.090ms (+/- 0.01ms) 180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control: 384.599ms (+/- 76.47ms) 277
  Test: 333.581ms (+/- 89.51ms) 55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control: 768.676ms (+/-152.94ms) 277
  Test: 666.786ms (+/-178.97ms) 55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control: 768.676ms (+/-152.94ms) 277
  Test: 666.786ms (+/-178.97ms) 55

Shenandoah aarch64
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control: 479.826ms (+/- 52.78ms) 23
  Test: 428.148ms (+/- 2.26ms) 3

Genshen x86
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control: 1.895ms (+/-392.33us) 306
  Test: 2.205ms (+/-401.72us) 124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control: 513.923ms (+/-225.22ms) 338
  Test: 385.169ms (+/-231.25ms) 38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control: 1.022s (+/-446.83ms) 333
  Test: 794.476ms (+/-440.83ms) 35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control: 0.113ms (+/- 0.01ms) 285
  Test: 0.090ms (+/- 0.02ms) 158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control: 0.106ms (+/- 0.01ms) 474
  Test: 0.090ms (+/- 0.01ms) 180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control: 384.599ms (+/- 76.47ms) 277
  Test: 333.581ms (+/- 89.51ms) 55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control: 768.676ms (+/-152.94ms) 277
  Test: 666.786ms (+/-178.97ms) 55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control: 768.676ms (+/-152.94ms) 277
  Test: 666.786ms (+/-178.97ms) 55

Shenandoah x86
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control: 479.826ms (+/- 52.78ms) 23
  Test: 428.148ms (+/- 2.26ms) 3

-------------

Commit messages:
 - Fix whitespace
 - Merge remote-tracking branch 'jdk/master' into share-collector-reserves
 - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
 - make old gc more aggresive
 - Change fullgc phase5 return type
 - compute_old_generation_balance needs available computations under lock
 - Merge branch 'share-collector-reserves' of https://github.com/kdnilsen/jdk into share-collector-reserves
 - Fixup bugs introduced by most recent commit
 - Improve empty region accounting in FreeSet
 - Revert "Acquire heaplock before adjusting interval for old"
 - ...
and 24 more: https://git.openjdk.org/jdk/compare/6162e2c5...3d55a646 Changes: https://git.openjdk.org/jdk/pull/25357/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357471 Stats: 1354 lines in 28 files changed: 781 ins; 333 del; 240 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From wkemper at openjdk.org Wed May 21 18:37:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 May 2025 18:37:54 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 15:29:09 GMT, Kelvin Nilsen wrote: > Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. 
In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 193: > 191: const size_t new_size = old_gen->max_capacity(); > 192: log_info(gc, ergo)("Forcing transfer of %zu region(s) from %s to %s, yielding increased size: " PROPERFMT, > 193: regions, young_gen->name(), old_gen->name(), PROPERFMTARGS(new_size)); If this is now really only used for in-place promotions, can we change the log message to indicate the region is being promoted? I think when users see messages about things being "forced" in the log, they start to wonder if young/old sizes need to adjusted. 
src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 206:

> 204: old_gen->decrease_capacity(bytes_to_transfer);
> 205: const size_t new_size = young_gen->max_capacity();
> 206: log_info(gc)("Forcing transfer of %zu region(s) from %s to %s, yielding increased size: " PROPERFMT,

Can this be `gc, ergo` like the method to transfer regions to old?

src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 298:

> 296: if (copy == nullptr) {
> 297: // If we failed to allocate in LAB, we'll try a shared allocation.
> 298: #ifdef KELVIN_ORIGINAL

This looks like debugging code? Should we back this out?

src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.hpp line 149:

> 147:
> 148: // Transfers surplus old regions to young, or takes regions from young to satisfy old region deficit
> 149: TransferResult balance_generations();

Are we still using this `TransferResult` thing? Seems like we might be able to delete it with this change.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1670:

> 1668:
> 1669: if (mode()->is_generational()) {
> 1670: // young-gen heuristics track young, bootstrap, and global GC cycle times

This seems like an unrelated change. Mixing in bootstrap and global gc cycle times is likely to increase the average time and make the heuristics more aggressive.

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 279:

> 277: static size_t setup_sizes(size_t max_heap_size);
> 278:
> 279: inline bool is_recycling() {

Is this used anywhere?

src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 376:

> 374: "runs out of memory too early.") \
> 375: \
> 376: product(uintx, ShenandoahOldEvacRatioPercent, 75, EXPERIMENTAL, \

Phew, this is a lot of explanatory text and it reads like the target audience is GC developers. If we are going to expose this as a user configurable option, I think the help text should just explain how the behavior changes as this value goes up or down.
Something like: > Increasing this allows for more old regions in mixed collections. Decreasing this reduces the number of old regions in mixed collections. The first sentence makes it seem as though this is the percentage of the entire heap to reserve for old evacuations, but the next clarifies that this is the percentage of the collection set. Question about this sentence: > A value of 100 allows a mixed evacuation to focus entirely on old-gen memory, allowing no young-gen regions to be collected. With a setting of 100, would GenShen still "preselect" young regions of tenuring age with sufficient garbage into the collection set? I also find the name of the option slightly confusing - is it a ratio? or a percentage? Seems like it's really a percentage (though it controls the ratio of reserves used for the collection). ------------- PR Review: https://git.openjdk.org/jdk/pull/25357#pullrequestreview-2858665038 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100871366 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100868280 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100879212 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100931112 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100887452 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100894053 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2100926884 From asemenyuk at openjdk.org Wed May 21 21:09:27 2025 From: asemenyuk at openjdk.org (Alexey Semenyuk) Date: Wed, 21 May 2025 21:09:27 GMT Subject: RFR: 8357503: gcbasher fails with java.lang.IllegalArgumentException: Unknown constant pool type Message-ID: Add missing CONSTANT_Dynamic. 
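
For readers unfamiliar with the failure mode, here is a minimal sketch of how a class-file parser can end up throwing "Unknown constant pool type" when a tag is missing from its lookup table. This is not the gcbasher code, which is not shown in this thread; the names `ConstantPoolTag` and `fromValue` are hypothetical. The tag numbers themselves come from the JVM specification's constant pool table, where CONSTANT_Dynamic is tag 17 (introduced with JEP 309 in Java 11).

```java
import java.util.HashMap;
import java.util.Map;

/** Constant pool tags as defined by the class file format (hypothetical helper type). */
enum ConstantPoolTag {
    UTF8(1), INTEGER(3), FLOAT(4), LONG(5), DOUBLE(6), CLASS(7), STRING(8),
    FIELDREF(9), METHODREF(10), INTERFACE_METHODREF(11), NAME_AND_TYPE(12),
    METHOD_HANDLE(15), METHOD_TYPE(16),
    DYNAMIC(17),   // CONSTANT_Dynamic: omitting this entry makes tag 17 "unknown"
    INVOKE_DYNAMIC(18), MODULE(19), PACKAGE(20);

    // Reverse lookup table from the numeric tag to the enum constant.
    private static final Map<Integer, ConstantPoolTag> BY_VALUE = new HashMap<>();
    static {
        for (ConstantPoolTag t : values()) {
            BY_VALUE.put(t.value, t);
        }
    }

    final int value;

    ConstantPoolTag(int value) {
        this.value = value;
    }

    /** Maps a raw tag byte to its constant; rejects tags that are not in the table. */
    static ConstantPoolTag fromValue(int value) {
        ConstantPoolTag t = BY_VALUE.get(value);
        if (t == null) {
            throw new IllegalArgumentException("Unknown constant pool type: " + value);
        }
        return t;
    }
}
```

With `DYNAMIC(17)` present, `ConstantPoolTag.fromValue(17)` resolves normally; a parser whose table lacks that entry would reject any class file compiled with dynamic constants.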
------------- Commit messages: - 8357503: gcbasher fails with java.lang.IllegalArgumentException: Unknown constant pool type Changes: https://git.openjdk.org/jdk/pull/25370/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25370&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357503 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25370/head:pull/25370 PR: https://git.openjdk.org/jdk/pull/25370 From dholmes at openjdk.org Wed May 21 21:15:51 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 21 May 2025 21:15:51 GMT Subject: RFR: 8357503: gcbasher fails with java.lang.IllegalArgumentException: Unknown constant pool type In-Reply-To: References: Message-ID: On Wed, 21 May 2025 21:04:44 GMT, Alexey Semenyuk wrote: > Add missing CONSTANT_Dynamic. LGTM! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25370#pullrequestreview-2859138752 From asemenyuk at openjdk.org Wed May 21 21:59:58 2025 From: asemenyuk at openjdk.org (Alexey Semenyuk) Date: Wed, 21 May 2025 21:59:58 GMT Subject: Integrated: 8357503: gcbasher fails with java.lang.IllegalArgumentException: Unknown constant pool type In-Reply-To: References: Message-ID: On Wed, 21 May 2025 21:04:44 GMT, Alexey Semenyuk wrote: > Add missing CONSTANT_Dynamic. This pull request has now been integrated. 
Changeset: 3ee14471 Author: Alexey Semenyuk URL: https://git.openjdk.org/jdk/commit/3ee14471e10ca83fe96b7ee1d80a67a1f8c7f4ec Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8357503: gcbasher fails with java.lang.IllegalArgumentException: Unknown constant pool type Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/25370 From sviswanathan at openjdk.org Wed May 21 22:31:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 22:31:58 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 82: > 80: MacroAssembler* masm = _masm; > 81: if (VM_Version::supports_apx_f()) { > 82: __ push(rax); if _result is not equal to rax this also could be pushp rax here and popp rax in restore(). src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 83: > 81: if (VM_Version::supports_apx_f()) { > 82: __ push(rax); > 83: __ push(rcx); This could be __ pushp(rcx). 
src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 92: > 90: // Note: For PPX to work properly, a PPX-marked PUSH2 (respectively, POP2) should always > 91: // be matched with a PPX-marked POP2 (PUSH2), not with two PPX-marked POPs (PUSHs). > 92: __ pushp(rcx); This is saving old rsp on stack and restored using __ movptr(rsp, Address(rsp)) on the other end in restore(). So this should be __ push(rcx) and not __ pushp(rcx) as there is no corresponding __ popp() instruction for this pushp. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 185: > 183: // Re-instantiate original stack pointer. > 184: __ movptr(rsp, Address(rsp)); > 185: __ pop(rcx); This could be __ popp(rcx). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101275404 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101266706 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101264344 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101267521 From sviswanathan at openjdk.org Wed May 21 23:36:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 23:36:51 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. 
> ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin @xmas92 Could you please also take a look at this? Intel APX add additional GPR registers (R16 - R31). Our understanding is that these also need to be saved and restored as part of ZRuntimeCallSpill. Is that correct? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2899502203 From aboldtch at openjdk.org Thu May 22 06:36:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 06:36:14 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes Message-ID:
Background (expandable section)

ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation-induced latencies.

The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property holds for both the small and medium memory regions.

Objects larger than the max medium object size get placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between the object size rounded up to the nearest multiple of 2M and the exact object size.

For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation.

But for max heaps 1GB or larger our currently selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%.

A problem with medium pages is that they may cause allocation-induced latencies. To reduce allocation latencies we track (cache) memory from memory regions that have been freed by the GC, so it can be reused for new memory regions used for allocations.

For small memory regions, as long as there is cached memory, it can use it, because the size of any memory region that has been freed is always a multiple of the small memory region size (2M).

However, for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory region size (32M).
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. In ZGC we call our memory regions pages or zpages.
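The 12.5% and 33% figures in the background above can be checked with a little arithmetic. The following is only a sketch: the constant and function names are made up for illustration and are not taken from the ZGC sources, and it assumes the 32M medium region size used for max heaps of 1GB or larger.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not ZGC identifiers.
constexpr std::size_t M = 1024 * 1024;
constexpr std::size_t kGranule = 2 * M;        // all region sizes are multiples of 2M
constexpr std::size_t kMediumRegion = 32 * M;  // medium region size for max heaps >= 1GB

// Largest object placed in a medium region: 1/8th (12.5%) of the region size.
constexpr std::size_t max_medium_object_size() {
  return kMediumRegion / 8;
}

// A large region holds exactly one object, rounded up to the next multiple of 2M.
constexpr std::size_t large_region_size(std::size_t object_size) {
  return (object_size + kGranule - 1) / kGranule * kGranule;
}

// Unused bytes at the end of the large region backing the object.
constexpr std::size_t large_region_waste(std::size_t object_size) {
  return large_region_size(object_size) - object_size;
}
```

Under these assumptions the worst case is the smallest large object: one byte over 4M needs a 6M region and wastes just under 2M, which is just under one third. An object one byte over 6M needs 8M and wastes just under 25%, so the fraction only shrinks from there.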
### Proposal

Allow medium pages to have multiple sizes. Specifically, allow all power-of-two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes end up being {4M, 8M, 16M, 32M}.

Also add a "fast" medium page allocation path in the page allocator, which only claims memory from the cache where the memory can directly satisfy one of these sizes. This will only fail if the cache is empty, or if all the memory is spread out in 2M segments.

Then change the object allocator to use this path when mutator threads allocate or relocate objects, reducing the probability that allocation or mutator relocation sees latencies induced by the page allocation.

GC relocation workers still allocate only the largest medium page size, so the GC takes the cost of remapping memory. This way we can reduce mutator latencies at the cost of potentially increased temporary memory waste.

The change from a fixed to a variable size for medium pages requires some adaptation in relocation set selection. Earlier we simply compared live byte sizes as a proxy for fragmentation, using pre-calculated values to avoid integer division and double arithmetic in the selection loop. I did something similar in ae2fa92f9356e2966726cc17f3e5d911be8b1f8e. It may be a premature optimisation. Also, apparently we have a `NumPartitions` define in the codebase, so I had to name the constant differently. (I could probably move it to the .cpp file as well.)
https://github.com/openjdk/jdk/blob/f7619fd700ec6498948e5e84e8051be145683940/src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp#L44 #### Testing * All ZGC test tasks tier1-tier8 on all Oracle supported platforms * All ZGC test tasks tier1-tier8 on linux-x64,linux-x64-debug with `-XX:+ZStressFastMediumPageAllocation` Ran performance testing on an earlier baseline, Re-running performance testing on latest rebase ------------- Commit messages: - NumPartitions is reserved by Shenandoah - Add ZStressFastMediumPageAllocation - Add TestZMediumPageSizes - Optimized pre-filter - Multi-sized medium allocations Changes: https://git.openjdk.org/jdk/pull/25381/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357449 Stats: 392 lines in 19 files changed: 340 ins; 10 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From ayang at openjdk.org Thu May 22 06:39:26 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 May 2025 06:39:26 GMT Subject: RFR: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc Message-ID: Before JDK-8339668, full-gc marking array-chunk size uses `ObjArrayMarkingStride`. This patch restores the old behavior (performance). The fix is extract out the chunk-size from `PartialArraySplitter` and use the either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context -- the former is used during young-gc while the latter full-gc. 
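The chunk-size selection described in the message above (one value for young-gc, another for full-gc) might be sketched roughly as follows. This is not the code from the patch: the enum, the function names and the two numeric values are placeholders chosen for illustration, and the values are not claimed to be the JVM defaults for `ParGCArrayScanChunk` or `ObjArrayMarkingStride`.

```cpp
#include <cstddef>

// Hypothetical context tag; the real patch passes the chunk size differently.
enum class CollectionKind { Young, Full };

// Placeholder values for illustration only.
constexpr std::size_t kYoungScanChunk = 50;       // stands in for ParGCArrayScanChunk
constexpr std::size_t kFullMarkingStride = 2048;  // stands in for ObjArrayMarkingStride

// Pick the array chunk size from the calling context instead of hard-wiring one value.
constexpr std::size_t array_chunk_size(CollectionKind kind) {
  return kind == CollectionKind::Young ? kYoungScanChunk : kFullMarkingStride;
}

// Number of chunks a large object array is split into for a given chunk size.
constexpr std::size_t chunk_count(std::size_t array_length, std::size_t chunk_size) {
  return (array_length + chunk_size - 1) / chunk_size;
}
```

With these placeholder numbers a one-million-element array becomes 20000 chunks in the young-gc case but only 489 in the full-gc case, which hints at why using the smaller chunk size during full-gc marking could add noticeable per-chunk overhead.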
Test: tier1-3; checked perf regression is gone ------------- Commit messages: - pgc-split-array Changes: https://git.openjdk.org/jdk/pull/25382/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25382&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354517 Stats: 9 lines in 5 files changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25382/head:pull/25382 PR: https://git.openjdk.org/jdk/pull/25382 From aboldtch at openjdk.org Thu May 22 06:54:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 06:54:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 23:33:47 GMT, Sandhya Viswanathan wrote: > Intel APX add additional GPR registers (R16 - R31). Our understanding is that these also need to be saved and restored as part of ZRuntimeCallSpill. Is that correct? ZRuntimeCallSpill is used when doing calls into libjvm from contexts where we do not track the liveness of the registers. So all caller saved registers must be saved and restored. If all APX registers are caller saved, then yes this is correct. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2900114700 From ayang at openjdk.org Thu May 22 07:01:59 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 May 2025 07:01:59 GMT Subject: RFR: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true [v2] In-Reply-To: References: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Message-ID: On Wed, 21 May 2025 10:08:29 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into 8357306-remove-gc-succeeded-in-g1-vm-operations > - * remove comment > - * fix try-concurrent-... > - * some minor refactoring > - 8357306 > > Hi all, > > please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. > > Testing: tier1-3, gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25320#pullrequestreview-2860021285 From stuefe at openjdk.org Thu May 22 07:00:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 22 May 2025 07:00:11 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing on MacOS aarch64 [v5] In-Reply-To: References: Message-ID: On Wed, 10 Jul 2024 06:10:41 GMT, Thomas Stuefe wrote: >> See JBS issue. 
>> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. >> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. 
> > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java > > Co-authored-by: Andrey Turbanov I am closing this in favor of a new, simpler approach ------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2900128718 From jsikstro at openjdk.org Thu May 22 08:01:01 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 22 May 2025 08:01:01 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: Message-ID: <3PWGWsTBWQcs_7hOuBg4oCJ9nunJsuUQrIxXX__5u6Q=.c4d7d011-eb14-477f-9cd8-d181f0037382@github.com> On Thu, 15 May 2025 11:47:07 GMT, Joel Sikström wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd.
>> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update new order in tests The CSR has now been approved. I'd say this is ready for review now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2900266756 From stefank at openjdk.org Thu May 22 08:12:01 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 May 2025 08:12:01 GMT Subject: RFR: 8357533: ZGC: ZIndexDistributorClaimTree functions not inlined on Windows Message-ID: While investigating a minor performance regression in a System.gc() microbenchmark I saw that the Concurrent Remap Roots phase took a significant portion of the GC cycle time. It turns out that even though we use constexpr we don't get the inlining of the functions that we expected, and this results in a noticeable performance loss. When running System.gc() with one old GC worker thread and a 16TB max heap size we spend around 300 ms just iterating over the page table and its indices via ZIndexDistributorClaimTree. If the inlining is fixed this drops down to ~70 ms, which is similar to what we see on Linux and MacOS.
We already have a patch out to remove the last usage of the ZIndexDistributor mechanism in https://github.com/openjdk/jdk/pull/25345, but ZIndexDistributor has fewer restrictions and is easier to use, so we might want to keep it around (with fixed performance) so that it can be used for prototyping and maybe future features.

Noteworthy in the patch is the following function:

    static constexpr int level_multiplier(int level) {
      assert(level < ClaimLevels, "Must be");
      constexpr int array[ClaimLevels]{16, 16, 16};
      return array[level];
    }

When the last statement is changed to:

    constexpr int result = array[level];
    return result;

the MS compiler complains that the expression is not a constant. And if we get the correct inlining the warning goes away.

To get the required inlining I've changed the `level` parameter to be a template parameter and restructured the code to use template specialization instead of if-statements. This required some finagling with template classes to get partial specialization to work.

Some other changes in the patch:
* Limited includes of zIndexDistributor.inline.hpp
* Extracted the atomic inc into zfetch_then_inc for easier evaluation of the performance impact of our Atomic implementation vs relaxed std::atomic vs non-atomic updates.
* Hid the logging under a compile-time define. The logging is useful when changing/debugging this code, but otherwise it is not.
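As a rough illustration of the idea in the message above (making `level` a template parameter and using specialization instead of run-time branching), here is a minimal sketch. It is not the code from the patch: `LevelMultiplier` and `LevelSegment` are hypothetical helper names, and only `ClaimLevels`, `level_multiplier` and the multipliers {16, 16, 16} come from the quoted function.

```cpp
constexpr int ClaimLevels = 3;

// Primary template is declared but not defined, so an out-of-range level fails to compile.
template <int Level>
struct LevelMultiplier;

// One explicit specialization per level replaces the run-time array lookup.
template <> struct LevelMultiplier<0> { static constexpr int value = 16; };
template <> struct LevelMultiplier<1> { static constexpr int value = 16; };
template <> struct LevelMultiplier<2> { static constexpr int value = 16; };

// The level is now a compile-time argument, so the result is always a constant expression.
template <int Level>
constexpr int level_multiplier() {
  static_assert(Level >= 0 && Level < ClaimLevels, "Must be");
  return LevelMultiplier<Level>::value;
}

// Hypothetical helper: product of the multipliers from this level downwards,
// written as a recursive template with a terminating specialization instead of a loop or if-chain.
template <int Level>
struct LevelSegment {
  static constexpr int value = LevelMultiplier<Level>::value * LevelSegment<Level + 1>::value;
};

template <>
struct LevelSegment<ClaimLevels> {
  static constexpr int value = 1;
};
```

A call such as `level_multiplier<1>()` can be used anywhere a constant is required, for example as an array bound, which is the property the `constexpr int result = array[level];` experiment showed was missing when `level` was an ordinary function parameter.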
------------- Commit messages: - 8357533: ZGC: ZIndexDistributorClaimTree functions not inlined on Windows Changes: https://git.openjdk.org/jdk/pull/25385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357533 Stats: 410 lines in 9 files changed: 227 ins; 84 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/25385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25385/head:pull/25385 PR: https://git.openjdk.org/jdk/pull/25385 From tschatzl at openjdk.org Thu May 22 08:18:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 22 May 2025 08:18:53 GMT Subject: RFR: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:34:07 GMT, Albert Mingkun Yang wrote: > Before JDK-8339668, full-gc marking array-chunk size uses `ObjArrayMarkingStride`. This patch restores the old behavior (performance). > > The fix is extract out the chunk-size from `PartialArraySplitter` and use the either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context -- the former is used during young-gc while the latter full-gc. > > Test: tier1-3; checked perf regression is gone Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25382#pullrequestreview-2860243196 From stuefe at openjdk.org Thu May 22 08:26:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 22 May 2025 08:26:54 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: Message-ID: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> On Thu, 15 May 2025 11:47:07 GMT, Joel Sikstr?m wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. 
The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update new order in tests Metaspace changes (the few) seem fine. GC changes too.
Minor remarks inline src/hotspot/share/memory/metaspace/metaspaceDCmd.cpp line 62: > 60: void MetaspaceDCmd::execute(DCmdSource source, TRAPS) { > 61: MetaspaceUtils::print_on(output()); > 62: Okay, though arguably somewhat redundant with the following output test/hotspot/jtreg/serviceability/dcmd/gc/HeapInfoTest.java line 48: > 46: OutputAnalyzer output = executor.execute(cmd); > 47: output.shouldNotContain("Unknown diagnostic command"); > 48: output.shouldHaveExitValue(0); This was already kind of weak before and is almost useless now :) can we improve on that? A command reporting back nothing would now result in a green test? ------------- PR Review: https://git.openjdk.org/jdk/pull/25214#pullrequestreview-2860244469 PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2101938786 PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2101933679 From aboldtch at openjdk.org Thu May 22 08:31:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 08:31:58 GMT Subject: RFR: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:34:07 GMT, Albert Mingkun Yang wrote: > Before JDK-8339668, full-gc marking array-chunk size uses `ObjArrayMarkingStride`. This patch restores the old behavior (performance). > > The fix is extract out the chunk-size from `PartialArraySplitter` and use the either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context -- the former is used during young-gc while the latter full-gc. > > Test: tier1-3; checked perf regression is gone Marked as reviewed by aboldtch (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25382#pullrequestreview-2860283524 From jsikstro at openjdk.org Thu May 22 08:51:52 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 22 May 2025 08:51:52 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> References: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> Message-ID: On Thu, 22 May 2025 08:19:00 GMT, Thomas Stuefe wrote: >> Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: >> >> Update new order in tests > > src/hotspot/share/memory/metaspace/metaspaceDCmd.cpp line 62: > >> 60: void MetaspaceDCmd::execute(DCmdSource source, TRAPS) { >> 61: MetaspaceUtils::print_on(output()); >> 62: > > Okay, though arguably somewhat redundant with the following output Sure. Should I revert adding this line, leaving the Metaspace jcmd unchanged? > test/hotspot/jtreg/serviceability/dcmd/gc/HeapInfoTest.java line 48: > >> 46: OutputAnalyzer output = executor.execute(cmd); >> 47: output.shouldNotContain("Unknown diagnostic command"); >> 48: output.shouldHaveExitValue(0); > > This was already kind of weak before and is almost useless now :) can we improve on that? A command reporting back nothing would now result in a green test? I agree. It's hard to grep for specific information since most GCs have different approaches to printing similar information. However, all GCs (even Epsilon) print the string "used", so maybe grepping for that is a reasonable approach, just to see that something is printed?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2101998172 PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2102001583 From stefank at openjdk.org Thu May 22 09:42:54 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 May 2025 09:42:54 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:32:04 GMT, Axel Boldt-Christmas wrote: >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. > > Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. > > However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Thanks for building this support! I did an initial pass through the patch and added some comments. src/hotspot/share/gc/z/zGlobals.hpp line 46: > 44: // Page size shifts > 45: const size_t ZPageSizeSmallShift = ZGranuleSizeShift; > 46: extern int ZPageSizeMediumShift; Changing to int seems good, but this leaves an inconsistency with `ZPageSizeSmallShift`. I'd prefer if the type of that constant was also changed in this PR. src/hotspot/share/gc/z/zHeuristics.cpp line 49: > 47: // Enable medium pages > 48: ZPageSizeMediumMax = size; > 49: ZPageSizeMediumShift = log2i_exact(ZPageSizeMediumMax); Is this an indication that `ZPageSizeMediumShift`should be named `ZPageSizeMediumMaxShift` src/hotspot/share/gc/z/zPage.cpp line 47: > 45: assert(!_virtual.is_null(), "Should not be null"); > 46: assert((_type == ZPageType::small && size() == ZPageSizeSmall) || > 47: (_type == ZPageType::medium && size() <= ZPageSizeMediumMax && size() >= ZPageSizeMediumMin) || Could we flip the order: Suggestion: (_type == ZPageType::medium && size() >= ZPageSizeMediumMin && size() <= ZPageSizeMediumMax) || Or, alternatively move the two range limits to the "outside" of the expression: Suggestion: (_type == ZPageType::medium && ZPageSizeMediumMin <= size() && size() <= ZPageSizeMediumMax) || or add a helper to check if a size is within the medium page range. 
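The last suggestion in the review comment above, a helper that checks whether a size is within the medium page range, could look something like the sketch below. The names `kMediumMin`, `kMediumMax` and `is_medium_page_size` are hypothetical stand-ins (the patch itself uses `ZPageSizeMediumMin` and `ZPageSizeMediumMax`, which are set at run time), and the extra power-of-two check is an assumption based on the proposal's {4M, 8M, 16M, 32M} example rather than something the reviewed assert does.

```cpp
#include <cstddef>

// Hypothetical constants for the 32M max medium page size case.
constexpr std::size_t M = 1024 * 1024;
constexpr std::size_t kMediumMin = 4 * M;
constexpr std::size_t kMediumMax = 32 * M;

constexpr bool is_power_of_2(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Range limits are written on the "outside" of the expression, as suggested in the review,
// so the check reads like min <= size <= max.
constexpr bool is_medium_page_size(std::size_t size) {
  return kMediumMin <= size && size <= kMediumMax && is_power_of_2(size);
}
```

Under these assumptions 4M, 8M, 16M and 32M pass the check, while 2M (a small page), 12M (not a power of two) and 64M (above the maximum) do not. An assert like the one under review could then call such a helper instead of spelling out both comparisons inline.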
src/hotspot/share/gc/z/zRelocationSetSelector.cpp line 61: > 59: bool ZRelocationSetSelectorGroup::is_disabled() { > 60: // Medium pages are disabled when their page size is zero > 61: return _page_type == ZPageType::medium && !ZPageSizeMediumEnabled; The comment above described why the implementation looked like it did. Should it be updated? src/hotspot/share/gc/z/zRelocationSetSelector.hpp line 82: > 80: private: > 81: static constexpr int NumPartitionsShift = 11; > 82: static constexpr int NPartitions = int(1) << NumPartitionsShift; Shenandoah shouldn't leak out unprefixed internal names. Let's fix that as a separate RFE. src/hotspot/share/gc/z/z_globals.hpp line 108: > 106: \ > 107: product(bool, ZUseMediumPageSizeRange, true, DIAGNOSTIC, \ > 108: "Allow multiple medium pages sizes") \ Maybe skip the s to match the flag ? Suggestion: "Allow multiple medium page sizes") \ ------------- PR Review: https://git.openjdk.org/jdk/pull/25381#pullrequestreview-2860438724 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102058013 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102062103 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102083430 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102094622 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102102690 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102108933 From rcastanedalo at openjdk.org Thu May 22 10:00:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 May 2025 10:00:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > 
The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2900619263 From aboldtch at openjdk.org Thu May 22 10:14:46 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 10:14:46 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v2] In-Reply-To: References: Message-ID: <9CmzG3CI92njZVVRrywI7jMuaoH7wsHRmFfeSABlU3o=.a439ed2c-3f34-4406-9780-1e0d06d0a66c@github.com> >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation-induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property holds for both the small and medium memory regions. > > Objects larger than the maximum medium object size are placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between the object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our currently selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation-induced latencies. To reduce allocation latencies we track (cache) the memory of memory regions that have been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, as long as there is cached memory, it can use it, because the size of any memory region that has been freed is always a multiple of the small memory region size (2M). > > However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory region size (32M).
Currently this requires the allocating thread to remap several of these smaller memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
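A minimal sketch of the waste arithmetic described in the background above, assuming the 2M granule, the 32M medium page size and the 1/8 waste bound stated there. The names (`granule_size`, `max_medium_object_size`, `large_page_size`, `large_page_waste_fraction`) are hypothetical and are not ZGC identifiers; the point is only to show where the 4M and 33% figures come from.

```cpp
#include <cassert>
#include <cstddef>

using std::size_t;

// Hypothetical names for illustration only; these are not ZGC identifiers.
constexpr size_t M = 1024 * 1024;
constexpr size_t granule_size = 2 * M;       // all page sizes are multiples of 2M
constexpr size_t medium_page_size = 32 * M;  // medium page size for max heaps of 1GB or larger

// An object may be at most 1/8 (12.5%) of its page, which bounds the waste
// left behind when an allocation does not fit.
constexpr size_t max_medium_object_size = medium_page_size / 8;
static_assert(max_medium_object_size == 4 * M, "32M * 12.5% = 4M");

// A large page holds exactly one object, rounded up to the next multiple of 2M.
constexpr size_t large_page_size(size_t object_size) {
  return (object_size + granule_size - 1) / granule_size * granule_size;
}

// Fraction of a large page that is left unused by its single object.
inline double large_page_waste_fraction(size_t object_size) {
  assert(object_size > max_medium_object_size && "smaller objects go to small or medium pages");
  const size_t page_size = large_page_size(object_size);
  return double(page_size - object_size) / double(page_size);
}

// Worst case: an object one byte over the medium limit needs a 6M page.
static_assert(large_page_size(max_medium_object_size + 1) == 6 * M, "4M + 1 byte rounds up to 6M");
```

With these numbers, the worst-case object of 4M + 1 byte leaves just under 2M of a 6M page unused, which is where the "at most 33%" bound comes from.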
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/f7619fd7..f62339e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From aboldtch at openjdk.org Thu May 22 10:14:47 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 10:14:47 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:38:44 GMT, Stefan Karlsson wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Stefan Karlsson > > src/hotspot/share/gc/z/z_globals.hpp line 108: > >> 106: \ >> 107: product(bool, ZUseMediumPageSizeRange, true, DIAGNOSTIC, \ >> 108: "Allow multiple medium pages sizes") \ > > Maybe skip the s to match the flag ? 
> Suggestion: > > "Allow multiple medium page sizes") \ Suggestion: "Allow multiple medium page sizes") \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102185701 From aboldtch at openjdk.org Thu May 22 10:29:49 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 10:29:49 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v3] In-Reply-To: References: Message-ID: >
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Retype ZPageSizeSmallShift to int - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift - Update is_disabled comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/f62339e6..3b215b37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=01-02 Stats: 10 lines in 6 files changed: 1 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From aboldtch at openjdk.org Thu May 22 10:29:50 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 10:29:50 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v3] In-Reply-To: References: Message-ID: <0aggS_0YVSc-HCKKorUks676iZSTUAGAYSA0_EJpqzo=.1d327338-6212-4384-9efd-4585e3b07e59@github.com> On Thu, 22 May 2025 09:16:08 GMT, Stefan Karlsson wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Retype ZPageSizeSmallShift to int >> - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift >> - Update is_disabled comment > > src/hotspot/share/gc/z/zGlobals.hpp line 46: > >> 44: // Page size shifts >> 45: const size_t ZPageSizeSmallShift = ZGranuleSizeShift; >> 46: extern int ZPageSizeMediumShift; > > Changing to int seems good, but this leaves an inconsistency with 
`ZPageSizeSmallShift`. I'd prefer if the type of that constant was also changed in this PR. Yeah, was unsure how far to push this. Eventually all our shift variables should probably be typed as int. But will change the page size shifts at least then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2102216608 From stuefe at openjdk.org Thu May 22 11:59:58 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 22 May 2025 11:59:58 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing on MacOS aarch64 [v5] In-Reply-To: References: Message-ID: On Wed, 10 Jul 2024 06:10:41 GMT, Thomas Stuefe wrote: >> See JBS issue. >> >> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. >> >> The patch: >> - exposes os::available_memory via Whitebox >> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` >> >> I have some misgivings about this solution, though: >> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. >> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) >> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. 
>> >> Despite my doubts, I think this is the best we can come up with if we want to have such a test. >> >> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java > > Co-authored-by: Andrey Turbanov Closing this since Skara is wonky; I have no idea how to fix these strange jcheck errors that refer to files I did not even touch. New PR: https://github.com/openjdk/jdk/pull/25384 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2900963501 From stefank at openjdk.org Thu May 22 12:39:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 May 2025 12:39:52 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v3] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 10:29:49 GMT, Axel Boldt-Christmas wrote: >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Retype ZPageSizeSmallShift to int > - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift > - Update is_disabled comment FYI: I've sent out a PR to hide the Shenandoah NumPartitions define: https://github.com/openjdk/jdk/pull/25392 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25381#issuecomment-2901075915 From stefank at openjdk.org Thu May 22 12:41:03 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 May 2025 12:41:03 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines Message-ID: We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define name NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hid inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. This PR is my suggestion for a change to solve this so this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! 
------------- Commit messages: - 8357563: Shenandoah headers leak un-prefixed defines Changes: https://git.openjdk.org/jdk/pull/25392/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25392&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357563 Stats: 12 lines in 1 file changed: 6 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25392.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25392/head:pull/25392 PR: https://git.openjdk.org/jdk/pull/25392 From jbhateja at openjdk.org Thu May 22 13:29:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 13:29:11 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v2] In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: > The patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on 16-byte-aligned stack addresses. > ZRuntimeCallSpill is agnostic to live registers, so the resulting spill sequence should not modify the contents of the registers. > > The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/79d7778e..efc4f011 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=00-01 Stats: 22 lines in 1 file changed: 10 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From jbhateja at openjdk.org Thu May 22 13:32:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 13:32:54 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 09:57:47 GMT, Roberto Castañeda Lozano wrote: > > The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers.
Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_, and we found quite a lot of register saves/restores around native methods. There is already an existing test point for it: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2901240075 From cnorrbin at openjdk.org Thu May 22 14:09:37 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 22 May 2025 14:09:37 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: > Hi everyone, > > This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. > > For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsolete, as it no longer serves any purpose. > > The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Merge branch 'master' into statsampler-removal - removed last traces of hrt.ticks - Merge branch 'master' into statsampler-removal - feedback fixes - removed the PerfDataSamplingInterval flag - calculate timestamp in jstat instead of sampling - StatSampler + sampling code removed ------------- Changes: https://git.openjdk.org/jdk/pull/24872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24872&range=03 Stats: 864 lines in 25 files changed: 150 ins; 655 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/24872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24872/head:pull/24872 PR: https://git.openjdk.org/jdk/pull/24872 From cnorrbin at openjdk.org Thu May 22 14:14:52 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 22 May 2025 14:14:52 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v3] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 13:48:07 GMT, Johan Sjölen wrote: > It seems like there's no user-visible change, except the removal of the global variable. Is that correct? Besides the global variable, there's a slight change in the behaviour of the perfdata-counters for Serial/Parallel, as mentioned in the PR. The JBS issue has a clarifying comment explaining why this shouldn't have any meaningful impact. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2901383778 From shade at openjdk.org Thu May 22 14:56:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 May 2025 14:56:54 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define name NumPartitions inside shenandoahFreeSet.hpp.
I propose that this (and its friends) are hid inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this so this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! Yes, this is fine. These should never have been in global scope, especially in the header that can easily be transitively included. Actually, I would question even the type-casted triad, and probably a single constant would instead do. Leave it to a follow-up, if present problem blocks current development. @kdnilsen @earthling-amzn ^ ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25392#pullrequestreview-2861568514 PR Comment: https://git.openjdk.org/jdk/pull/25392#issuecomment-2901536748 From stefank at openjdk.org Thu May 22 15:07:54 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 May 2025 15:07:54 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define name NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hid inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this so this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! FWIW, this doesn't block us, it just requires us to use a name that doesn't follow our naming convention, so if you have a better solution I'm OK with waiting for that. 
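The scoping approach discussed in this thread can be sketched as follows. This is not the code from the PR: the class and enum names are hypothetical stand-ins, and the sketch only assumes, as the subject line suggests, that the clashing names were previously visible at global scope through the header. Once the constants are members of the class that uses them, another component is free to declare its own `NumPartitions`.

```cpp
#include <cassert>

// Hypothetical stand-in for a free-set partition id (not the real HotSpot type).
enum class PartitionIdSketch : int { Mutator, Collector, NotFree };

// Hypothetical stand-in for the only class that uses the constants.
// The names live in class scope instead of being global `#define`s.
class RegionPartitionsSketch {
public:
  static constexpr PartitionIdSketch NumPartitions = PartitionIdSketch::NotFree;
  static constexpr int IntNumPartitions = int(PartitionIdSketch::NotFree);
};

// A different component can now use the same unqualified name.
// With a global `#define NumPartitions ...` in a transitively included
// header, the preprocessor would rewrite this declaration and break the build.
class RelocationSelectorSketch {
public:
  static constexpr int NumPartitionsShift = 11;
  static constexpr int NumPartitions = int(1) << NumPartitionsShift;
};
```

Both `RegionPartitionsSketch::NumPartitions` and `RelocationSelectorSketch::NumPartitions` coexist here because each is qualified by its class; a prefixed macro name, the alternative mentioned in the proposal, would avoid the clash without changing the scoping.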
------------- PR Comment: https://git.openjdk.org/jdk/pull/25392#issuecomment-2901575076 From shade at openjdk.org Thu May 22 15:16:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 May 2025 15:16:59 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define name NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hid inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this so this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! Nah, current version is fine. Folding this triad into a single constant would likely require dealing with signed-unsigned comparisons, casts back to enums, all that jazz. This would be a good starter task for our engineers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25392#issuecomment-2901614463 From never at openjdk.org Thu May 22 15:34:00 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 22 May 2025 15:34:00 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. 
If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space This seems reasonable to me. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25307#pullrequestreview-2861697506 From kdnilsen at openjdk.org Thu May 22 15:44:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 22 May 2025 15:44:04 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data [v7] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 16:38:03 GMT, Kelvin Nilsen wrote: >> i suspect performance impact is minimal. > > I've committed changes that endeavor to implement the suggested refactor. Performance impact does appear to be minimal. This broader refactoring does change behavior slightly. In particular: > > 1. We now have a better understanding of live-memory evacuated during mixed evacuations. This allows the selection of old-candidates for mixed evacuations to be more conservative. We'll have fewer old regions in order to honor the intended budget. > 2. Potentially, this will result in more mixed evacuations, but each mixed evacuation should take less time. > 3. 
There should be no impact on behavior of traditional Shenandoah.
>
> On one recently completed test run, we observed the following impacts compared to tip:
>
> Shenandoah
> -------------------------------------------------------------------------------------------------------
> +80.69% specjbb2015/trigger_failure p=0.00000
>   Control:     58.250 (+/-  13.48 )        110
>   Test:       105.250 (+/-  33.13 )         30
>
> Genshen
> -------------------------------------------------------------------------------------------------------
> -19.46% jme/context_switch_count p=0.00176
>   Control:    117.420 (+/-  28.01 )        108
>   Test:        98.292 (+/-  32.76 )         30
>
> Perhaps we need more data to decide whether this is "significant".

This result seems to be consistent. The effect on traditional Shenandoah is apparently to reduce the size of traditional Shenandoah collection sets as well, because certain regions that would have been collected are now rejected due to "better awareness" of how much live data will need to be copied. The amount of garbage associated with candidate regions for the young collection set is reduced by the amount of allocations above TAMS. Previously, this had been erroneously reported as garbage. This has the effect of delaying reclamation of some garbage, resulting in an increase in allocation failures on the specjbb2015 workload. We might argue that the original behavior was incorrect, in that it was allowing violation of the intended evacuation budget. We apparently were getting away with this violation because we were able to flip mutator regions to collector space, and/or because evacuation waste was sufficient to accommodate the unbudgeted evacuations. Now that we have more accurate accounting of live memory, we could perhaps slightly reduce the default evacuation waste budget if we want to claw back the losses in specjbb performance (to enable larger collection sets) as part of this PR.
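The accounting change described above can be illustrated with a small sketch. It is not the Shenandoah implementation: the struct and function names are hypothetical, and it only assumes what the message states, namely that memory allocated above TAMS (top at mark start) has to be copied during evacuation and therefore must not be counted as garbage.

```cpp
#include <cassert>
#include <cstddef>

using std::size_t;

// Hypothetical per-region summary; not a real Shenandoah data structure.
struct RegionSketch {
  size_t used_bytes;                  // everything allocated in the region
  size_t marked_live_bytes;           // live data found by marking (below TAMS)
  size_t allocated_above_tams_bytes;  // allocated after marking started
};

// Earlier accounting as described in the message: memory above TAMS
// was reported as garbage.
inline size_t garbage_ignoring_tams(const RegionSketch& r) {
  return r.used_bytes - r.marked_live_bytes;
}

// Data that evacuation would have to copy: marked objects plus
// everything allocated above TAMS.
inline size_t live_bytes(const RegionSketch& r) {
  return r.marked_live_bytes + r.allocated_above_tams_bytes;
}

// Accounting after the change: garbage is reduced by the allocations above TAMS.
inline size_t garbage(const RegionSketch& r) {
  return r.used_bytes - live_bytes(r);
}
```

For a region with 1000 bytes used, 300 marked live and 200 allocated above TAMS, the earlier accounting reports 700 bytes of garbage while the corrected one reports 500. A region therefore looks less attractive as a collection-set candidate, which is consistent with the smaller collection sets reported above.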
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24319#discussion_r2102870209 From dnsimon at openjdk.org Thu May 22 17:04:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:04:00 GMT Subject: Integrated: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 In-Reply-To: References: Message-ID: On Mon, 19 May 2025 17:50:21 GMT, Doug Simon wrote: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. This pull request has now been integrated. 
Changeset: 1258af42 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1258af42bec92a2797897cb6126b60b582a29d76 Stats: 7 lines in 2 files changed: 7 ins; 0 del; 0 mod 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 Reviewed-by: never, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Thu May 22 17:03:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:03:59 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2901962121 From zgu at openjdk.org Thu May 22 17:09:53 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 22 May 2025 17:09:53 GMT Subject: RFR: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:34:07 GMT, Albert Mingkun Yang wrote: > Before JDK-8339668, the full-gc marking array-chunk size used `ObjArrayMarkingStride`. This patch restores the old behavior (performance). > > The fix is to extract the chunk size from `PartialArraySplitter` and use either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context: the former is used during young-gc and the latter during full-gc. > > Test: tier1-3; checked perf regression is gone Thank you for fixing it. Looks good! ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25382#pullrequestreview-2861982016 From jbhateja at openjdk.org Thu May 22 17:42:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 17:42:06 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers.
> > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/efc4f011..9b5c2ac4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=01-02 Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From sviswanathan at openjdk.org Thu May 22 17:49:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 May 2025 17:49:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 17:42:06 GMT, Jatin Bhateja wrote: >> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. >> ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers. >> >> Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2862099217 From jsikstro at openjdk.org Thu May 22 19:49:17 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 22 May 2025 19:49:17 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v5] In-Reply-To: References: Message-ID: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that, historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. > > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to the "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed.
> * And, as the largest change in terms of LOC, separate the Metaspace and GC heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. > > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with four additional commits since the last revision: - Feedback on Metaspace jcmd - Copyright years - Make HeapInfoTest.java more robust - Switch naming order of ring-buffer names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/689a2230..60c5b606 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=03-04 Stats: 21 lines in 5 files changed: 7 ins; 2 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From jsikstro at openjdk.org Thu May 22 19:49:17 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 22 May 2025 19:49:17 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: Message-ID: <6IZ5ldgwmNwRUkmLsCLCEjdvruFE1g73JUxql0Jx8jo=.b91e7e28-9555-41a5-b482-4aa5a257e4d0@github.com> On Thu, 15 May 2025 11:47:07 GMT, Joel Sikström wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that, historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap.
>> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to the "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And, as the largest change in terms of LOC, separate the Metaspace and GC heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update new order in tests I fixed some offline feedback from @stefank on naming and addressed feedback from @tstuefe in new commits.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2902380574 From stuefe at openjdk.org Fri May 23 04:33:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 23 May 2025 04:33:57 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> Message-ID: <8EPYeJoeysrxsu4xgcisxF3PDL0NiqmdzYdhmLwtsZI=.cba6e5f6-4f4e-4fd2-9497-7c9f2e4d4bd2@github.com> On Thu, 22 May 2025 08:46:55 GMT, Joel Sikström wrote: >> src/hotspot/share/memory/metaspace/metaspaceDCmd.cpp line 62: >> >>> 60: void MetaspaceDCmd::execute(DCmdSource source, TRAPS) { >>> 61: MetaspaceUtils::print_on(output()); >>> 62: >> >> Okay, though arguably somewhat redundant with the following output > > Sure. Should I revert adding this line, leaving the Metaspace jcmd unchanged? No, on second thought it is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2103779705 From stuefe at openjdk.org Fri May 23 04:38:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 23 May 2025 04:38:57 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v5] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 19:49:17 GMT, Joel Sikström wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that, historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?).
A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to the "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And, as the largest change in terms of LOC, separate the Metaspace and GC heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with four additional commits since the last revision: > > - Feedback on Metaspace jcmd > - Copyright years > - Make HeapInfoTest.java more robust > - Switch naming order of ring-buffer names Marked as reviewed by stuefe (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25214#pullrequestreview-2863132900 From stuefe at openjdk.org Fri May 23 04:38:58 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 23 May 2025 04:38:58 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> Message-ID: On Thu, 22 May 2025 08:48:47 GMT, Joel Sikström wrote: >> test/hotspot/jtreg/serviceability/dcmd/gc/HeapInfoTest.java line 48: >> >>> 46: OutputAnalyzer output = executor.execute(cmd); >>> 47: output.shouldNotContain("Unknown diagnostic command"); >>> 48: output.shouldHaveExitValue(0); >> >> This was already kind of weak before and is almost useless now :) can we improve on that? A command reporting back nothing would now result in a green test? > > I agree. It's hard to grep for specific information since most GCs have different approaches to printing similar information. > > However, all GCs (even Epsilon) print the string "used", so maybe grepping for that is a reasonable approach, just to see that something is printed? Heap.info is important; we should have better tests for this. But we can hold this off for a separate PR, so I am fine with scanning for "used". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2103784060 From jsikstro at openjdk.org Fri May 23 06:48:31 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 23 May 2025 06:48:31 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v6] In-Reply-To: References: Message-ID: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that, historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap.
Hence, it made sense to also print info about the PermGen when printing the GC heap. > > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question of whether Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to the "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And, as the largest change in terms of LOC, separate the Metaspace and GC heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp.
> > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Add back MetaspaceUtils::print_on() in Metaspace jcmd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/60c5b606..b54ebc02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=04-05 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From ayang at openjdk.org Fri May 23 08:26:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 May 2025 08:26:01 GMT Subject: RFR: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:34:07 GMT, Albert Mingkun Yang wrote: > Before JDK-8339668, the full-gc marking array-chunk size used `ObjArrayMarkingStride`. This patch restores the old behavior (performance). > > The fix is to extract the chunk size from `PartialArraySplitter` and use either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context: the former is used during young-gc and the latter during full-gc. > > Test: tier1-3; checked perf regression is gone Thanks for the review.
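The chunk-size selection described in the quoted PR text can be sketched as below. This is a simplified illustration and not the actual `PartialArraySplitter` code: `GCKind`, `ChunkSizes`, `array_chunk_size`, and `chunk_count` are hypothetical names, and the flag values are supplied by the caller rather than read from the VM.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag for the collection that is splitting the array.
enum class GCKind { Young, Full };

// Hypothetical holder for the two flag values named in the PR text.
struct ChunkSizes {
  std::size_t par_gc_array_scan_chunk;   // stands in for ParGCArrayScanChunk
  std::size_t obj_array_marking_stride;  // stands in for ObjArrayMarkingStride
};

// Pick the chunk size by context: young-gc uses the scan chunk,
// full-gc uses the marking stride.
inline std::size_t array_chunk_size(GCKind kind, const ChunkSizes& sizes) {
  return kind == GCKind::Young ? sizes.par_gc_array_scan_chunk
                               : sizes.obj_array_marking_stride;
}

// Number of chunks an array of `length` elements is split into
// (ceiling division).
inline std::size_t chunk_count(std::size_t length, std::size_t chunk_size) {
  assert(chunk_size > 0);
  return (length + chunk_size - 1) / chunk_size;
}
```

With chunk sizes of 50 and 2048, used here purely as sample values, a 100000-element array splits into 2000 chunks in the young-gc case and 49 in the full-gc case, which gives a feel for how strongly the choice affects the number of work items.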
------------- PR Comment: https://git.openjdk.org/jdk/pull/25382#issuecomment-2903671208 From ayang at openjdk.org Fri May 23 08:26:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 May 2025 08:26:01 GMT Subject: Integrated: 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:34:07 GMT, Albert Mingkun Yang wrote: > Before JDK-8339668, the full-gc marking array-chunk size used `ObjArrayMarkingStride`. This patch restores the old behavior (performance). > > The fix is to extract the chunk size from `PartialArraySplitter` and use either `ParGCArrayScanChunk` or `ObjArrayMarkingStride`, depending on the context: the former is used during young-gc and the latter during full-gc. > > Test: tier1-3; checked perf regression is gone This pull request has now been integrated. Changeset: 36f6d155 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/36f6d155e3b9d0b279be33414573217ea38551ac Stats: 9 lines in 5 files changed: 3 ins; 0 del; 6 mod 8354517: Parallel: JDK-8339668 causes up to 3.7x slowdown in openjdk.bench.vm.gc.systemgc Reviewed-by: tschatzl, aboldtch, zgu ------------- PR: https://git.openjdk.org/jdk/pull/25382 From kbarrett at openjdk.org Fri May 23 08:26:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 23 May 2025 08:26:53 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: <5Oe187YfD1o0YVwumXL2aQsmvn0YaHaJLZF9bd2P2OM=.c701cb36-5f92-4022-b66e-34053988a0e9@github.com> On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define named NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hidden inside the ShenandoahRegionPartitions class, which is the only user of these defines.
An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25392#pullrequestreview-2863632272 From ayang at openjdk.org Fri May 23 08:28:16 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 May 2025 08:28:16 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v5] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`.
> > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/e39ece09..320e590b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=03-04 Stats: 47415 lines in 735 files changed: 32367 ins; 11142 del; 3906 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From tschatzl at openjdk.org Fri May 23 08:41:04 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 08:41:04 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray Message-ID: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Hi all, please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. 
Testing: gha Thanks, Thomas ------------- Commit messages: - * remove whitespace - * one more method removed - 8357621 Changes: https://git.openjdk.org/jdk/pull/25406/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25406&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357621 Stats: 65 lines in 2 files changed: 19 ins; 46 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25406/head:pull/25406 PR: https://git.openjdk.org/jdk/pull/25406 From stefank at openjdk.org Fri May 23 09:44:55 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 09:44:55 GMT Subject: RFR: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define named NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hidden inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! Thanks for reviewing! I'll go ahead and integrate this now.
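The name clash discussed in this thread can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the Shenandoah or ZGC source: `RegionPartitionsSketch`, `other_gc_sketch`, `checked_index`, and all values are made up, and the leaking macro is shown only in a comment so that the example compiles.

```cpp
#include <cassert>

// A header that leaks an unprefixed macro would contain something like:
//   #define NumPartitions 3
// Any later declaration such as `constexpr int NumPartitions = 8;` would
// then be rewritten by the preprocessor to `constexpr int 3 = 8;` and
// fail to compile, no matter which namespace or class it lives in.

// Scoping the constant inside its only user keeps the name out of the
// preprocessor's reach and out of the global namespace.
class RegionPartitionsSketch {
public:
  static constexpr int NumPartitions = 3;  // made-up value

  // Returns idx unchanged after checking that it names a valid partition.
  static int checked_index(int idx) {
    assert(idx >= 0 && idx < NumPartitions);
    return idx;
  }
};

// Another component can now use the same identifier without a clash.
namespace other_gc_sketch {
constexpr int NumPartitions = 8;  // made-up value
}

static_assert(RegionPartitionsSketch::NumPartitions !=
                  other_gc_sketch::NumPartitions,
              "the two constants are independent of each other");
```

The alternative mentioned in the message, keeping the macro but giving it a component-specific prefix, would avoid the clash as well; the class-scoped constant has the extra benefit of being typed and visible to the debugger.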
------------- PR Comment: https://git.openjdk.org/jdk/pull/25392#issuecomment-2903875809 From stefank at openjdk.org Fri May 23 09:44:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 09:44:56 GMT Subject: Integrated: 8357563: Shenandoah headers leak un-prefixed defines In-Reply-To: References: Message-ID: <6DX_5lovjExqU4gFkqwy6kviSfid6VxJZFbxnQkpYzQ=.3658b682-2ca5-4d35-b46d-b793428b5384@github.com> On Thu, 22 May 2025 12:36:07 GMT, Stefan Karlsson wrote: > We hit a compilation error in ZGC when we defined a constant NumPartitions. This happened because there is a define named NumPartitions inside shenandoahFreeSet.hpp. I propose that this (and its friends) are hidden inside the ShenandoahRegionPartitions class, which is the only user of these defines. An alternative would be to prefix the define with something that is unlikely to clash with other parts of HotSpot. > > This PR is my suggestion for a change to solve this name conflict. Does this seem like an acceptable solution, or do you want something else? Thanks! This pull request has now been integrated. Changeset: 68ee06f0 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/68ee06f0c9ec420cb1a60e0b361971372b18b82b Stats: 12 lines in 1 file changed: 6 ins; 6 del; 0 mod 8357563: Shenandoah headers leak un-prefixed defines Reviewed-by: shade, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/25392 From aboldtch at openjdk.org Fri May 23 09:51:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 23 May 2025 09:51:14 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v4] In-Reply-To: References: Message-ID: >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation-induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property holds for both the small and medium memory regions. > > Objects larger than the max medium object size get placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our currently selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation-induced latencies. To reduce allocation latencies we track (cache) memory of memory regions that have been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, cached memory can be used as long as there is any, because the size of any freed memory region is always a multiple of the small memory region size (2M). > > However, for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory region size (32M).
Currently, this requires the allocating thread to remap several of these smaller memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
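The waste figures in the background section can be checked with a little arithmetic. The sketch below only restates numbers given in the text (2M granularity, a 1/8 object-size limit, a 32M medium region); the helper names `max_object_size`, `large_region_size`, and `large_waste_percent` are hypothetical and do not correspond to ZGC functions.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 1024 * 1024;
// From the text: all memory region sizes are multiples of 2M.
constexpr std::size_t kGranule = 2 * M;

// Largest object allowed in a small or medium region so that a failed
// allocation wastes at most 1/8 (12.5%) of the region.
constexpr std::size_t max_object_size(std::size_t region_size) {
  return region_size / 8;
}

// A large region holds exactly one object, rounded up to a 2M multiple.
constexpr std::size_t large_region_size(std::size_t object_size) {
  return (object_size + kGranule - 1) / kGranule * kGranule;
}

// Wasted share of a large region in whole percent (rounded down).
inline std::size_t large_waste_percent(std::size_t object_size) {
  assert(object_size > 0);
  const std::size_t region = large_region_size(object_size);
  return (region - object_size) * 100 / region;
}

// 32M medium regions give a 4M max medium object, as stated in the text.
static_assert(max_object_size(32 * M) == 4 * M, "32M * 12.5% == 4M");
```

The worst case for a large region is an object just over 4M: it needs a 6M region and leaves almost 2M unused, which is where the bound of roughly 33% comes from. An object of exactly 6M wastes nothing.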
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically, allow all power-of-two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes end up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Revert "NumPartitions is reserved by Shenandoah" This reverts commit f7619fd700ec6498948e5e84e8051be145683940. - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 - Retype ZPageSizeSmallShift to int - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift - Update is_disabled comment - Apply suggestions from code review Co-authored-by: Stefan Karlsson - NumPartitions is reserved by Shenandoah - Add ZStressFastMediumPageAllocation - Add TestZMediumPageSizes - Optimized pre-filter - ...
and 1 more: https://git.openjdk.org/jdk/compare/5b2509cc...43a42685 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/3b215b37..43a42685 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=02-03 Stats: 61930 lines in 946 files changed: 36651 ins; 20473 del; 4806 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From stefank at openjdk.org Fri May 23 09:56:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 09:56:52 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v4] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:51:14 GMT, Axel Boldt-Christmas wrote: >>
Background (expandable section) >> ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. >> >> The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. >> >> Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. >> >> For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. >> >> But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. >> >> A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. >> >> For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. >> >> However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. >> >> In ZGC we call our memory regions pages or zpages. >>
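The figures in the background above (a 4M max medium object, at most 33% waste in large pages) and the {4M, 8M, 16M, 32M} size set from the proposal can be re-derived with a short calculation. The following is only a sketch of that arithmetic, not HotSpot code: the names `granule`, `medium_page_max`, `max_object_size`, `large_page_size`, `large_waste_fraction` and `medium_page_sizes` are made up for illustration, and the constants are taken from the numbers quoted in this thread for max heaps of 1GB or larger.

```cpp
#include <cstddef>
#include <vector>

// Sizes in bytes. These constants mirror the numbers quoted in the thread;
// they are illustrative and not the actual HotSpot definitions.
constexpr size_t M = 1024 * 1024;
constexpr size_t granule = 2 * M;           // all page sizes are multiples of 2M
constexpr size_t medium_page_max = 32 * M;  // medium page size for Xmx >= 1GB

// Largest object allowed in a page so that the unused tail is at most 1/8.
constexpr size_t max_object_size(size_t page_size) {
  return page_size / 8;
}

// A large page holds exactly one object, rounded up to a multiple of 2M.
constexpr size_t large_page_size(size_t object_size) {
  return (object_size + granule - 1) / granule * granule;
}

// Fraction of a large page that is wasted for a given object size.
constexpr double large_waste_fraction(size_t object_size) {
  return static_cast<double>(large_page_size(object_size) - object_size) /
         static_cast<double>(large_page_size(object_size));
}

// Power-of-two medium page sizes, from the smallest size that can hold one
// max-sized medium object up to the max medium page size.
inline std::vector<size_t> medium_page_sizes() {
  std::vector<size_t> sizes;
  for (size_t s = max_object_size(medium_page_max); s <= medium_page_max; s *= 2) {
    sizes.push_back(s);
  }
  return sizes;
}

// 32M * 12.5% = 4M is the largest medium object.
static_assert(max_object_size(medium_page_max) == 4 * M, "max medium object");
// The smallest large object (just over 4M) needs a 6M page.
static_assert(large_page_size(4 * M + 1) == 6 * M, "smallest large page");
```

Under these assumptions the worst case for large pages is an object of 4M plus one byte, which wastes just under 2M of a 6M page (about 33%), and `medium_page_sizes()` yields the four sizes listed in the proposal.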
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Revert "NumPartitions is reserved by Shenandoah" > > This reverts commit f7619fd700ec6498948e5e84e8051be145683940. > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Retype ZPageSizeSmallShift to int > - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift > - Update is_disabled comment > - Apply suggestions from code review > > Co-authored-by: Stefan Karlsson > - NumPartitions is reserved by Shenandoah > - Add ZStressFastMediumPageAllocation > - Add TestZMediumPageSizes > - Optimized pre-filter > - ... and 1 more: https://git.openjdk.org/jdk/compare/cbe112a4...43a42685 The NumPartitions have now been confined in the Shenandoah code so you can go probably rename NPartitions to NumPartitions after a rebase. Edit: I see that you've already done that. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25381#issuecomment-2903906949 From stefank at openjdk.org Fri May 23 10:13:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 10:13:52 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v6] In-Reply-To: References: Message-ID: <5Om_Z03KbgSpEbB4U4ZnzOL2TWA_nrfMmgmRlGpAJ1c=.50e48d0c-ae93-4ec8-982b-b757c7b8f95a@github.com> On Fri, 23 May 2025 06:48:31 GMT, Joel Sikstr?m wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. 
The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). >> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manual inspection of printed content > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Add back MetaspaceUtils::print_on() in Metaspace jcmd This looks good to me! I added a small nit. src/hotspot/share/gc/shared/collectedHeap.hpp line 55: > 53: class GCHeapLog; > 54: class MemoryPool; > 55: class GCMetaspaceLog; Sort these ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25214#pullrequestreview-2863925232 PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2104275736 From sjohanss at openjdk.org Fri May 23 10:21:55 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 23 May 2025 10:21:55 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray In-Reply-To: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: On Fri, 23 May 2025 08:07:04 GMT, Thomas Schatzl wrote: > Hi all, > > please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. > > Testing: gha > > Thanks, > Thomas Looks good. Changes requested by sjohanss (Reviewer). Or are there build issues. src/hotspot/share/gc/g1/g1BiasedArray.hpp line 171: > 169: protected: > 170: // Returns the address of the element the given address maps to > 171: T* address_mapped_to(HeapWord* address) { Seems to be used by testing. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25406#pullrequestreview-2863934534 PR Review: https://git.openjdk.org/jdk/pull/25406#pullrequestreview-2863945561 PR Comment: https://git.openjdk.org/jdk/pull/25406#issuecomment-2903965932 PR Review Comment: https://git.openjdk.org/jdk/pull/25406#discussion_r2104289012 From sjohanss at openjdk.org Fri May 23 10:37:51 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 23 May 2025 10:37:51 GMT Subject: RFR: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true [v2] In-Reply-To: References: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Message-ID: On Wed, 21 May 2025 10:08:29 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into 8357306-remove-gc-succeeded-in-g1-vm-operations > - * remove comment > - * fix try-concurrent-... > - * some minor refactoring > - 8357306 > > Hi all, > > please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. > > Testing: tier1-3, gha > > Thanks, > Thomas Marked as reviewed by sjohanss (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25320#pullrequestreview-2863980136 From jsikstro at openjdk.org Fri May 23 10:47:37 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 23 May 2025 10:47:37 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v7] In-Reply-To: References: Message-ID: <8dQ4VQlJ3ZBdMZJnq43HeBRLRiqOutWvvxvOccMAtNY=.403da2ad-8b0b-460a-b71c-4fb628b8aeaa@github.com> > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. > > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propse we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. 
> > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Sort forward declarations in collectedHeap.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25214/files - new: https://git.openjdk.org/jdk/pull/25214/files/b54ebc02..cd326c2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25214&range=05-06 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25214/head:pull/25214 PR: https://git.openjdk.org/jdk/pull/25214 From jsikstro at openjdk.org Fri May 23 10:47:37 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 23 May 2025 10:47:37 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v6] In-Reply-To: <5Om_Z03KbgSpEbB4U4ZnzOL2TWA_nrfMmgmRlGpAJ1c=.50e48d0c-ae93-4ec8-982b-b757c7b8f95a@github.com> References: <5Om_Z03KbgSpEbB4U4ZnzOL2TWA_nrfMmgmRlGpAJ1c=.50e48d0c-ae93-4ec8-982b-b757c7b8f95a@github.com> Message-ID: On Fri, 23 May 2025 10:11:12 GMT, Stefan Karlsson wrote: >> Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: >> >> Add back MetaspaceUtils::print_on() in Metaspace jcmd > > src/hotspot/share/gc/shared/collectedHeap.hpp line 55: > >> 53: class GCHeapLog; >> 54: class MemoryPool; >> 55: class GCMetaspaceLog; > > Sort these Fixed. I went ahead and sorted GCMemoryManager as well, which is now in the right alphabetical position.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2104326038 From stefank at openjdk.org Fri May 23 10:48:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 10:48:52 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> References: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> Message-ID: On Fri, 16 May 2025 07:11:38 GMT, Joel Sikstr?m wrote: >> Hello, >> >> This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. >> >> Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. >> >> We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. The majority of places that use this value actualy use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. >> >> Lastly, to make iterating over ages more accessible, I propose we create an intreface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. 
Since the end of the enum iterators are exclusive, I've opted to use the following value as end for the iterators: >> >> constexpr ZPageAge ZPageAgeLastPlusOne = static_cast(ZPageAgeCount); >> >> >> I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. >> >> Testing: >> * Oracle's tier 1-4 >> * GHA > > Joel Sikstr?m has updated the pull request incrementally with two additional commits since the last revision: > > - Copyright years > - Simplify untype(ZPageAge age) A couple of questions before fully reviewing this? src/hotspot/share/gc/z/zAllocator.inline.hpp line 38: > 36: > 37: inline ZAllocatorForRelocation* ZAllocator::relocation(ZPageAge page_age) { > 38: return _relocation[untype(page_age) - 1]; I wonder if we should have our own defined ZPageAge operators for `+` and `-` (or maybe inc/dec) and we could add verification that we don't fall out of the range? Suggestion: return _relocation[untype(page_age - 1)]; src/hotspot/share/gc/z/zPageAge.hpp line 55: > 53: static_cast(ZPageAge::eden), > 54: ZPageAgeCount); > 55: Could this be using the other define to set this up? Suggestion: ENUMERATOR_VALUE_RANGE(ZPageAge, ZPageAge::eden, ZPageAge::old); Or does that not work? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25251#pullrequestreview-2863980273 PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104314371 PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104312132 From stefank at openjdk.org Fri May 23 10:49:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 10:49:56 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v7] In-Reply-To: <8dQ4VQlJ3ZBdMZJnq43HeBRLRiqOutWvvxvOccMAtNY=.403da2ad-8b0b-460a-b71c-4fb628b8aeaa@github.com> References: <8dQ4VQlJ3ZBdMZJnq43HeBRLRiqOutWvvxvOccMAtNY=.403da2ad-8b0b-460a-b71c-4fb628b8aeaa@github.com> Message-ID: On Fri, 23 May 2025 10:47:37 GMT, Joel Sikstr?m wrote: >> Hello, >> >> The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing is coupled the way it is right now is because historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. >> >> With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become more loose, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. >> >> To better reflect the current state of the JVM, I propse we make the following changes to separate Metaspace from GC printing: >> * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. >> * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). 
>> * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. >> * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. >> >> Testing: >> * GHA, Oracle's tier 1-4 >> * Manuel inspection of printed content > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Sort forward declarations in collectedHeap.hpp Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25214#pullrequestreview-2864009659 From stefank at openjdk.org Fri May 23 11:11:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 11:11:51 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> References: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> Message-ID: On Fri, 16 May 2025 07:11:38 GMT, Joel Sikstr?m wrote: >> Hello, >> >> This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. >> >> Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. >> >> We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. 
The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. >> >> Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: >> >> constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount); >> >> >> I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. >> >> Testing: >> * Oracle's tier 1-4 >> * GHA > > Joel Sikström has updated the pull request incrementally with two additional commits since the last revision: > > - Copyright years > - Simplify untype(ZPageAge age) src/hotspot/share/utilities/enumIterator.hpp line 269: > 267: } > 268: > 269: template I think this should be using T to match the surrounding code.
Suggestion: template ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104357958 From stefank at openjdk.org Fri May 23 11:11:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 11:11:52 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: References: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> Message-ID: On Fri, 23 May 2025 11:06:37 GMT, Stefan Karlsson wrote: >> Joel Sikstr?m has updated the pull request incrementally with two additional commits since the last revision: >> >> - Copyright years >> - Simplify untype(ZPageAge age) > > src/hotspot/share/utilities/enumIterator.hpp line 269: > >> 267: } >> 268: >> 269: template > > I think this should be using T to match the surrounding code. > Suggestion: > > template BTW, if you are going to change this could you also fix the pre-existing issues with the includes so that it matches the style guide? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104361556 From tschatzl at openjdk.org Fri May 23 11:13:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 11:13:31 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v2] In-Reply-To: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: > Hi all, > > please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. 
> > Testing: gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Re-add address_mapped_to, used by testing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25406/files - new: https://git.openjdk.org/jdk/pull/25406/files/77b6ff36..fa1c19c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25406&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25406&range=00-01 Stats: 18 lines in 2 files changed: 17 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25406/head:pull/25406 PR: https://git.openjdk.org/jdk/pull/25406 From sjohanss at openjdk.org Fri May 23 11:53:52 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 23 May 2025 11:53:52 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v2] In-Reply-To: References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: <6EWv3XT0A0wsktjFthVcbzwU12wMkm8rrK-2jcfKaw8=.2dd8e56a-151c-49b9-9327-a3df039181ab@github.com> On Fri, 23 May 2025 11:13:31 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. 
>> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Re-add address_mapped_to, used by testing src/hotspot/share/gc/g1/g1BiasedArray.hpp line 106: > 104: class G1BiasedMappedArray : public G1BiasedMappedArrayBase { > 105: > 106: T* base() const { return (T*)G1BiasedMappedArrayBase::_base; } Looks like `base() `needs to be public for testing as well :( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25406#discussion_r2104429106 From tschatzl at openjdk.org Fri May 23 12:03:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 12:03:57 GMT Subject: Integrated: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true In-Reply-To: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> References: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Message-ID: On Tue, 20 May 2025 07:37:19 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. > > Testing: tier1-3, gha > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 48df41b6 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/48df41b6997cfe2c8aa3bc46ea25eff01f615d31 Stats: 55 lines in 4 files changed: 2 ins; 22 del; 31 mod 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true Reviewed-by: ayang, sjohanss ------------- PR: https://git.openjdk.org/jdk/pull/25320 From tschatzl at openjdk.org Fri May 23 12:03:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 12:03:57 GMT Subject: RFR: 8357306: G1: Remove _gc_succeeded from VM_G1CollectForAllocation because it is always true [v2] In-Reply-To: References: <5KzvE3ghL7_z59-qqjHDSgK_MIPDtfwcBqK7R6svX1o=.35135368-a45f-4cf7-8bc3-8042e45df353@github.com> Message-ID: On Thu, 22 May 2025 06:58:52 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into 8357306-remove-gc-succeeded-in-g1-vm-operations >> - * remove comment >> - * fix try-concurrent-... >> - * some minor refactoring >> - 8357306 >> >> Hi all, >> >> please review this refactoring of G1 VM GC operations to remove the _gc_succeeded members because they are not necessary any more - GC operations themselves (i.e. the doit() part) always succeed. >> >> Testing: tier1-3, gha >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). 
Thanks @albertnetymk @kstefanj for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25320#issuecomment-2904196630 From jsikstro at openjdk.org Fri May 23 12:24:55 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 23 May 2025 12:24:55 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: References: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> Message-ID: <8kzCnssycUZffSpP0077NIrilq68CNuTxLtiWFK5teE=.f1b7ed83-96f0-4e48-92d2-e13b32693c1b@github.com> On Fri, 23 May 2025 10:36:25 GMT, Stefan Karlsson wrote: >> Joel Sikström has updated the pull request incrementally with two additional commits since the last revision: >> >> - Copyright years >> - Simplify untype(ZPageAge age) > > src/hotspot/share/gc/z/zAllocator.inline.hpp line 38: > >> 36: >> 37: inline ZAllocatorForRelocation* ZAllocator::relocation(ZPageAge page_age) { >> 38: return _relocation[untype(page_age) - 1]; > > I wonder if we should have our own defined ZPageAge operators for `+` and `-` (or maybe inc/dec) and we could add verification that we don't fall out of the range? > Suggestion: > > return _relocation[untype(page_age - 1)]; I agree. That also allows us to do something similar to how the offset types are handled in zAddress.inline.hpp, to check if the size being added/subtracted is within the underlying type. > src/hotspot/share/gc/z/zPageAge.hpp line 55: > >> 53: static_cast(ZPageAge::eden), >> 54: ZPageAgeCount); >> 55: > > Could this be using the other define to set this up? > Suggestion: > > ENUMERATOR_VALUE_RANGE(ZPageAge, > ZPageAge::eden, > ZPageAge::old); > > Or does that not work? Yes, using the other define (`ENUMERATOR_RANGE`) with the enum type instead of the underlying type makes more sense.
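The range-checked conversions and `+`/`-` operators discussed in this exchange could look roughly like the sketch below. It is an illustration under assumptions, not the code in the pull request: `PageAge`, `PageAgeCount` and `to_page_age` are hypothetical stand-ins for `ZPageAge`, `ZPageAgeCount` and `to_zpageage`, the enum is shortened to four values, and the standard `assert` replaces HotSpot's own assert macro.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ZPageAge, shortened to four ages.
enum class PageAge : uint8_t { eden, survivor1, survivor2, old };

// Number of ages, used instead of "max + 1" at the use sites.
constexpr unsigned PageAgeCount = static_cast<unsigned>(PageAge::old) + 1;

// Enum to underlying value; always valid for a well-formed PageAge.
inline unsigned untype(PageAge age) {
  return static_cast<unsigned>(age);
}

// Underlying value to enum, verifying that the value names a real age.
inline PageAge to_page_age(unsigned value) {
  assert(value < PageAgeCount && "value is not a valid page age");
  return static_cast<PageAge>(value);
}

// Arithmetic on ages goes through the checked conversion, so stepping
// past either end of the range is caught in debug builds.
inline PageAge operator+(PageAge age, unsigned n) {
  return to_page_age(untype(age) + n);
}

inline PageAge operator-(PageAge age, unsigned n) {
  assert(n <= untype(age) && "page age would underflow");
  return to_page_age(untype(age) - n);
}
```

With operators like these, an index expression in the style of `untype(page_age - 1)` performs the subtraction on the typed value, so a caller that passes the first age trips the assert instead of silently wrapping to a large index.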
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104477274 PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104477138 From jsikstro at openjdk.org Fri May 23 12:24:56 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 23 May 2025 12:24:56 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v2] In-Reply-To: References: <24IePcN9bC99HgcU_rg6t0cKl4p-pQSSifJtqgokyxY=.2037c51e-4159-4a69-824f-46abb8801256@github.com> Message-ID: On Fri, 23 May 2025 11:09:20 GMT, Stefan Karlsson wrote: >> src/hotspot/share/utilities/enumIterator.hpp line 269: >> >>> 267: } >>> 268: >>> 269: template >> >> I think this should be using T to match the surrounding code. >> Suggestion: >> >> template > > BTW, if you are going to change this could you also fix the pre-existing issues with the includes so that it matches the style guide? I agree, `T` is better. I'll fix includes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104477364 From tschatzl at openjdk.org Fri May 23 12:25:36 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 12:25:36 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v3] In-Reply-To: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: > Hi all, > > please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. 
> > Testing: gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix gtests: move gtest specific code to gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25406/files - new: https://git.openjdk.org/jdk/pull/25406/files/fa1c19c8..03da340c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25406&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25406&range=01-02 Stats: 35 lines in 3 files changed: 16 ins; 17 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25406/head:pull/25406 PR: https://git.openjdk.org/jdk/pull/25406 From jsikstro at openjdk.org Fri May 23 12:28:45 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 23 May 2025 12:28:45 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v3] In-Reply-To: References: Message-ID: > Hello, > > This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. > > Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. > > We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. > > Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge.
This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators are exclusive, I've opted to use the following value as end for the iterators: > > constexpr ZPageAge ZPageAgeLastPlusOne = static_cast(ZPageAgeCount); > > > I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. > > Testing: > * Oracle's tier 1-4 > * GHA Joel Sikstr?m has updated the pull request incrementally with five additional commits since the last revision: - Style fix :) - Added operator+/- for ZPageAge - Fix include order in enumIterator.hpp - Use T instead of EnumType - Use ENUMERATOR_RANGE instead of ENUMERATOR_VALUE_RANGE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25251/files - new: https://git.openjdk.org/jdk/pull/25251/files/3e0af5e3..25ddc320 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=01-02 Stats: 26 lines in 5 files changed: 15 ins; 2 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/25251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25251/head:pull/25251 PR: https://git.openjdk.org/jdk/pull/25251 From jsikstro at openjdk.org Fri May 23 12:32:53 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 23 May 2025 12:32:53 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v4] In-Reply-To: References: <7EztCfjwj8KrtUzBxNcIQOGccgfCh6DcKE9143ZoYis=.7ed24333-6ef0-43c1-8d99-482f6a845600@github.com> Message-ID: On Fri, 23 May 2025 04:36:25 GMT, Thomas Stuefe wrote: >> I agree. It's hard to grep for specific information since most GC have different approaches to printing similar information. >> >> However, all GCs (even Epsilon) print the string "used", so maybe grepping for that is a reasonable approach, just to see that something is printed? 
> > Heap.info is important; we should have better tests for this. But we can hold this off for a separate PR, so I am fine with scanning for "used". I agree that the test should be improved. I'll file an issue once this PR is integrated to follow up on that. Maybe some inspiration can be taken from the SA test in `test/hotspot/jtreg/serviceability/sa/TestUniverse.java`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25214#discussion_r2104489194 From tschatzl at openjdk.org Fri May 23 12:36:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 23 May 2025 12:36:51 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v2] In-Reply-To: <6EWv3XT0A0wsktjFthVcbzwU12wMkm8rrK-2jcfKaw8=.2dd8e56a-151c-49b9-9327-a3df039181ab@github.com> References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> <6EWv3XT0A0wsktjFthVcbzwU12wMkm8rrK-2jcfKaw8=.2dd8e56a-151c-49b9-9327-a3df039181ab@github.com> Message-ID: On Fri, 23 May 2025 11:51:15 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-add address_mapped_to, used by testing > > src/hotspot/share/gc/g1/g1BiasedArray.hpp line 106: > >> 104: class G1BiasedMappedArray : public G1BiasedMappedArrayBase { >> 105: >> 106: T* base() const { return (T*)G1BiasedMappedArrayBase::_base; } > > Looks like `base() `needs to be public for testing as well :( Sorry for all these issues, I just found out that I did not have gtests configured for my build. Should be fixed now. I moved the test helper methods into the test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25406#discussion_r2104496443 From tony at rivosinc.com Fri May 23 13:18:03 2025 From: tony at rivosinc.com (Tony Printezis) Date: Fri, 23 May 2025 09:18:03 -0400 Subject: GC and pointer masking Message-ID: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> Hi all, Pointer masking is available for some architectures (including RISC-V!). This can allow us to mark the top bits of an object reference with what type of object it is (young / old / humongous / etc.) without needing to clear those bits explicitly before we use the reference. This can be helpful both in the GC itself and in the barriers (e.g., efficiently filter out young objects in barriers that are not needed on young objects). Has anyone already looked into taking advantage of pointer masking in HotSpot? I tried a couple of searches but I didn't find anything. If there's been a discussion on this before, can you please point me to it? Thanks, Tony From aboldtch at openjdk.org Fri May 23 13:51:09 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 23 May 2025 13:51:09 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v5] In-Reply-To: References: Message-ID: >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. > > Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. > > However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
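The waste figures in the background section above can be checked with a little arithmetic. The following is a minimal sketch, not ZGC code: the constant and function names are hypothetical, and it assumes only what the text states (a 1/8 waste bound for medium regions, a 32M medium region size, and large regions sized in multiples of 2M).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; these are not ZGC identifiers.
constexpr std::size_t M = 1024 * 1024;
constexpr std::size_t kGranule = 2 * M;            // large regions are multiples of 2M
constexpr std::size_t kMediumRegionSize = 32 * M;  // medium region size for heaps >= 1GB

// Largest object allowed in a region so that a failed allocation
// leaves at most 1/8 (12.5%) of the region unused.
constexpr std::size_t max_object_size(std::size_t region_size) {
  return region_size / 8;
}

// A large region holds exactly one object, rounded up to a multiple of 2M.
constexpr std::size_t large_region_size(std::size_t object_size) {
  return (object_size + kGranule - 1) / kGranule * kGranule;
}

// Unused bytes at the end of a large region.
constexpr std::size_t large_region_waste(std::size_t object_size) {
  return large_region_size(object_size) - object_size;
}

// 32M * 12.5% = 4M is the max medium object size.
static_assert(max_object_size(kMediumRegionSize) == 4 * M, "max medium object is 4M");

// Worst case for large regions: an object one byte over 4M needs a 6M region,
// wasting just under 2M, which is just under one third of the region.
static_assert(large_region_size(4 * M + 1) == 6 * M, "4M + 1 rounds up to 6M");
static_assert(3 * large_region_waste(4 * M + 1) < large_region_size(4 * M + 1),
              "waste stays below one third");

// Checks the one-third bound for a range of sampled large object sizes.
inline bool waste_bound_holds(std::size_t first, std::size_t last, std::size_t step) {
  assert(step > 0);
  for (std::size_t size = first; size <= last; size += step) {
    if (3 * large_region_waste(size) >= large_region_size(size)) {
      return false;
    }
  }
  return true;
}
```

The bound tightens as objects grow: an object just over 6M wastes less than a quarter of its 8M region, so the 33% figure is only approached right above the 4M threshold.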
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes end up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Missing -XX:+UnlockDiagnosticVMOptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/43a42685..f6069efb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=03-04 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From rcastanedalo at openjdk.org Fri May 23 13:55:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 23 May 2025 13:55:57 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 13:30:28 GMT, Jatin Bhateja wrote: > > > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > > > > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness.
Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. > > Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java OK, thanks for checking Jatin! Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2904504259 From stefank at openjdk.org Fri May 23 15:13:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 May 2025 15:13:53 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 12:28:45 GMT, Joel Sikström wrote: >> Hello, >> >> This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. >> >> Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid.
Such conversion methods should be used instead of calling `static_cast`. >> >> We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. >> >> Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: >> >> constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount); >> >> >> I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. >> >> Testing: >> * Oracle's tier 1-4 >> * GHA > > Joel Sikström has updated the pull request incrementally with five additional commits since the last revision: > > - Style fix :) > - Added operator+/- for ZPageAge > - Fix include order in enumIterator.hpp > - Use T instead of EnumType > - Use ENUMERATOR_RANGE instead of ENUMERATOR_VALUE_RANGE Looks good but I found a couple of nits. src/hotspot/share/gc/z/zAllocator.inline.hpp line 30: > 28: > 29: #include "gc/z/zAddress.inline.hpp" > 30: #include "gc/z/zPageAge.inline.hpp" Sort order.
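For readers following the quoted RFE, the checked conversions and the count constant it describes could look roughly like the sketch below. This is an illustration under assumptions, not the code in the PR: the enum here is shortened to four ages, the signatures of `to_zpageage` and `untype` are guesses based on the description, and `for_each_age` is a hypothetical helper that stands in for HotSpot's enum iterators.

```cpp
#include <cassert>
#include <cstdint>

// Shortened stand-in for ZPageAge (assumption: the real enum has more survivor ages).
enum class ZPageAge : std::uint8_t { eden, survivor1, survivor2, old };

// Number of possible ages, replacing the "ZPageAgeMax + 1" idiom.
constexpr unsigned ZPageAgeCount = static_cast<unsigned>(ZPageAge::old) + 1;

// Exclusive end value for iteration, as proposed in the description.
constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount);

// Checked conversion from the underlying integer type (assumed signature).
inline ZPageAge to_zpageage(unsigned value) {
  assert(value < ZPageAgeCount && "value is not a valid page age");
  return static_cast<ZPageAge>(value);
}

// Checked conversion back to an integer (assumed signature).
inline unsigned untype(ZPageAge age) {
  const unsigned value = static_cast<unsigned>(age);
  assert(value < ZPageAgeCount && "page age is out of range");
  return value;
}

// Hypothetical helper: visits every age from eden up to and including old.
// The loop bound is exclusive, mirroring ZPageAgeLastPlusOne.
template <typename Visitor>
void for_each_age(Visitor visit) {
  for (unsigned i = 0; i < static_cast<unsigned>(ZPageAgeLastPlusOne); i++) {
    visit(to_zpageage(i));
  }
}
```

With a count constant, arrays indexed by age can be declared as `T values[ZPageAgeCount]` and indexed with `untype(age)`, so the `+ 1` no longer has to be repeated at each use site.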
src/hotspot/share/utilities/enumIterator.hpp line 241: > 239: private: > 240: > 241: struct ConstExprConstructTag {}; The above is implicitly private so this should probably be removed: Suggestion: struct ConstExprConstructTag {}; ------------- PR Review: https://git.openjdk.org/jdk/pull/25251#pullrequestreview-2864760693 PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104793485 PR Review Comment: https://git.openjdk.org/jdk/pull/25251#discussion_r2104801793 From kdnilsen at openjdk.org Fri May 23 16:05:58 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 16:05:58 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:05:55 GMT, William Kemper wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. 
This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1670: > >> 1668: >> 1669: if (mode()->is_generational()) { >> 1670: // young-gen heuristics track young, bootstrap, and global GC cycle times > > This seems like an unrelated change. Mixing in bootstrap and global gc cycle times is likely to increase the average time and make the heuristics more aggressive. Good catch. I will remove this code from this PR. (It will come back in a subsequent PR, to support adaptive old evac ratios. 
There, we consolidate the accounting for all gc in the "young heuristics", but we parameterize each phase so as to not bias triggering. Marking phase time is parameterized with bytes marked, for example.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2104904914 From kdnilsen at openjdk.org Fri May 23 16:37:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 16:37:57 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 17:54:12 GMT, William Kemper wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly.
>> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 206: > >> 204: old_gen->decrease_capacity(bytes_to_transfer); >> 205: const size_t new_size = young_gen->max_capacity(); >> 206: log_info(gc)("Forcing transfer of %zu region(s) from %s to %s, yielding increased size: " PROPERFMT, > > Can this be `gc, ergo` like the method to transfer regions to old? Agreed. Making this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2104996772 From kdnilsen at openjdk.org Fri May 23 17:20:59 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 17:20:59 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 17:55:51 GMT, William Kemper wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. 
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 193: > >> 191: const size_t new_size = old_gen->max_capacity(); >> 192: log_info(gc, ergo)("Forcing transfer of %zu region(s) from %s to %s, yielding increased size: " PROPERFMT, >> 193: regions, young_gen->name(), old_gen->name(), PROPERFMTARGS(new_size)); > > If this is now really only used for in-place promotions, can we change the log message to indicate the region is being promoted? I think when users see messages about things being "forced" in the log, they start to wonder if young/old sizes need to be adjusted. It actually does have one other use. I'm refactoring to distinguish the different purposes. Replacing ShenandoahGenerationSizer log messages with log_develop_debug messages. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105079634 From kdnilsen at openjdk.org Fri May 23 17:57:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 17:57:52 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:00:34 GMT, William Kemper wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young).
>> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. 
>> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 298: > >> 296: if (copy == nullptr) { >> 297: // If we failed to allocate in LAB, we'll try a shared allocation. >> 298: #ifdef KELVIN_ORIGINAL > > This looks like debugging code? Should we back this out? Sorry I forgot to clean this up. Refined the comment and removed the #ifdef controls > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp line 279: > >> 277: static size_t setup_sizes(size_t max_heap_size); >> 278: >> 279: inline bool is_recycling() { > > Is this used anywhere? Good catch. Removing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105174403 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105175118 From kdnilsen at openjdk.org Fri May 23 18:40:58 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 18:40:58 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:32:00 GMT, William Kemper wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. 
This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... 
> > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 376: > >> 374: "runs out of memory too early.") \ >> 375: \ >> 376: product(uintx, ShenandoahOldEvacRatioPercent, 75, EXPERIMENTAL, \ > > Phew, this is a lot of explanatory text and it reads like the target audience are GC developers. If we are going to expose this as a user configurable option, I think the help text should just explain how the behavior changes as this value goes up or down. Something like: >> Increasing this allows for more old regions in mixed collections. Decreasing this reduces the number of old regions in mixed collections. > > The first sentence makes it seem as though this is the percentage of the entire heap to reserve for old evacuations, but the next clarifies that this is the percentage of the collection set. > > Question about this sentence: >> A value of 100 allows a mixed evacuation to focus entirely on old-gen memory, allowing no young-gen regions to be collected. > > With a setting of 100, would GenShen still "preselect" young regions of tenuring age with sufficient garbage into the collection set? > > I also find the name of the option slightly confusing - is it a ratio? or a percentage? Seems like it's really a percentage (though it controls the ratio of reserves used for the collection). Thank you for the honest feedback. Way too many words. I've pared the description back and tried to clarify the confusion you describe. Am also renaming to ShenandoahOldEvacPercent. Please let me know what you think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105223767 From kdnilsen at openjdk.org Fri May 23 19:23:32 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 19:23:32 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v2] In-Reply-To: References: Message-ID: > Genshen independently reserves memory to hold evacuations into young and old generations.
We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). > > This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. > > The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. > > ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) > > With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. > > ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) > > At 8G heap size, the GC is not at all stressed. 
We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. > > ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) > > The command line for these comparisons follows: > > > ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:+Unlock... Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - respond to reviewer feedback - Keep gc cycle times with heuristics for the relevant generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25357/files - new: https://git.openjdk.org/jdk/pull/25357/files/3d55a646..ae8c83c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25357&range=00-01 Stats: 135 lines in 9 files changed: 19 ins; 72 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/25357.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357 PR: https://git.openjdk.org/jdk/pull/25357 From sjohanss at openjdk.org Fri May 23 19:32:55 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 23 May 2025 19:32:55 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v3] In-Reply-To: References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: On Fri, 23 May 2025 12:25:36 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. >> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix gtests: move gtest specific code to gtest Marked as reviewed by sjohanss (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25406#pullrequestreview-2865503746 From kdnilsen at openjdk.org Fri May 23 19:40:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 23 May 2025 19:40:52 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:35:01 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: >> >> - respond to reviewer feedback >> - Keep gc cycle times with heuristics for the relevant generation > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.hpp line 149: > >> 147: >> 148: // Transfers surplus old regions to young, or takes regions from young to satisfy old region deficit >> 149: TransferResult balance_generations(); > > Are we still using this `TransferResult` thing? Seems like we might be able to delete it with this change. You're right. Deleting. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105303068 From vlivanov at openjdk.org Fri May 23 22:08:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 May 2025 22:08:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 11:10:15 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. 
Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More touchups I feel uneasy about all the complications introduced by coordination between accessors. It looks like supporting concurrent release operation adds a lot of complexity. Weak -> strong transition is monotonic, so shouldn't need as much care. What do you think about making release operation part of CompileTask recycling (e.g., in `UnloadableMethodHandle` destructor)? By the time it happens, there should not be any other users of the task. (Otherwise, recycling concurrently accesses task is unsafe anyway). 
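
A minimal sketch of the destructor-based release idea, under stated assumptions: `Slot` and `ScopedHandle` are hypothetical stand-ins and are not the PR's `UnloadableMethodHandle` or any HotSpot type. It only illustrates that when release happens in the destructor, no accessor has to coordinate with a concurrent release.

```cpp
// Hypothetical stand-in for a storage entry that a handle pins.
struct Slot {
  bool in_use = false;
};

// Hypothetical RAII wrapper: acquire in the constructor, release in the
// destructor. By the time the destructor runs there are no other users of
// the object, so accessors need no synchronization against release.
class ScopedHandle {
  Slot* _slot;

public:
  explicit ScopedHandle(Slot* s) : _slot(s) { _slot->in_use = true; }
  ~ScopedHandle() { _slot->in_use = false; }

  // Non-copyable, so the slot is released exactly once.
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

  bool is_held() const { return _slot->in_use; }
};
```

Whether this maps cleanly onto `CompileTask` recycling depends on details of the PR that this sketch does not model.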
------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2865813410 From wkemper at openjdk.org Fri May 23 23:45:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 23 May 2025 23:45:53 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 19:23:32 GMT, Kelvin Nilsen wrote: >> Genshen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity. >> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. 
>> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - respond to reviewer feedback > - Keep gc cycle times with heuristics for the relevant generation src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 376: > 374: "runs out of memory too early.") \ > 375: \ > 376: product(uintx, ShenandoahOldEvacPercent, 75, EXPERIMENTAL, \ This is much easier to read. Thank you for making the change! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2105525384 From gli at openjdk.org Sun May 25 09:06:34 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 25 May 2025 09:06:34 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration Message-ID: Hi all, This trivial patch fixes two typos. Thanks for your review. 
Best Regards, -- Guoxiong ------------- Commit messages: - JDK-8357109 Changes: https://git.openjdk.org/jdk/pull/25436/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25436&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357109 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25436.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25436/head:pull/25436 PR: https://git.openjdk.org/jdk/pull/25436 From zgu at openjdk.org Sun May 25 13:32:56 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sun, 25 May 2025 13:32:56 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration In-Reply-To: References: Message-ID: <-D5KYrGkIop12UQfnB4lTKFNKQZVLiIaCKklKGYwDis=.12871fee-0149-4f7f-b67a-2116d18c5806@github.com> On Sun, 25 May 2025 09:02:03 GMT, Guoxiong Li wrote: > Hi all, > > This trivial patch fixes two typos. Thanks for your review. > > Best Regards, > -- Guoxiong LGTM and trivial ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25436#pullrequestreview-2866844921 From syan at openjdk.org Sun May 25 15:17:50 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 25 May 2025 15:17:50 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration In-Reply-To: References: Message-ID: On Sun, 25 May 2025 09:02:03 GMT, Guoxiong Li wrote: > Hi all, > > This trivial patch fixes two typos. Thanks for your review. > > Best Regards, > -- Guoxiong Changes requested by syan (Committer). 
src/hotspot/share/gc/shared/gc_globals.hpp line 365: > 363: \ > 364: product(uint, YoungGenerationSizeSupplement, 80, \ > 365: "Supplement to YoungedGenerationSizeIncrement used at startup") \ Should we update the copyright year to 2025 ------------- PR Review: https://git.openjdk.org/jdk/pull/25436#pullrequestreview-2866874667 PR Review Comment: https://git.openjdk.org/jdk/pull/25436#discussion_r2106230437 From ayang at openjdk.org Mon May 26 06:26:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 May 2025 06:26:52 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v3] In-Reply-To: References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: On Fri, 23 May 2025 12:25:36 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. >> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix gtests: move gtest specific code to gtest Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25406#pullrequestreview-2867390119 From ayang at openjdk.org Mon May 26 06:27:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 May 2025 06:27:53 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration In-Reply-To: References: Message-ID: On Sun, 25 May 2025 09:02:03 GMT, Guoxiong Li wrote: > Hi all, > > This trivial patch fixes two typos. Thanks for your review. > > Best Regards, > -- Guoxiong Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25436#pullrequestreview-2867391973 From aboldtch at openjdk.org Mon May 26 08:16:17 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 26 May 2025 08:16:17 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 11:10:15 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More touchups Just a drive by comment. Not sure what our opinion is w.r.t. `mutable`, but how do we feel about typing the spin lock as `mutable` and keep `is_safe()` and `method*()` const. We can then keep the old signature for `CompileTask::is_unloaded()` `CompileTask::method()` and `ArenaStatCounter::ArenaStatCounter(...)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2867592646 From tschatzl at openjdk.org Mon May 26 08:25:09 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 26 May 2025 08:25:09 GMT Subject: RFR: 8357621: G1: Clean up G1BiasedArray [v3] In-Reply-To: References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: On Fri, 23 May 2025 12:25:36 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. >> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix gtests: move gtest specific code to gtest The two GHA failures seem to be something unrelated, CDS issues. 
Thanks @albertnetymk @kstefanj for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25406#issuecomment-2908927351 From tschatzl at openjdk.org Mon May 26 08:30:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 26 May 2025 08:30:24 GMT Subject: Integrated: 8357621: G1: Clean up G1BiasedArray In-Reply-To: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> References: <3Pw3sgr9MNkv2MIev1giMShq5dQ6mkN_Fypx6LUNB_4=.992a4e01-b642-410e-a9e6-20708fac674f@github.com> Message-ID: On Fri, 23 May 2025 08:07:04 GMT, Thomas Schatzl wrote: > Hi all, > > please review this minor touch-up of the G1BiasedArray classes, removing some unused methods and improving method visibility a bit. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: 9946c85e Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/9946c85e2f905f18340a76cebce04b0003783cb4 Stats: 81 lines in 3 files changed: 34 ins; 46 del; 1 mod 8357621: G1: Clean up G1BiasedArray Reviewed-by: sjohanss, ayang ------------- PR: https://git.openjdk.org/jdk/pull/25406 From jbhateja at openjdk.org Mon May 26 12:03:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 12:03:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Fri, 23 May 2025 13:53:23 GMT, Roberto Castañeda Lozano wrote: > > > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > > > > > > > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`).
Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. > > > > > > Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > > OK, thanks for checking Jatin! > > Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. Hi @robcasloz, The patch uses new push2/pop2 instructions, which reduces dynamic instruction count needed to save and restore all the caller-saved registers. New instruction sequence based on push2/pop2 not only saves EGPRs but also existing GPRs with shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) Given that gtests is a build-time validation and the JVM itself is built with with minimum feature set, hence am hesitant to add this along with the patch. 
BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. Let me know if you think it's good to land in. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2909487712 From aboldtch at openjdk.org Mon May 26 12:13:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 26 May 2025 12:13:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 17:42:06 GMT, Jatin Bhateja wrote: >> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. >> ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. >> >> Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Seems fine. Eventually it would be nice if we could generalise this and have the logic in the MacroAssembler. Just a small comment about the conditional rax push and pop. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 196: > 194: __ movptr(_result, rax); > 195: __ popp(rax); > 196: } Same here. 
Suggestion:

    if (_result != rax) {
      if (_result != nullptr) {
        __ movptr(_result, rax);
      }
      __ popp(rax);
    }

src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 211:

> 209: __ movptr(_result, rax);
> 210: __ pop(rax);
> 211: }

Was unsure if we should change the behaviour in the else branch in this PR. But it seems like an alright change. However, I think it is easier to see that this does the correct thing if the condition for pushing and popping are the same.

Suggestion:

    if (_result != rax) {
      if (_result != noreg) {
        __ movptr(_result, rax);
      }
      __ pop(rax);
    }

-------------

Changes requested by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2868218162
PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2107199288
PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2107197141

From jbhateja at openjdk.org Mon May 26 12:56:24 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 26 May 2025 12:56:24 GMT
Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v4]
In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com>
References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com>
Message-ID:

> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints.
> These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address.
> ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register.
>
> Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green.
>
> Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Axel's comments incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/9b5c2ac4..bd8b9c51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From rcastanedalo at openjdk.org Mon May 26 14:07:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 May 2025 14:07:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Fri, 23 May 2025 13:53:23 GMT, Roberto Castañeda Lozano wrote: >>> > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >>> >>> Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes?
[This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. >> >> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it >> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > >> > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >> > >> > >> > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. >> >> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > > OK, thanks for checking Jatin! > > Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? 
Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. > Hi @robcasloz, The patch uses new push2/pop2 instructions, which reduces dynamic instruction count needed to save and restore all the caller-saved registers. New instruction sequence based on push2/pop2 not only saves EGPRs but also existing GPRs with shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. > > [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) > > Given that gtests is a build-time validation and the JVM itself is built with with minimum feature set, hence am hesitant to add this along with the patch. BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. > > Let me know if you think it's good to land in. Thanks for the details! Let me run some internal testing, since the PR affects spilling of non-extended registers too (due to special handling of `_result == rax`). Will come back with the results within a day or two. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2909871002 From jsikstro at openjdk.org Mon May 26 14:48:55 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Mon, 26 May 2025 14:48:55 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v3] In-Reply-To: <1g6Mnw-J8whB4uoR6oC35lOo8Bmk2LWlFQ7yYLTNlRk=.e0cb68fb-9bb2-4e2b-b909-b8fa68138739@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> <1g6Mnw-J8whB4uoR6oC35lOo8Bmk2LWlFQ7yYLTNlRk=.e0cb68fb-9bb2-4e2b-b909-b8fa68138739@github.com> Message-ID: On Tue, 20 May 2025 06:55:21 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handle memory uncommitting (returning physical memory to the OS).
Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. >> >> Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. >> >> The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: >> * Missing wait, the uncommitting is not actually spread out, but happens all at once. >> * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. >> * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. >> * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. >> >> An alternative discussed has been to do uncommitting based on GC triggers rather than a periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. 
[JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356716 > - Wrong too > - Less archaic spelling of complete > - Cleanup and simplify > - Move all uncommit logic to zUncommitter > - Log time spent uncommitting > - Split reset_uncommit_cycle and add headroom > - Rename _min_last_uncommit_cycle to _min_size_watermark > - Use milliseconds instead of seconds > - Improve events and statistics > - ... and 8 more: https://git.openjdk.org/jdk/compare/b3925eac...43c0795a I like how uncommitting becomes more robust with this patch and the overall design of having an uncommit "cycle". Some thoughts: I think we should move activation of the uncommit cycle to `ZUncommitter::run_thread()`, so that it is on the same "level/depth" that also deactivates it. I'm not 100% sure of our style in ZGC, but since we're at it I think the functions in zUncommiter.cpp should match the order in the header, or the other way around. src/hotspot/share/gc/z/zPhysicalMemoryManager.cpp line 119: > 117: if (ZUncommitDelay > max_delay_without_overflow) { > 118: FLAG_SET_ERGO(ZUncommitDelay, max_delay_without_overflow); > 119: } This is a nice addition! src/hotspot/share/gc/z/zUncommitter.cpp line 116: > 114: // Done > 115: break; > 116: } Is it possible to convert this to something like the following to make it clearer that this is the "end condition" of the cycle? From what I can see, 2/3 paths that return 0 in `uncommit()` calls `cancel_uncommit_cycle()`. 
Suggestion:

    if (uncommit_cycle_is_canceled() || uncommit_cycle_is_finished()) {
      // No more work, cycle is done.
      break;
    }

src/hotspot/share/gc/z/zUncommitter.cpp line 358:

> 356: cancel_uncommit_cycle();
> 357: return 0;
> 358: }

Maybe?

Suggestion:

    if (limit == 0) {
      // This may occur if the current max capacity for this partition is 0
      cancel_uncommit_cycle();
      return 0;
    }

-------------

PR Review: https://git.openjdk.org/jdk/pull/25198#pullrequestreview-2868116856
PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2107304605
PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2107391145
PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2107129706

From ayang at openjdk.org Mon May 26 17:39:24 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 26 May 2025 17:39:24 GMT
Subject: RFR: 8357801: Parallel: Remove deprecated PSVirtualSpace methods
Message-ID:

Simply removing some deprecated methods by changing some fields to pointer type.

Test: tier1-3

-------------

Commit messages:
 - remove-more
 - pgc-remove-constructor

Changes: https://git.openjdk.org/jdk/pull/25459/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25459&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357801
  Stats: 42 lines in 6 files changed: 2 ins; 25 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/25459.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25459/head:pull/25459

PR: https://git.openjdk.org/jdk/pull/25459

From shade at openjdk.org Mon May 26 19:01:38 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 26 May 2025 19:01:38 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v20]
In-Reply-To:
References:
Message-ID:

> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue.
Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 35 commits: - Switch to mutable - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - ... and 25 more: https://git.openjdk.org/jdk/compare/ed4cd2ac...0c1c5d65 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=19 Stats: 429 lines in 11 files changed: 389 ins; 22 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon May 26 19:01:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 19:01:38 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:53:19 GMT, Axel Boldt-Christmas wrote: > Not sure what our opinion is w.r.t. `mutable`, but how do we feel about typing the spin lock as `mutable` and keep `is_safe()` and `method*()` const. I like this a lot! Dropping `const` just to satisfy spin lock (an implementation detail) felt really awkward. New version uses `mutable`. 
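To illustrate the `mutable` point discussed above, here is a minimal, self-contained sketch. It is not the code from the PR: `TaskSketch`, its fields, and the `std::atomic<bool>`-based spin lock are hypothetical stand-ins, assumed only for illustration. It shows how declaring the lock `mutable` lets accessors such as `is_safe()` stay `const` even though they acquire and release the lock internally.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a task whose accessors must take a spin lock.
// None of these names come from HotSpot; they are assumptions for the sketch.
class TaskSketch {
  // `mutable` allows const member functions to modify the lock word.
  // The lock is an implementation detail, not part of the logical state.
  mutable std::atomic<bool> _locked{false};
  int  _value;
  bool _safe;

  void lock() const {
    // Spin until we are the ones who flipped the flag from false to true.
    while (_locked.exchange(true, std::memory_order_acquire)) {
      // busy-wait
    }
  }

  void unlock() const {
    _locked.store(false, std::memory_order_release);
  }

public:
  TaskSketch(int value, bool safe) : _value(value), _safe(safe) {}

  // Logically read-only, so it stays const despite taking the lock.
  bool is_safe() const {
    lock();
    bool result = _safe;
    unlock();
    return result;
  }

  // Also const: callers holding a `const TaskSketch&` can still call it.
  int value() const {
    lock();
    int result = _value;
    unlock();
    return result;
  }
};
```

Without `mutable`, both accessors would have to drop `const`, which would leak the locking detail into every caller's signature.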
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2910465166 From zgu at openjdk.org Tue May 27 01:04:55 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 27 May 2025 01:04:55 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v5] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> <-k-DamMcH1pZ4vSAkWjhlM5PD777oPKlkrX0JK2SsSk=.de6913a9-3e2f-4cc5-bc53-e251c82ed78d@github.com> Message-ID: On Sun, 18 May 2025 15:21:37 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 404: >> >>> 402: collect_at_safepoint(!should_run_young_gc); >>> 403: >>> 404: if (is_gc_overhead_limit_reached()) { >> >> Maybe want to adopt current algorithm, start to clear soft references when approaching gc overhead limit? >> Running a full gc and clearing all soft references without retrying allocation and throws OOM, seems a bit harsh. >> >> People still use soft references for caches, reclaim soft references could potentially free large amount of memory. > > Revised a bit; the limitation of what we have on master is that it doesn't detect gc-overhead for young-gcs. If many young-gcs are run, gc-overhead checking should kick in as well. I wonder if you should try `expand_heap_and_allocate()` under `_is_heap_almost_full` situation as well. I am afraid that it might throw OOM before heap is fully expanded again, because compact GC does not expand heap. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2107963971 From aboldtch at openjdk.org Tue May 27 06:08:10 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 27 May 2025 06:08:10 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v4] In-Reply-To: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: <84mT0fVEmePl7CsLeYrl_Mzc4Xln3-Vg7zu7YBk6GPo=.188c4be6-fc2b-46ac-b80a-2a60a7e42318@github.com> > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handle memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. > > Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. > > The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: > * Missing wait, the uncommitting is not actually spread out, but happens all at once. > * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. 
> * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. > * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. > > An alternative discussed has been to do uncommitting based on GC triggers rather than a periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: - Avoid excessive logging if ZUncommitDelay == 0 - Move uncommit logic from MappedCache to Uncommitter + cleanup and comment - Better cycle activate / deactivate scoping - Remove newline - Cleanup time logging - Comment describing `uncommitted == 0` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25198/files - new: https://git.openjdk.org/jdk/pull/25198/files/43c0795a..35856089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=02-03 Stats: 96 lines in 4 files changed: 46 ins; 24 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From aboldtch at openjdk.org Tue May 27 06:10:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 27 May 2025 06:10:58 GMT Subject: 
RFR: 8356716: ZGC: Cleanup Uncommit Logic [v3] In-Reply-To: References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> <1g6Mnw-J8whB4uoR6oC35lOo8Bmk2LWlFQ7yYLTNlRk=.e0cb68fb-9bb2-4e2b-b909-b8fa68138739@github.com> Message-ID: On Mon, 26 May 2025 13:45:03 GMT, Joel Sikström wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8356716 >> - Wrong too >> - Less archaic spelling of complete >> - Cleanup and simplify >> - Move all uncommit logic to zUncommitter >> - Log time spent uncommitting >> - Split reset_uncommit_cycle and add headroom >> - Rename _min_last_uncommit_cycle to _min_size_watermark >> - Use milliseconds instead of seconds >> - Improve events and statistics >> - ... and 8 more: https://git.openjdk.org/jdk/compare/6ee4af7a...43c0795a > > src/hotspot/share/gc/z/zUncommitter.cpp line 116: > >> 114: // Done >> 115: break; >> 116: } > > Is it possible to convert this to something like the following to make it clearer that this is the "end condition" of the cycle? From what I can see, 2/3 paths that return 0 in `uncommit()` calls `cancel_uncommit_cycle()`. > Suggestion: > > if (uncommit_cycle_is_canceled() || uncommit_cycle_is_finished()) { > // No more work, cycle is done. > break; > } It is used as a proxy to not have to retake the lock. Will add a comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2108258237 From jsikstro at openjdk.org Tue May 27 07:02:04 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 27 May 2025 07:02:04 GMT Subject: RFR: 8356848: Separate Metaspace and GC printing [v5] In-Reply-To: References: Message-ID: <4V4E8FS8SBmH9GFYr2LZOl90Ktio1sXTkKR43L1osLU=.2549b4ee-a8cb-41a1-a7c4-b4a11f3fa218@github.com> On Fri, 23 May 2025 04:36:38 GMT, Thomas Stuefe wrote: >> Joel Sikström has updated the pull request incrementally with four additional commits since the last revision: >> >> - Feedback on Metaspace jcmd >> - Copyright years >> - Make HeapInfoTest.java more robust >> - Switch naming order of ring-buffer names > > Marked as reviewed by stuefe (Reviewer). Thank you @tstuefe and @stefank for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25214#issuecomment-2911361909 From jsikstro at openjdk.org Tue May 27 07:02:04 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 27 May 2025 07:02:04 GMT Subject: Integrated: 8356848: Separate Metaspace and GC printing In-Reply-To: References: Message-ID: On Tue, 13 May 2025 14:10:33 GMT, Joel Sikström wrote: > Hello, > > The goal of this RFE is to separate Metaspace printing from GC printing. The main reason Metaspace and GC printing are coupled the way they are right now is that, historically, the permanent generation (PermGen), which was replaced by Metaspace, was part of the GC heap. Hence, it made sense to also print info about the PermGen when printing the GC heap. > > With Metaspace replacing the PermGen, which uses memory that is separate from the GC heap, the coupling has become looser, raising the question if Metaspace should be printed somewhere else (maybe when printing *other* Metaspace stuff?). 
A reason to still print Metaspace when printing the heap is that the GC is responsible for unloading classes and nmethods, which means it makes sense to print Metaspace information in connection to when a GC is performed. > > To better reflect the current state of the JVM, I propose we make the following changes to separate Metaspace from GC printing: > * Move Metaspace printing from HeapInfoDCmd to MetaspaceDCmd. > * Move Metaspace printing from the "Heap:" section to "Metaspace:" section in vmError.cpp (hs_err files, the VM.info jcmd and -XX:+PrintVMInfoAtExit). > * Use gc+exit instead of gc+heap+exit as tags for the LogTarget during exit printing to reflect that it's not only the heap being printed. > * And the largest change in terms of LOC, separate Metaspace and GC Heap prints in the before/after GC invocation(s) printing. This is also recorded in a ring buffer, which is printed in vmError.cpp. > > Testing: > * GHA, Oracle's tier 1-4 > * Manual inspection of printed content This pull request has now been integrated. Changeset: 85af573c Author: Joel Sikström URL: https://git.openjdk.org/jdk/commit/85af573cb6b5063c24f1efcbfb80bbace2883c7c Stats: 148 lines in 14 files changed: 60 ins; 31 del; 57 mod 8356848: Separate Metaspace and GC printing Reviewed-by: stefank, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/25214 From ayang at openjdk.org Tue May 27 07:16:41 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 07:16:41 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v6] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. 
> > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. 
> > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge branch 'master' into pgc-size-policy - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/320e590b..872b18bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=04-05 Stats: 12205 lines in 260 files changed: 7807 ins; 3245 del; 1153 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From ayang at openjdk.org Tue May 27 07:16:41 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 07:16:41 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v6] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> <3l8x32wbOr2FZzLV3lYfSbch-6hlT1te0uZXUeQVAcQ=.3ff8422e-fc0a-492f-a6bc-0df6acbc9a66@github.com> <-k-DamMcH1pZ4vSAkWjhlM5PD777oPKlkrX0JK2SsSk=.de6913a9-3e2f-4cc5-bc53-e251c82ed78d@github.com> Message-ID: On Tue, 27 May 2025 01:02:22 GMT, Zhengyu Gu wrote: >> Revised a bit; the limitation of what we have on master is that it doesn't detect gc-overhead for young-gcs. If many young-gcs are run, gc-overhead checking should kick in as well. > > I wonder if you should try `expand_heap_and_allocate()` under `_is_heap_almost_full` situation as well. 
I am afraid that it might throw OOM before heap is fully expanded again, because compact GC does not expand heap. `expand_heap_and_allocate` is invoked in both situations; also, heap is expanded inside `summary_phase` in order to hold all live-objs inside old-gen. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2108377133 From rcastanedalo at openjdk.org Tue May 27 07:31:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 07:31:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v4] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Mon, 26 May 2025 12:56:24 GMT, Jatin Bhateja wrote: >> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte-aligned stack address. >> ZRuntimeCallSpill is agnostic to live registers, so the resulting spill sequence should not modify the contents of the registers. >> >> The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Axel's comments incorporated Test results look good! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2869935527 From rcastanedalo at openjdk.org Tue May 27 07:46:43 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 07:46:43 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. 
> > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Include address mode test in 'legitimize_address' - Excluded IR checks for testLoadVolatile on PPC64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/b92500a2..cf4f3b30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=04-05 Stats: 12 lines in 2 files changed: 3 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Tue May 27 07:46:44 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:46:44 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 08:33:38 GMT, Axel Boldt-Christmas wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace control type with PhaseCFG::is_CFG test > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 141: > >> 139: Address legitimize_address(const 
Address &a, int size, Register scratch) { >> 140: if (a.getMode() == Address::base_plus_offset) { >> 141: if (legitimize_address_requires_lea(a, size)) { > > It is a little strange that `legitimize_address_requires_lea` is only the second condition and not > > return a.getMode() == Address::base_plus_offset && !Address::offset_ok_for_immed(a.offset(), exact_log2(size)); > > > And have the check in `legitimize_address` simply be `if (legitimize_address_requires_lea(a, size))` > > I guess we never end up calling `legitimize_address_requires_lea` with a literal address, where it would assert in `a.offset()`. But requiring the Address parameter of legitimize_address_requires_lea to be in a specific mode as a precondition seems weird to me. Thanks @xmas92, I fully agree, done (commit cf4f3b30). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2108444241 From rcastanedalo at openjdk.org Tue May 27 07:49:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 07:49:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> On Fri, 16 May 2025 09:33:11 GMT, Martin Doerr wrote: > I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return. I disabled the rule for PPC64 (commit fdf34f90). Please let me know if that works as expected. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911516324 From rcastanedalo at openjdk.org Tue May 27 07:49:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 07:49:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> Message-ID: On Tue, 27 May 2025 07:46:22 GMT, Roberto Castañeda Lozano wrote: >> Thanks for implementing it and thanks for the ping. It basically works on PPC64, but one IR rule is failing: >> >> Failed IR Rules (1) of Methods (1) >> ---------------------------------- >> 1) Method "static java.lang.Object compiler.gcbarriers.TestImplicitNullChecks.testLoadVolatile(compiler.gcbarriers.TestImplicitNullChecks$OuterWithVolatileField)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={FINAL_CODE}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#NULL_CHECK#_", "1"}, applyIfPlatformOr={}, applyIfPlatform={"aarch64", "false"}, failOn={}, applyIfOr={"UseZGC", "true", "UseG1GC", "true"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "Final Code": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\d+(\s){2}(NullCheck.*)+(\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! 
>> >> >> This is probably because PPC64 uses a membar_volatile before volatile load, so the graph looks differently: >> >> 33 Prolog === [[ ]] [2380000000033] >> 9 MachProj === 10 [[ 8 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> R3 11 MachProj === 10 [[ 8 26 ]] #5 Oop:compiler/gcbarriers/TestImplicitNullChecks$OuterWithVolatileField * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 12 MachProj === 10 [[ 4 17 ]] #1/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 13 MachProj === 10 [[ 4 21 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> R1 14 MachProj === 10 [[ 4 2 17 ]] #3 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 15 MachProj === 10 [[ 4 17 ]] #4 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 0 Con === 10 [[ ]] #top >> 8 zeroCheckP_reg_imm0 === 9 11 [[ 7 22 ]] P=0.000001, C=-1.000000 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> >> BB#002: >> 31 Region === 31 22 [[ 31 21 26 ]] >> 21 membar_volatile === 31 0 13 0 0 [[ 20 23 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> 20 MachProj === 21 [[ 19 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> 23 MachProj === 21 ... > >> I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. > > Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return. I disabled the rule for PPC64 (commit fdf34f90). Please let me know if that works as expected. > @robcasloz : Hi, Thanks for the ping! I performed tier1-3 tests on linux-riscv64 platform, result is good. The new test `test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java` also pass when running with G1 and ZGC using fastdebug build. 
@RealFYang Thanks for testing and reporting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911518019 From rcastanedalo at openjdk.org Tue May 27 08:18:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 08:18:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: >> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. > > I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. @theRealAph is it OK to proceed with this PR as it is, or do you still think it would be better to extend C2 with multiple implicit null exception table entries per Mach node? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2108544177 From tschatzl at openjdk.org Tue May 27 08:19:09 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 08:19:09 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation Message-ID: Hi all, please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. Testing: gha, tier1 ------------- Commit messages: - * fix copyright - 8354428 Changes: https://git.openjdk.org/jdk/pull/25447/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25447&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354428 Stats: 16 lines in 4 files changed: 0 ins; 5 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25447/head:pull/25447 PR: https://git.openjdk.org/jdk/pull/25447 From mbaesken at openjdk.org Tue May 27 08:19:09 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 27 May 2025 08:19:09 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation In-Reply-To: References: Message-ID: <4ouxt3gZpoBua9ZyXHAN8hN3hSg7lHGJO0Ab5mlGJNs=.aec05ac2-c64d-45a5-b534-d45573ed9153@github.com> On Mon, 26 May 2025 10:15:01 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. > > Testing: gha, tier1 Seems some copyright info in headers needs adjustment, see vmStructs_g1.hpp . 
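The `uintptr_t` approach described in the 8354428 RFR above can be sketched roughly as follows. This is not the actual `g1BiasedArray.hpp` code: `BiasedArraySketch` and all of its members are hypothetical names, assumed only for illustration. The idea shown is that the biased base, which may lie far outside the backing allocation, is kept as an unsigned integer (whose wrap-around is well defined in C++), and is converted to a pointer only once the final, in-bounds address has been computed. The integer-to-pointer round trip is implementation-defined; the sketch assumes a mainstream flat-address platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified biased array. Logical index `bias` maps to
// element 0 of the backing storage. Names are assumptions for this sketch.
template <typename T>
class BiasedArraySketch {
  std::size_t    _length;       // number of elements in the backing storage
  std::size_t    _bias;         // logical index of the first element
  std::uintptr_t _biased_base;  // integer, NOT a pointer: may be "out of bounds"

public:
  BiasedArraySketch(T* base, std::size_t length, std::size_t bias)
    : _length(length),
      _bias(bias),
      // Unsigned subtraction wraps modulo 2^N instead of being undefined,
      // unlike `base - bias` on a T*, which ubsan flags as pointer overflow.
      _biased_base(reinterpret_cast<std::uintptr_t>(base) - bias * sizeof(T)) {}

  T& at_biased(std::size_t biased_index) {
    assert(biased_index >= _bias && biased_index - _bias < _length);
    // The sum lands back inside the allocation; only now form a pointer.
    std::uintptr_t addr = _biased_base + biased_index * sizeof(T);
    return *reinterpret_cast<T*>(addr);
  }
};
```

A caller would construct it over existing storage, for example `BiasedArraySketch<int> a(storage, 4, 1000);`, and then access `a.at_biased(1000)` through `a.at_biased(1003)`.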
------------- PR Comment: https://git.openjdk.org/jdk/pull/25447#issuecomment-2911470743 From ayang at openjdk.org Tue May 27 08:20:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 08:20:55 GMT Subject: RFR: 8357559: G1HeapRegionManager refactor rename functions related to the number of regions in different states In-Reply-To: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> References: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> Message-ID: On Fri, 23 May 2025 08:53:15 GMT, Ivan Walulya wrote: > Hi, > > Please review this refactoring of functions names in G1HeapRegionManager and removing duplicate methods. > > Testing: gha Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25408#pullrequestreview-2870129099 From tschatzl at openjdk.org Tue May 27 08:20:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 08:20:55 GMT Subject: RFR: 8357559: G1HeapRegionManager refactor rename functions related to the number of regions in different states In-Reply-To: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> References: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> Message-ID: On Fri, 23 May 2025 08:53:15 GMT, Ivan Walulya wrote: > Hi, > > Please review this refactoring of functions names in G1HeapRegionManager and removing duplicate methods. > > Testing: gha Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25408#pullrequestreview-2870132697 From tschatzl at openjdk.org Tue May 27 08:22:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 08:22:57 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration In-Reply-To: References: Message-ID: On Sun, 25 May 2025 09:02:03 GMT, Guoxiong Li wrote: > Hi all, > > This trivial patch fixes two typos. Thanks for your review. > > Best Regards, > -- Guoxiong Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25436#pullrequestreview-2870141334 From tschatzl at openjdk.org Tue May 27 08:22:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 08:22:58 GMT Subject: RFR: 8357109: Parallel: Fix typo in YoungedGeneration In-Reply-To: References: Message-ID: On Sun, 25 May 2025 15:15:28 GMT, SendaoYan wrote: >> Hi all, >> >> This trivial patch fixes two typos. Thanks for your review. >> >> Best Regards, >> -- Guoxiong > > src/hotspot/share/gc/shared/gc_globals.hpp line 365: > >> 363: \ >> 364: product(uint, YoungGenerationSizeSupplement, 80, \ >> 365: "Supplement to YoungedGenerationSizeIncrement used at startup") \ > > Should we update the copyright year to 2025 It's up to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25436#discussion_r2108545104 From ayang at openjdk.org Tue May 27 08:24:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 08:24:51 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation In-Reply-To: References: Message-ID: On Mon, 26 May 2025 10:15:01 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. 
Only convert to pointer at the actual access. > > Testing: gha, tier1 src/hotspot/share/gc/g1/vmStructs_g1.hpp line 51: > 49: nonstatic_field(G1HeapRegionTable, _base, address) \ > 50: nonstatic_field(G1HeapRegionTable, _length, size_t) \ > 51: nonstatic_field(G1HeapRegionTable, _biased_base, size_t) \ Why `size_t` for `uintptr_t _biased_base;`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25447#discussion_r2108557618 From jbhateja at openjdk.org Tue May 27 08:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 May 2025 08:31:57 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Mon, 26 May 2025 14:05:42 GMT, Roberto Castañeda Lozano wrote: >>> > > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. >>> > >>> > >>> > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers.
>>> >>> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/restore sequences around native methods; there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java >> >> OK, thanks for checking Jatin! >> >> Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. > >> Hi @robcasloz, The patch uses new push2/pop2 instructions, which reduce the dynamic instruction count needed to save and restore all the caller-saved registers. The new instruction sequence based on push2/pop2 not only saves EGPRs but also existing GPRs with a shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. >> >> [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) >> >> Given that gtests are a build-time validation and the JVM itself is built with a minimum feature set, I am hesitant to add this along with the patch. BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. >> >> Let me know if you think it's good to land in. > > Thanks for the details! Let me run some internal testing, since the PR affects spilling of non-extended registers too (due to special handling of `_result == rax`). Will come back with the results within a day or two. Thanks @robcasloz, @xmas92 and @sviswa7 for your reviews and approvals.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2911639907 From jbhateja at openjdk.org Tue May 27 08:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 May 2025 08:31:57 GMT Subject: Integrated: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: <80JRfWJnkss2B0sKMAPjyA9YyH1UHeRNhTKX3dqNpYo=.1b2ce9e4-20be-4fd9-86a0-a947e4a127bf@github.com> On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > The patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live registers, so the resulting spill sequence should not modify the contents of the registers. > > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated.
Changeset: 5924c2d6 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/5924c2d6c7f636b428bc7f43abe2115af4532358 Stats: 78 lines in 1 file changed: 55 ins; 0 del; 23 mod 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill Reviewed-by: rcastanedalo, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25351 From kbarrett at openjdk.org Tue May 27 09:03:55 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 27 May 2025 09:03:55 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation In-Reply-To: References: Message-ID: On Mon, 26 May 2025 10:15:01 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. > > Testing: gha, tier1 src/hotspot/share/gc/g1/g1BiasedArray.hpp line 107: > 105: T* base() const { return (T*)G1BiasedMappedArrayBase::_base; } > 106: > 107: T* biased_base_at(idx_t index) const { return (T*)(G1BiasedMappedArrayBase::_biased_base + index * sizeof(T)); } [pre-existing] Here and elsewhere, I think `this->_biased_base` is the more usual idiom for accessing a member of a base class from a class template. The reason just `_biased_base` doesn't work has to do with the name lookup rules in templates. Your choice on this. 
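For readers following the two points above (integer address arithmetic and `this->` in class templates), here is a minimal standalone sketch. It is not the HotSpot code: `BiasedBase` and `BiasedArray` are hypothetical names, and the base is deliberately made dependent on `T` so that the name-lookup rule is visible. It also assumes the usual flat address model, in which a `uintptr_t` obtained from a pointer converts back to the same pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a base class that depends on T.
// The names are illustrative and are not the HotSpot declarations.
template <typename T>
struct BiasedBase {
  // Address of the (virtual) element 0, kept as an integer.
  std::uintptr_t _biased_base = 0;
};

template <typename T>
struct BiasedArray : BiasedBase<T> {
  BiasedArray(T* base, std::size_t bias) {
    // Unsigned arithmetic wraps modulo 2^N, so subtracting the bias is
    // well defined even when the result lies far outside the array.
    this->_biased_base =
        reinterpret_cast<std::uintptr_t>(base) - bias * sizeof(T);
  }

  // Convert back to a pointer only at the point of access.
  // `this->` makes the member name dependent, so it is looked up in
  // BiasedBase<T> at instantiation time; a bare `_biased_base` is not
  // looked up in a dependent base class.
  T* at(std::size_t biased_index) const {
    return reinterpret_cast<T*>(this->_biased_base +
                                biased_index * sizeof(T));
  }
};
```

With `int storage[4]` and a bias of 100, `BiasedArray<int>(storage, 100).at(100)` yields `&storage[0]`, and no out-of-range pointer is ever formed along the way.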
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25447#discussion_r2108637668 From tschatzl at openjdk.org Tue May 27 09:10:30 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 09:10:30 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. > > Testing: gha, tier1 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * change size_t to uintptr_t in vmstructs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25447/files - new: https://git.openjdk.org/jdk/pull/25447/files/a9d80f5b..a87b3ed8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25447&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25447&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25447/head:pull/25447 PR: https://git.openjdk.org/jdk/pull/25447 From tschatzl at openjdk.org Tue May 27 09:10:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 09:10:31 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v2] In-Reply-To: References: Message-ID: <6BDk9s7kXCIxqF1gXo5Ig0JnB-M9qPO5P_HQP9eoRi0=.0febf02d-4393-4694-b9b4-e46da0543cb7@github.com> On Tue, 27 May 2025 08:21:58 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * change size_t to uintptr_t in vmstructs > > 
src/hotspot/share/gc/g1/vmStructs_g1.hpp line 51: > >> 49: nonstatic_field(G1HeapRegionTable, _base, address) \ >> 50: nonstatic_field(G1HeapRegionTable, _length, size_t) \ >> 51: nonstatic_field(G1HeapRegionTable, _biased_base, size_t) \ > > Why `size_t` for `uintptr_t _biased_base;`? Fixed. Apparently `uintptr_t` is already in the SA database... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25447#discussion_r2108655876 From mdoerr at openjdk.org Tue May 27 09:12:05 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 May 2025 09:12:05 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <1yOzUEBYJFMe75r2nTQYJIyk4bEia_Tx4rfT3RAG6OU=.c8cf0470-05ae-4484-b533-ef6d37a85b07@github.com> On Tue, 27 May 2025 07:46:43 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Include address mode test in 'legitimize_address' > - Excluded IR checks for testLoadVolatile on PPC64 > > I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. > > Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return.
I disabled the rule for PPC64 (commit [fdf34f9](https://github.com/openjdk/jdk/commit/fdf34f905fd1ee4dde27374d66b1b7fb251e1622)). Please let me know if that works as expected. Thanks! TestImplicitNullChecks has passed on PPC64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911772562 From tschatzl at openjdk.org Tue May 27 09:25:38 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 09:25:38 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. > > Testing: gha, tier1 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * kbarrett review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25447/files - new: https://git.openjdk.org/jdk/pull/25447/files/a87b3ed8..ef4b6816 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25447&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25447&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25447/head:pull/25447 PR: https://git.openjdk.org/jdk/pull/25447 From tschatzl at openjdk.org Tue May 27 09:25:38 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 09:25:38 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 08:58:52 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the 
last revision: >> >> * kbarrett review > > src/hotspot/share/gc/g1/g1BiasedArray.hpp line 107: > >> 105: T* base() const { return (T*)G1BiasedMappedArrayBase::_base; } >> 106: >> 107: T* biased_base_at(idx_t index) const { return (T*)(G1BiasedMappedArrayBase::_biased_base + index * sizeof(T)); } > > [pre-existing] Here and elsewhere, I think `this->_biased_base` is the more usual idiom for accessing > a member of a base class from a class template. The reason just `_biased_base` doesn't work has to > do with the name lookup rules in templates. Your choice on this. Used `this->`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25447#discussion_r2108690676 From stefank at openjdk.org Tue May 27 09:25:52 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 27 May 2025 09:25:52 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v5] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 13:51:09 GMT, Axel Boldt-Christmas wrote: >>
Background (expandable section) >> ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. >> >> The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. >> >> Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. >> >> For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. >> >> But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. >> >> A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. >> >> For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. >> >> However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. >> >> In ZGC we call our memory regions pages or zpages. >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Missing -XX:+UnlockDiagnosticVMOptions Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25381#pullrequestreview-2870342792 From aturbanov at openjdk.org Tue May 27 10:05:51 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 27 May 2025 10:05:51 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v5] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 13:51:09 GMT, Axel Boldt-Christmas wrote: >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Missing -XX:+UnlockDiagnosticVMOptions test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java line 111: > 109: } > 110: > 111: private static void runTestDefault() throws Exception { Suggestion: private static void runTestDefault() throws Exception { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2108775235 From shade at openjdk.org Tue May 27 10:36:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 27 May 2025 10:36:35 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v21] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Switch to mutable - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 26 more: https://git.openjdk.org/jdk/compare/7cb6e5eb...d5e482ac ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=20 Stats: 429 lines in 11 files changed: 389 ins; 22 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From iwalulya at openjdk.org Tue May 27 10:45:56 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 27 May 2025 10:45:56 GMT Subject: RFR: 8357559: G1HeapRegionManager refactor rename functions related to the number of regions in different states In-Reply-To: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> References: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> Message-ID: On Fri, 23 May 2025 08:53:15 GMT, Ivan Walulya wrote: > Hi, > > Please review this refactoring of functions names in G1HeapRegionManager and removing duplicate methods. > > Testing: gha Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25408#issuecomment-2912045993 From iwalulya at openjdk.org Tue May 27 10:45:57 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 27 May 2025 10:45:57 GMT Subject: Integrated: 8357559: G1HeapRegionManager refactor rename functions related to the number of regions in different states In-Reply-To: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> References: <0TrV6SjwI9aJKP9ZXMLIFj-IAOn55_ntGEP4bqysCcQ=.59148004-c443-431c-b908-79cf1f6d55c4@github.com> Message-ID: On Fri, 23 May 2025 08:53:15 GMT, Ivan Walulya wrote: > Hi, > > Please review this refactoring of functions names in G1HeapRegionManager and removing duplicate methods. > > Testing: gha This pull request has now been integrated. 
Changeset: 67d4ed17 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/67d4ed173af325a5a28ee17765d491fd0cfe38c2 Stats: 105 lines in 15 files changed: 5 ins; 8 del; 92 mod 8357559: G1HeapRegionManager refactor rename functions related to the number of regions in different states Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/25408 From tschatzl at openjdk.org Tue May 27 10:47:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 10:47:51 GMT Subject: RFR: 8357801: Parallel: Remove deprecated PSVirtualSpace methods In-Reply-To: References: Message-ID: On Mon, 26 May 2025 17:33:59 GMT, Albert Mingkun Yang wrote: > Simply removing some deprecated methods by changing some fields to pointer types. > > Test: tier1-3 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25459#pullrequestreview-2870576754 From jsikstro at openjdk.org Tue May 27 10:52:36 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 27 May 2025 10:52:36 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v4] In-Reply-To: References: Message-ID: <4WpgyYBYPm_cPnaY-aAubGqj-pgQzhscrqV96QEYOf0=.1724dcb7-0ee0-45f6-a220-7909661c0111@github.com> > Hello, > > This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. > > Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done via using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. > > We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`.
The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. > > Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators for ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators are exclusive, I've opted to use the following value as end for the iterators: > > constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount); > > > I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. > > Testing: > * Oracle's tier 1-4 > * GHA Joel Sikström has updated the pull request incrementally with two additional commits since the last revision: - Remove redundant access specifier - Include order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25251/files - new: https://git.openjdk.org/jdk/pull/25251/files/25ddc320..2ea85308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=02-03 Stats: 4 lines in 2 files changed: 1 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25251/head:pull/25251 PR: https://git.openjdk.org/jdk/pull/25251 From ayang at openjdk.org Tue May 27 11:13:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 11:13:52 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v3] In-Reply-To: References: Message-ID: <4RMJhPb8axd4r43MVo2tBBj9NoydTCZ_P7zFuySWzpk=.79b8dfce-0333-4226-8c08-cac22abb4c14@github.com> On Tue, 27 May 2025 09:25:38 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for an ubsan error
related to pointer under- or overflows when using the biased array helper. >> >> The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. >> >> Testing: gha, tier1 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * kbarrett review Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25447#pullrequestreview-2870645485 From aboldtch at openjdk.org Tue May 27 11:28:52 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 27 May 2025 11:28:52 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v6] In-Reply-To: References: Message-ID: >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. > > Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. > > However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
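The waste figures in the background above can be checked with a little arithmetic. The following is a minimal sketch, assuming the numbers quoted in the description (2M granularity, a 32M medium region, a 12.5% object limit); the constant and function names are made up for illustration and are not identifiers from the ZGC sources.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants taken from the description above. The names are
// hypothetical and do not correspond to identifiers in the ZGC sources.
constexpr std::size_t M = 1024 * 1024;
constexpr std::size_t kGranule = 2 * M;        // all region sizes are multiples of 2M
constexpr std::size_t kMediumRegion = 32 * M;  // medium region size for Xmx >= 1GB
constexpr std::size_t kMaxMediumObject = kMediumRegion / 8;  // 12.5% of 32M

// Size of the large region that holds a single object: round up to 2M.
constexpr std::size_t large_region_size(std::size_t object_size) {
  return (object_size + kGranule - 1) / kGranule * kGranule;
}

// Fraction of the large region that is left unused.
constexpr double large_region_waste(std::size_t object_size) {
  return static_cast<double>(large_region_size(object_size) - object_size) /
         static_cast<double>(large_region_size(object_size));
}

static_assert(kMaxMediumObject == 4 * M, "32M * 12.5% is 4M");
static_assert(large_region_size(kMaxMediumObject + 1) == 6 * M,
              "the smallest large object needs a 6M region");
```

Under these assumptions the worst case for a large region is an object just over 4M: it is rounded up to 6M, so slightly less than 2M of 6M (about 33%) is unused, which matches the guarantee stated above. Larger objects waste at most 2M of an even bigger region, so the fraction only shrinks.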
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/f6069efb..8b8d5c89 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From iwalulya at openjdk.org Tue May 27 11:29:57 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 27 May 2025 11:29:57 GMT Subject: RFR: 8357801: Parallel: Remove deprecated PSVirtualSpace methods In-Reply-To: References: Message-ID: On Mon, 26 May 2025 17:33:59 GMT, Albert Mingkun Yang wrote: > Simple removing some deprecated methods by changing to pointer-type for some fields. > > Test: tier1-3 Marked as reviewed by iwalulya (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25459#pullrequestreview-2870689721 From jsikstro at openjdk.org Tue May 27 11:45:56 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 27 May 2025 11:45:56 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v4] In-Reply-To: <84mT0fVEmePl7CsLeYrl_Mzc4Xln3-Vg7zu7YBk6GPo=.188c4be6-fc2b-46ac-b80a-2a60a7e42318@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> <84mT0fVEmePl7CsLeYrl_Mzc4Xln3-Vg7zu7YBk6GPo=.188c4be6-fc2b-46ac-b80a-2a60a7e42318@github.com> Message-ID: On Tue, 27 May 2025 06:08:10 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handles memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. >> >> Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. >> >> The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: >> * Missing wait, the uncommitting is not actually spread out, but happens all at once.
>> * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. >> * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. >> * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. >> >> An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - Avoid excessive logging if ZUncommitDelay == 0 > - Move uncommit logic from MappedCache to Uncommitter + cleanup and comment > - Better cycle activate / deactivate scoping > - Remove newline > - Cleanup time logging > - Comment describing `uncommitted == 0` I really like how the Mapped Cache is now decoupled from the uncommitter, except for the notion of the watermark, which is perfectly fine IMO. Some more thoughts: src/hotspot/share/gc/z/zUncommitter.cpp line 212: > 210: // We are stopping > 211: return false; > 212: } I'm unsure if we need this check. From what I can see, `activate_uncommit_cycle` is called right after `while(wait(...)) {` in `ZUncommitter::run_thread`, where the `should_continue()` check is essentially done in `wait(...)` by returning `!_stop`.
src/hotspot/share/gc/z/zUncommitter.cpp line 247: > 245: _uncommitted = 0; > 246: > 247: // Reset cache for next uncommit cycle Suggestion: // Reset watermark for next uncommit cycle ------------- PR Review: https://git.openjdk.org/jdk/pull/25198#pullrequestreview-2870706463 PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2108958713 PR Review Comment: https://git.openjdk.org/jdk/pull/25198#discussion_r2108945795 From stefank at openjdk.org Tue May 27 12:08:54 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 27 May 2025 12:08:54 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v4] In-Reply-To: <4WpgyYBYPm_cPnaY-aAubGqj-pgQzhscrqV96QEYOf0=.1724dcb7-0ee0-45f6-a220-7909661c0111@github.com> References: <4WpgyYBYPm_cPnaY-aAubGqj-pgQzhscrqV96QEYOf0=.1724dcb7-0ee0-45f6-a220-7909661c0111@github.com> Message-ID: On Tue, 27 May 2025 10:52:36 GMT, Joel Sikström wrote: >> Hello, >> >> This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. >> >> Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done using static_cast. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. >> >> We currently define a value called `ZPageAgeMax`, which is defined as `static_cast<uint>(ZPageAge::old)`. The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. >> >> Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge.
This will also create a foundation for generating values that require a ZPageAge in the future. Since the end of the enum iterators is exclusive, I've opted to use the following value as end for the iterators: >> >> constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount); >> >> >> I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. >> >> Testing: >> * Oracle's tier 1-4 >> * GHA > > Joel Sikström has updated the pull request incrementally with two additional commits since the last revision: > > - Remove redundant access specifier > - Include order Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25251#pullrequestreview-2870799505 From aboldtch at openjdk.org Tue May 27 12:33:36 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 27 May 2025 12:33:36 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v5] In-Reply-To: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handles memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit.
> > Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. > > The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: > * Missing wait, the uncommitting is not actually spread out, but happens all at once. > * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. > * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. > * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. > > An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away.
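The "missing wait" and "reactivity" points above can be illustrated with a toy model. This is only a sketch under stated assumptions and is not how `ZUncommitter` is implemented: the function name, the fixed chunk size and the per-step `still_idle` observations are all hypothetical. The model uncommits one chunk per step, would wait between steps, and stops as soon as the application has reused enough of the memory that was idle at the start of the cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one uncommit cycle (hypothetical, not HotSpot code).
//
// initial_watermark: bytes that stayed unused for the whole delay period.
// chunk:             bytes uncommitted per step, so the work is spread out.
// still_idle:        for each step, how many of the initially idle bytes the
//                    application has still not touched (the watermark can
//                    only move down during a cycle in this model).
//
// Returns the total number of bytes uncommitted in the cycle.
std::size_t simulate_uncommit_cycle(std::size_t initial_watermark,
                                    std::size_t chunk,
                                    const std::vector<std::size_t>& still_idle) {
  std::size_t limit = initial_watermark;
  std::size_t uncommitted = 0;

  for (const std::size_t observed : still_idle) {
    // Reactivity: if idle memory was reused, shrink the budget.
    limit = std::min(limit, observed);
    if (uncommitted >= limit) {
      break;  // Nothing idle left to return to the OS.
    }

    // Uncommit at most one chunk in this step.
    uncommitted += std::min(chunk, limit - uncommitted);

    // A real implementation would wait here before the next step, which is
    // what spreads the uncommitting over a time interval.
  }

  return uncommitted;
}
```

For example, with 64 idle units and a chunk of 16, the model returns everything in four steps if nothing is reused, but stops at 32 if the application has reused all but 20 units by the third step.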
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/z/zUncommitter.cpp Co-authored-by: Joel Sikström ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25198/files - new: https://git.openjdk.org/jdk/pull/25198/files/35856089..9e20eff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From ayang at openjdk.org Tue May 27 14:35:56 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 14:35:56 GMT Subject: RFR: 8357801: Parallel: Remove deprecated PSVirtualSpace methods In-Reply-To: References: Message-ID: On Mon, 26 May 2025 17:33:59 GMT, Albert Mingkun Yang wrote: > Simple removing some deprecated methods by changing to pointer-type for some fields. > > Test: tier1-3 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25459#issuecomment-2912751466 From ayang at openjdk.org Tue May 27 14:35:57 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 14:35:57 GMT Subject: Integrated: 8357801: Parallel: Remove deprecated PSVirtualSpace methods In-Reply-To: References: Message-ID: On Mon, 26 May 2025 17:33:59 GMT, Albert Mingkun Yang wrote: > Simple removing some deprecated methods by changing to pointer-type for some fields. > > Test: tier1-3 This pull request has now been integrated.
Changeset: cdff7b96 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/cdff7b963c0600e9a6fe9cd8561d7d04b30f190c Stats: 42 lines in 6 files changed: 2 ins; 25 del; 15 mod 8357801: Parallel: Remove deprecated PSVirtualSpace methods Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/25459 From ayang at openjdk.org Tue May 27 14:39:28 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 14:39:28 GMT Subject: RFR: 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters Message-ID: Trivial inlining some args in leaf-callee to avoid carrying them in the call-chain. ------------- Commit messages: - pgc-remove-unused-arg Changes: https://git.openjdk.org/jdk/pull/25466/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25466&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357854 Stats: 17 lines in 3 files changed: 0 ins; 3 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25466/head:pull/25466 PR: https://git.openjdk.org/jdk/pull/25466 From ayang at openjdk.org Tue May 27 14:45:34 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 14:45:34 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v7] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. 
Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - merge - Merge branch 'master' into pgc-size-policy - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - pgc-size-policy ------------- Changes: https://git.openjdk.org/jdk/pull/25000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=06 Stats: 4367 lines in 31 files changed: 522 ins; 3446 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From ayang at openjdk.org Tue May 27 14:58:19 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 May 2025 14:58:19 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v8] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: <3ql7-GvvAqFRsM98wt_N9NaVvQHsEeWYu-4_ZG52X_Y=.ff5af1dc-7dfa-48a4-86be-3ea4a287a5ba@github.com> > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. 
This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. 
> > Test: tier1-8 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: merge-fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25000/files - new: https://git.openjdk.org/jdk/pull/25000/files/09fdd8c6..702fadc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=06-07 Stats: 10 lines in 1 file changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From tschatzl at openjdk.org Tue May 27 15:55:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 27 May 2025 15:55:31 GMT Subject: RFR: 8357307: VM GC operations should have a public gc_succeeded() Message-ID: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> Hi all, please review this cleanup that changes `VM_GC_Operation` to use `gc_succeeded` instead of `prologue_succeeded()` to indicate that a GC has been executed (once a GC is started, it will always finish, so the only reason that a VM op does not get executed is that we decide in the prologue that there has already been a GC). After recent changes the change is/was mostly a renaming of the `prologue_succeeded` method - there is only one case for G1 where additional checks can cause no execution of the GC (e.g. because we started a low-priority concurrent mark and we are already currently marking. No point for doing a GC in that case).
Testing: tier1-4, gha Thanks, Thomas ------------- Commit messages: - 8357307 Changes: https://git.openjdk.org/jdk/pull/25469/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25469&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357307 Stats: 7 lines in 5 files changed: 0 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25469.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25469/head:pull/25469 PR: https://git.openjdk.org/jdk/pull/25469 From jsikstro at openjdk.org Tue May 27 16:13:52 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 27 May 2025 16:13:52 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v6] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 11:28:52 GMT, Axel Boldt-Christmas wrote: >>
Background (expandable section) >> ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation induced latencies. >> >> The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property is held for both the small and medium memory regions. >> >> Objects larger than medium object allocation gets placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between object size rounded up to the nearest multiple of 2M and the exact object size. >> >> For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. >> >> But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. >> >> A problem with medium pages is that they may cause allocation induced latencies. To reduce allocation latencies we track (cache) memory of memory regions which has been freed by the GC, so it can be reused for new memory regions used for allocations. >> >> For small memory regions, as long as there is cached memory, it can use it, because the size of a small memory region (2M) is always a multiple of any other memory region that has been freed. >> >> However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory regions size (32M). 
Currently this requires the allocating thread to remap multiple of these small memory regions into a new larger one, which involves calls into the operating system. >> >> In ZGC we call our memory regions pages or zpages. >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java > > Co-authored-by: Andrey Turbanov I have a comment on the `_size` field, other than that I think this looks really good. I like the new `pre_filter_page`. src/hotspot/share/gc/z/zPageAllocator.cpp line 459: > 457: > 458: return _size; > 459: } I see that we don't use `_size` anywhere except for here, even so, I think we should rename `_size` to be something more descriptive, in case that changes in the future. Maybe something like `_requested_size`? I don't have strong opinions on what the new name should be, just something to highlight that `_size` isn't always the size of the allocated page.
------------- PR Review: https://git.openjdk.org/jdk/pull/25381#pullrequestreview-2870796960 PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2109574036 From wkemper at openjdk.org Tue May 27 21:42:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 27 May 2025 21:42:56 GMT Subject: Integrated: 8354078: Implement JEP 521: Generational Shenandoah In-Reply-To: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> References: <2Ra5uic78TWWO2JVDnECH5Ve7kY0AeKtreLtqFQ2-4A=.0cae0e11-54c4-49eb-a7ef-cb9964ffbe03@github.com> Message-ID: On Fri, 16 May 2025 17:30:11 GMT, William Kemper wrote: > Testing: > > % ./build/linux-x86_64-server-fastdebug/jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational --version > openjdk 25 2025-09-16 > OpenJDK Runtime Environment (fastdebug build 25-make-genshen-non-experimental) > OpenJDK 64-Bit Server VM (fastdebug build 25-make-genshen-non-experimental, mixed mode) This pull request has now been integrated. Changeset: 2e8b195a Author: William Kemper URL: https://git.openjdk.org/jdk/commit/2e8b195a96e3b2a4ca27c64a923adc4334073128 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8354078: Implement JEP 521: Generational Shenandoah Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/25270 From kbarrett at openjdk.org Wed May 28 01:11:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 May 2025 01:11:54 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:25:38 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. >> >> The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. 
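The idea behind the fix described above can be sketched as follows. This is a minimal illustration, not the code from `g1BiasedArray.hpp`: the `BiasedView` class and its members are hypothetical names. It keeps the biased base as a `uintptr_t`, where wrap-around is well defined, and converts to a pointer only when an element is accessed. It also assumes the flat pointer-to-integer mapping that mainstream platforms provide, since that mapping is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a biased array accessor (not the HotSpot class).
// Index `bias` refers to the first real element, so the "biased base" lies
// bias * sizeof(T) bytes before the real base and may be outside the array.
template <typename T>
class BiasedView {
  // Kept as an integer: forming an out-of-bounds T* would be undefined
  // behavior, whereas unsigned arithmetic simply wraps modulo 2^N.
  std::uintptr_t _biased_base;

public:
  BiasedView(T* base, std::size_t bias)
    : _biased_base(reinterpret_cast<std::uintptr_t>(base) -
                   static_cast<std::uintptr_t>(bias) * sizeof(T)) {}

  // Convert back to a pointer only at the point of access. The caller must
  // pass an index that maps into the underlying array.
  T* at(std::size_t biased_index) const {
    return reinterpret_cast<T*>(
        _biased_base + static_cast<std::uintptr_t>(biased_index) * sizeof(T));
  }
};
```

With `int data[4]` and a bias of 1000, `at(1000)` yields `&data[0]` and `at(1002)` yields `&data[2]`, even though the intermediate biased base is never a valid pointer into `data`.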
>> >> Testing: gha, tier1 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * kbarrett review Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25447#pullrequestreview-2873210429 From duke at openjdk.org Wed May 28 04:22:56 2025 From: duke at openjdk.org (duke) Date: Wed, 28 May 2025 04:22:56 GMT Subject: Withdrawn: 8351892: GenShen: Remove enforcement of generation sizes In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 23:50:41 GMT, William Kemper wrote: > * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. > * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24268 From aboldtch at openjdk.org Wed May 28 05:33:43 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 28 May 2025 05:33:43 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v7] In-Reply-To: References: Message-ID: >
Background (expandable section) > ZGC uses three different types of memory regions (Small, Medium and Large) as a compromise between memory waste and relocation-induced latencies. > > The allocated object size dictates which type of memory region it ends up in. These sizes are selected such that when an object allocation fails in a memory region because that object does not fit, the waste (unused bytes at the end) is at most 1/8th or 12.5%. This property holds for both the small and medium memory regions. > > Objects larger than the maximum medium object size get placed in a large memory region, which only ever contains one object. And because all memory region sizes are multiples of 2M, we end up with a memory waste which is the difference between the object size rounded up to the nearest multiple of 2M and the exact object size. > > For max heaps (Xmx) smaller than 1GB we use reduced medium memory region sizes at the cost of worse waste guarantees for large object allocation. > > But for max heaps 1GB or larger our current selected medium memory region size is 32M. This results in a max medium object size of 4M (32M * 12.5%), which is the max size we want an application thread to have to relocate. So we end up with a guarantee that the waste in large memory regions is at most 33%. > > A problem with medium pages is that they may cause allocation-induced latencies. To reduce allocation latencies we track (cache) the memory of memory regions that have been freed by the GC, so it can be reused for new memory regions used for allocations. > > For small memory regions, as long as there is cached memory, an allocation can use it, because the size of any other memory region that has been freed is always a multiple of the size of a small memory region (2M). > > However for medium memory regions it may be that there is enough memory available in the cache, but it is only split into regions smaller than the medium memory region size (32M).
Currently this requires the allocating thread to remap several of these smaller memory regions into a new larger one, which involves calls into the operating system. > > In ZGC we call our memory regions pages or zpages. >
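To make the arithmetic in the Background section concrete, here is a minimal sketch in Python. It is not ZGC code: the helper names (`max_medium_object`, `large_page_size`, `large_page_waste_fraction`, `medium_page_sizes`) are made up for illustration, and the constants are simply the figures quoted above for max heaps of 1GB or larger (2M size granularity, 32M max medium page, 1/8th waste bound). The last helper assumes the smallest medium page is exactly the max medium object size, which is how the {4M, 8M, 16M, 32M} set in the proposal below comes out.

```python
M = 1024 * 1024

GRANULE = 2 * M            # all memory region sizes are multiples of 2M
MEDIUM_PAGE_MAX = 32 * M   # medium region size quoted for heaps >= 1GB
WASTE_DIVISOR = 8          # waste bound of 1/8th (12.5%)


def max_medium_object(page_size=MEDIUM_PAGE_MAX):
    """Largest object allowed in a medium region: 1/8th of the region."""
    return page_size // WASTE_DIVISOR


def large_page_size(object_size):
    """Object size rounded up to the nearest multiple of 2M."""
    return -(-object_size // GRANULE) * GRANULE


def large_page_waste_fraction(object_size):
    """Unused tail of a large region as a fraction of the region size."""
    page = large_page_size(object_size)
    return (page - object_size) / page


def medium_page_sizes(page_max=MEDIUM_PAGE_MAX):
    """Power-of-two sizes from the max medium object size up to page_max."""
    sizes = []
    size = max_medium_object(page_max)
    while size <= page_max:
        sizes.append(size)
        size *= 2
    return sizes


if __name__ == "__main__":
    worst = 4 * M + 1  # smallest object that no longer counts as medium
    print(max_medium_object() // M, "M max medium object")
    print(large_page_size(worst) // M, "M large region for a 4M+1 byte object")
    print(round(large_page_waste_fraction(worst), 4), "worst-case waste fraction")
    print([s // M for s in medium_page_sizes()], "medium page sizes in M")
```

The worst case for a large region is an object one byte over 4M: it needs a 6M region and leaves just under 2M unused, which is where the "at most 33%" figure comes from.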
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power-of-two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes end up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 - Rename _size to _requested_size - Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java Co-authored-by: Andrey Turbanov - Missing -XX:+UnlockDiagnosticVMOptions - Revert "NumPartitions is reserved by Shenandoah" This reverts commit f7619fd700ec6498948e5e84e8051be145683940. - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 - Retype ZPageSizeSmallShift to int - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift - Update is_disabled comment - Apply suggestions from code review Co-authored-by: Stefan Karlsson - ... 
and 5 more: https://git.openjdk.org/jdk/compare/44e27408...9e1cc44a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25381/files - new: https://git.openjdk.org/jdk/pull/25381/files/8b8d5c89..9e1cc44a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25381&range=05-06 Stats: 13934 lines in 334 files changed: 8459 ins; 3375 del; 2100 mod Patch: https://git.openjdk.org/jdk/pull/25381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25381/head:pull/25381 PR: https://git.openjdk.org/jdk/pull/25381 From aboldtch at openjdk.org Wed May 28 05:44:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 28 May 2025 05:44:56 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v6] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 15:51:33 GMT, Joel Sikström wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java >> >> Co-authored-by: Andrey Turbanov > > src/hotspot/share/gc/z/zPageAllocator.cpp line 459: > >> 457: >> 458: return _size; >> 459: } > > I see that we don't use `_size` anywhere except for here. Even so, I think we should rename `_size` to be something more descriptive, in case that changes in the future. Maybe something like `_requested_size`? > > I don't have strong opinions on what the new name should be, just something to highlight that `_size` isn't always the size of the allocated page. Done. I also experimented with adding two separate accessors / fields. One `requested_size` and one `claimed_size`. Where you set the claimed size after capacity has been claimed. And then change all `size()` calls to the appropriate information they are looking for. Some are the claimed_size and some are the requested_size. However, I was not really happy with having to sprinkle `set_claimed_size` in a few places.
I think we can iterate on the ZPageAllocation in the future. I think having some type of `optional> _allocation` may be a more appropriate abstraction. And treat `ZMultiPartitionAllocation` more akin to a `ZMultiPartitionAllocationRequest`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25381#discussion_r2110969832 From aboldtch at openjdk.org Wed May 28 05:52:43 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 28 May 2025 05:52:43 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v6] In-Reply-To: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handles memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. > > Because this measurement is not as granular as before, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval.
> > The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: > * Missing wait: the uncommitting is not actually spread out, but happens all at once. > * Reactivity: if the process starts using memory that was below the previous watermark, uncommitting should stop. > * Structure: the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better-named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. > * Events: overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. > > An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user-facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away.
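As a rough illustration of the first two points in the list above (spreading the uncommit out in chunks, and stopping once the application starts using memory below the watermark), here is a small Python model. It is a sketch under stated assumptions, not the ZUncommitter implementation: `plan_uncommit`, the chunk size and the list of observed cache sizes are all hypothetical, and real code would wait between chunks instead of iterating over a prepared list.

```python
def plan_uncommit(watermark, chunk, observed_cache_sizes):
    """Model a chunked uncommit driven by a watermark.

    watermark            -- amount of cached memory that stayed unused for
                            the whole delay period (what we would like to
                            uncommit), in arbitrary units
    chunk                -- hypothetical amount uncommitted per step
    observed_cache_sizes -- unused cached memory seen just before each
                            step; stands in for "wait, then look again"

    Returns the list of chunk sizes that were actually uncommitted.
    """
    remaining = watermark
    uncommitted = []
    for cached in observed_cache_sizes:
        if remaining == 0:
            break  # everything below the watermark has been returned
        if cached < remaining:
            # The cache holds less unused memory than we still planned to
            # uncommit, so the application has started using memory below
            # the watermark: stop instead of uncommitting more.
            break
        step = min(chunk, remaining)
        uncommitted.append(step)
        remaining -= step
    return uncommitted


if __name__ == "__main__":
    # Idle application: the whole watermark is uncommitted in four steps.
    print(plan_uncommit(64, 16, [64, 48, 32, 16]))
    # Application wakes up before the third step: uncommitting stops early.
    print(plan_uncommit(64, 16, [64, 48, 20, 16]))
```

The point of the model is only the control flow: each step re-checks the current state before uncommitting, rather than uncommitting the whole watermark at once.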
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Reduce the amount of should_continue checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25198/files - new: https://git.openjdk.org/jdk/pull/25198/files/9e20eff7..59aaac66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=04-05 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From mbaesken at openjdk.org Wed May 28 05:56:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 28 May 2025 05:56:52 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:25:38 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. >> >> The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. >> >> Testing: gha, tier1 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * kbarrett review Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25447#pullrequestreview-2873685594 From jsikstro at openjdk.org Wed May 28 06:54:54 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 28 May 2025 06:54:54 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v7] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 05:33:43 GMT, Axel Boldt-Christmas wrote: >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Rename _size to _requested_size > - Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java > > Co-authored-by: Andrey Turbanov > - Missing -XX:+UnlockDiagnosticVMOptions > - Revert "NumPartitions is reserved by Shenandoah" > > This reverts commit f7619fd700ec6498948e5e84e8051be145683940. > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Retype ZPageSizeSmallShift to int > - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift > - Update is_disabled comment > - Apply suggestions from code review > > Co-authored-by: Stefan Karlsson > - ... and 5 more: https://git.openjdk.org/jdk/compare/164265bc...9e1cc44a Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25381#pullrequestreview-2873809459 From jsikstro at openjdk.org Wed May 28 06:55:52 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 28 May 2025 06:55:52 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic [v6] In-Reply-To: References: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> Message-ID: On Wed, 28 May 2025 05:52:43 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handle memory uncommitting (returning physical memory to the OS). 
Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. >> >> Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. >> >> The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: >> * Missing wait, the uncommitting is not actually spread out, but happens all at once. >> * Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. >> * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. >> * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. >> >> An alternative discussed has been to do uncommitting based on GC triggers rather than a periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. 
[JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Reduce the amount of should_continue checks Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25198#pullrequestreview-2873810656 From tschatzl at openjdk.org Wed May 28 06:57:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 06:57:56 GMT Subject: RFR: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation In-Reply-To: <4ouxt3gZpoBua9ZyXHAN8hN3hSg7lHGJO0Ab5mlGJNs=.aec05ac2-c64d-45a5-b534-d45573ed9153@github.com> References: <4ouxt3gZpoBua9ZyXHAN8hN3hSg7lHGJO0Ab5mlGJNs=.aec05ac2-c64d-45a5-b534-d45573ed9153@github.com> Message-ID: On Tue, 27 May 2025 07:32:08 GMT, Matthias Baesken wrote: >> Hi all, >> >> please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. >> >> The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. >> >> Testing: gha, tier1 > > Seems some copyright info in headers needs adjustment, see vmStructs_g1.hpp . 
Thanks @MBaesken @kimbarrett @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25447#issuecomment-2915182298 From tschatzl at openjdk.org Wed May 28 06:57:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 06:57:56 GMT Subject: Integrated: 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation In-Reply-To: References: Message-ID: <555VaVfrUHkR1uA9kh3hU6wnnkLKWpSaxtP5IQK_en4=.7cfe955d-6d92-45c5-902c-887c7257cde0@github.com> On Mon, 26 May 2025 10:15:01 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix for an ubsan error related to pointer under- or overflows when using the biased array helper. > > The fix is, instead of using direct address calculations that can cause these errors, use `uintptr_t` where the overflow behavior is defined in C++. Only convert to pointer at the actual access. > > Testing: gha, tier1 This pull request has now been integrated. Changeset: db515566 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/db515566875b92bd4aff08cccc80d80b85f01514 Stats: 18 lines in 4 files changed: 0 ins; 5 del; 13 mod 8354428: [ubsan] g1BiasedArray.hpp: pointer overflow in address calculation Reviewed-by: ayang, kbarrett, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/25447 From ayang at openjdk.org Wed May 28 07:22:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 May 2025 07:22:01 GMT Subject: RFR: 8357944: Remove unused CollectedHeap::is_maximal_no_gc Message-ID: Removing effectively dead code/API for all GCs except G1. 
------------- Commit messages: - remove-heap-api Changes: https://git.openjdk.org/jdk/pull/25482/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25482&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357944 Stats: 51 lines in 12 files changed: 0 ins; 50 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25482.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25482/head:pull/25482 PR: https://git.openjdk.org/jdk/pull/25482 From tschatzl at openjdk.org Wed May 28 07:53:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 07:53:51 GMT Subject: RFR: 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters In-Reply-To: References: Message-ID: On Tue, 27 May 2025 14:34:22 GMT, Albert Mingkun Yang wrote: > Trivial inlining some args in leaf-callee to avoid carrying them in the call-chain. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25466#pullrequestreview-2873984062 From stefank at openjdk.org Wed May 28 08:17:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 28 May 2025 08:17:56 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v7] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 05:33:43 GMT, Axel Boldt-Christmas wrote: >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Rename _size to _requested_size > - Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java > > Co-authored-by: Andrey Turbanov > - Missing -XX:+UnlockDiagnosticVMOptions > - Revert "NumPartitions is reserved by Shenandoah" > > This reverts commit f7619fd700ec6498948e5e84e8051be145683940. > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Retype ZPageSizeSmallShift to int > - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift > - Update is_disabled comment > - Apply suggestions from code review > > Co-authored-by: Stefan Karlsson > - ... and 5 more: https://git.openjdk.org/jdk/compare/b3637637...9e1cc44a Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25381#pullrequestreview-2874060601 From aboldtch at openjdk.org Wed May 28 08:24:06 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 28 May 2025 08:24:06 GMT Subject: RFR: 8357449: ZGC: Multiple medium page sizes [v7] In-Reply-To: References: Message-ID: <2uA55ZsVKXRAM8adtmQ55woWC4RJPAujezs-XT6WYdY=.ae109e30-f71c-4d5c-8504-dfc5fff8a7a8@github.com> On Wed, 28 May 2025 05:33:43 GMT, Axel Boldt-Christmas wrote: >>
>> >> ### Proposal >> Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. >> >> And ad... > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Rename _size to _requested_size > - Update test/hotspot/jtreg/gc/z/TestZMediumPageSizes.java > > Co-authored-by: Andrey Turbanov > - Missing -XX:+UnlockDiagnosticVMOptions > - Revert "NumPartitions is reserved by Shenandoah" > > This reverts commit f7619fd700ec6498948e5e84e8051be145683940. > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8357449 > - Retype ZPageSizeSmallShift to int > - Rename ZPageSizeMediumShift -> ZPageSizeMediumMaxShift > - Update is_disabled comment > - Apply suggestions from code review > > Co-authored-by: Stefan Karlsson > - ... and 5 more: https://git.openjdk.org/jdk/compare/10a8c9b6...9e1cc44a Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25381#issuecomment-2915412934 From aboldtch at openjdk.org Wed May 28 08:24:07 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 28 May 2025 08:24:07 GMT Subject: Integrated: 8357449: ZGC: Multiple medium page sizes In-Reply-To: References: Message-ID: On Thu, 22 May 2025 06:32:04 GMT, Axel Boldt-Christmas wrote: >
> > ### Proposal > Allow for medium pages to have multiple sizes. Specifically allow all power of two sizes between the smallest size that can contain one medium object and the max medium page size. For a max medium page size of 32M the sizes ends up being {4M, 8M, 16M, 32M}. > > And adds a "fast" medium page allocation path in the p... This pull request has now been integrated. Changeset: f74fbfe5 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/f74fbfe5de9dc5b90652956935642670c085938c Stats: 403 lines in 19 files changed: 345 ins; 11 del; 47 mod 8357449: ZGC: Multiple medium page sizes Reviewed-by: stefank, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/25381 From jsikstro at openjdk.org Wed May 28 08:30:51 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 28 May 2025 08:30:51 GMT Subject: RFR: 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters In-Reply-To: References: Message-ID: On Tue, 27 May 2025 14:34:22 GMT, Albert Mingkun Yang wrote: > Trivial inlining some args in leaf-callee to avoid carrying them in the call-chain. Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25466#pullrequestreview-2874098788 From thomas.schatzl at oracle.com Wed May 28 08:31:24 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 May 2025 10:31:24 +0200 Subject: GC and pointer masking In-Reply-To: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> References: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> Message-ID: <36212b9f-b2b0-44de-82f8-63a1e1f3e532@oracle.com> Hi, On 23.05.25 15:18, Tony Printezis wrote: > Hi all, > > Pointer masking is available for some architectures (including RISC-V!). This can allow us to mark the top bits of an object reference with what type of objects it is (young / old / humongous / etc.) without needing to clear those bits explicitly before we use the reference. 
This can be helpful both in the GC itself but also in the barriers (e.g., efficiently filter out young objects in barriers that are not needed on young objects). > > Has anyone already looked into taking advantage of pointer masking in HotSpot? I tried a couple of searches but I didn't find anything. If there's been a discussion on this before, can you please point me to it? > I think there have been internal tests for ZGC to use pointer masking a long time ago. The result is that ZGC does not use HW pointer masking; I do not remember the main reason(s) but if I had to guess: * the advantage wasn't that big (compared to the used software barriers) * only works on some archs, so one would need a fallback anyway * inflexible * masking out some bits on pointer access isn't that expensive nowadays; i.e. these ALU ops are much less expensive than the memory ops they cause. For Serial/Parallel/G1 collectors I do not remember any attempts to use hw pointer masking for barriers. The main reasons not to try were * only works on some archs, need fallback * the current/near current barriers are extremely small anyway (even for G1, see [1] or [2]. There is some effort to selectively generate the filters, reducing overhead further. G1 could then also just use Serial/Parallel equivalent barriers without any filters). * reduces compressed oops range as the bits need to be stored somewhere, which is probably the range of heap sizes most VMs run. This is a fairly large hurdle to overcome. * available time to investigate; G1 barriers did not substantially change since its inception (only some fixes here and there). It took years from the initial ideas to fix them until something was productized. I do not know too much about Shenandoah barriers, or their attempts in that direction. One place where I think having extra bits in the references could help right now would be (G1) concurrent marking, where they could be used to avoid some overhead with respect to enqueuing (not sure right now).
Other potential uses are even more sketchy, but these features would need to carry their weight. If you have any particular ideas, feel free to bring them up. Hth, Thomas [1] https://github.com/openjdk/jdk/pull/23739 [2] https://tschatzl.github.io/2025/02/21/new-write-barriers.html (Shameless plug ;)) From ayang at openjdk.org Wed May 28 08:51:00 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 May 2025 08:51:00 GMT Subject: RFR: 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters In-Reply-To: References: Message-ID: On Tue, 27 May 2025 14:34:22 GMT, Albert Mingkun Yang wrote: > Trivial inlining some args in leaf-callee to avoid carrying them in the call-chain. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25466#issuecomment-2915489673 From ayang at openjdk.org Wed May 28 08:51:01 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 May 2025 08:51:01 GMT Subject: Integrated: 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters In-Reply-To: References: Message-ID: On Tue, 27 May 2025 14:34:22 GMT, Albert Mingkun Yang wrote: > Trivial inlining some args in leaf-callee to avoid carrying them in the call-chain. This pull request has now been integrated. 
Changeset: 1e0caedb Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/1e0caedb9ab1c56e3986764ce260b94e423d4948 Stats: 17 lines in 3 files changed: 0 ins; 3 del; 14 mod 8357854: Parallel: Inline args of PSOldGen::initialize_performance_counters Reviewed-by: tschatzl, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/25466 From kevinw at openjdk.org Wed May 28 09:10:54 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 28 May 2025 09:10:54 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:09:37 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. >> >> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. > > Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Merge branch 'master' into statsampler-removal > - removed last traces of hrt.ticks > - Merge branch 'master' into statsampler-removal > - feedback fixes > - removed the PerfDataSamplingInterval flag > - calculate timestamp in jstat instead of sampling > - StatSampler + sampling code removed Hi, looks good. If we are removing things from jcmd PerfCounter.print output, that could feature in the release note that you have planned. Anybody expecting these counters using jcmd PerfCounter.print or other methods, may not know that they are related to StatSampler and would not realise from the title that this is a relevant change. $ jcmd 203133 PerfCounter.print | grep hrt sun.os.hrt.frequency=1000000000 sun.os.hrt.ticks=132389861230 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2915558106 From tschatzl at openjdk.org Wed May 28 09:30:06 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 09:30:06 GMT Subject: RFR: 8334759: gc/g1/TestMixedGCLiveThreshold.java fails on Windows with JTREG_TEST_THREAD_FACTORY=Virtual due to extra memory allocation Message-ID: Hi all, please review this change that "fixes" the `TestMixedGCLiveThreshold.java` test by requiring the thread factory used for these tests to not use virtual threads instead of platform threads. That adds some additional memory consumption, and since the test is about testing G1 reaction due to particular known memory consumption, it can fail. The fix is to add the appropriate `@requires` tag that has been introduced in jtreg 7.5.1 (which is current default/requirement for builds, https://bugs.openjdk.org/browse/JDK-8334759). Testing: verified that the test is not run if the virtual thread test thread factory is used. 
Thanks, Thomas ------------- Commit messages: - 8334759 Changes: https://git.openjdk.org/jdk/pull/25486/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25486&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334759 Stats: 8 lines in 2 files changed: 3 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25486/head:pull/25486 PR: https://git.openjdk.org/jdk/pull/25486 From ayang at openjdk.org Wed May 28 09:46:50 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 May 2025 09:46:50 GMT Subject: RFR: 8357307: VM GC operations should have a public gc_succeeded() In-Reply-To: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> References: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> Message-ID: On Tue, 27 May 2025 15:43:10 GMT, Thomas Schatzl wrote: > Hi all, > > please review this cleanup that changes `VM_GC_Operation` to use `gc_succeeded` instead of `prologue_succeeded()`to indicate that a GC has been executed (once a GC is started, it will always finish, so the only reason that a VM op does not get executed is that we decide in the prologue that there has already been a GC). > > After recent changes the change is/was mostly a renaming of the `prologue_succeeded`method - there is only one case for G1 where additional checks can cause no execution of the GC (e.g. because we started a low-priority concurrent mark and we are already currently marking. No point for doing a GC in that case). > > Testing: tier1-4, gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25469#pullrequestreview-2874337430 From stuart.monteith at arm.com Wed May 28 10:14:17 2025 From: stuart.monteith at arm.com (Stuart Monteith) Date: Wed, 28 May 2025 11:14:17 +0100 Subject: GC and pointer masking In-Reply-To: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> References: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> Message-ID: <333c817e-5d59-4308-b0cb-f7b16a49ab86@arm.com> On 23/05/2025 14:18, Tony Printezis wrote: > Hi all, > > Pointer masking is available for some architectures (including RISC-V!). This can allow us to mark the top bits of an object reference with what type of objects it is (young / old / humongous / etc.) without needing to clear those bits explicitly before we use the reference. This can be helpful both in the GC itself but also in the barriers (e.g., efficiently filter out young objects in barriers that are not needed on young objects). > > Has anyone already looked into taking advantage of pointer masking in HotSpot? I tried a couple of searches but I didn't find anything. If there's been a discussion on this before, can you please point me to it? > > Thanks, > > Tony Hello Tony, I experimented with tagged pointers for the colours in ZGC, using the Aarch64 Top-Byte-Ignore (TBI) feature. This was based on the SPARC code that used its ADI feature. There was a discussion here: https://mail.openjdk.org/pipermail/aarch64-port-dev/2019-May/007293.html There was a slight problem with tagged pointers being passed to the Linux kernel. libawt was being passed tagged object references which ended up in the writev system call. The Linux kernel, on Aarch64, has a feature to accept tagged pointers in system calls, although I never went back and checked that. The fix at the time was a trivial change to jni.c to filter out the tags.
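To make the idea of filtering out tags concrete, here is a minimal sketch of setting and stripping a tag kept in the top byte of a 64-bit reference. It is not the jni.c change mentioned above; the 8-bit layout and the names `kTagShift`, `kAddressMask`, `with_tag`, `tag_of` and `strip_tag` are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: an 8-bit tag lives in bits 56..63 of a 64-bit reference,
// i.e. the byte that a top-byte-ignore style feature would skip in hardware.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kTagShift) - 1;

// Hypothetical helper: attach a tag to an untagged address.
constexpr std::uint64_t with_tag(std::uint64_t address, std::uint8_t tag) {
  return (address & kAddressMask) | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Hypothetical helper: read the tag back out of a reference.
constexpr std::uint8_t tag_of(std::uint64_t reference) {
  return static_cast<std::uint8_t>(reference >> kTagShift);
}

// Hypothetical helper: drop the tag before the value leaves the VM,
// for example before it is handed to a system call.
constexpr std::uint64_t strip_tag(std::uint64_t reference) {
  return reference & kAddressMask;
}

// Small self-check: stripping a tagged reference yields the original address.
inline void self_check() {
  const std::uint64_t address = 0x00007f0012345678ULL;
  const std::uint64_t tagged = with_tag(address, 0x2a);
  assert(tag_of(tagged) == 0x2a);
  assert(strip_tag(tagged) == address);
}
```

With hardware support the `strip_tag` step is unnecessary for ordinary loads and stores; the sketch only covers the software path that is needed wherever the tag must not be visible.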
BR, Stuart From iwalulya at openjdk.org Wed May 28 10:56:51 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 28 May 2025 10:56:51 GMT Subject: RFR: 8357307: VM GC operations should have a public gc_succeeded() In-Reply-To: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> References: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> Message-ID: <5tCdrHPXZm07KMG_kbs7l6Go7Rz15lubXzvqANnqHhw=.2f67549f-0f12-4e01-a033-a0eaa56d1932@github.com> On Tue, 27 May 2025 15:43:10 GMT, Thomas Schatzl wrote: > Hi all, > > please review this cleanup that changes `VM_GC_Operation` to use `gc_succeeded` instead of `prologue_succeeded()`to indicate that a GC has been executed (once a GC is started, it will always finish, so the only reason that a VM op does not get executed is that we decide in the prologue that there has already been a GC). > > After recent changes the change is/was mostly a renaming of the `prologue_succeeded`method - there is only one case for G1 where additional checks can cause no execution of the GC (e.g. because we started a low-priority concurrent mark and we are already currently marking. No point for doing a GC in that case). > > Testing: tier1-4, gha > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25469#pullrequestreview-2874544891 From cnorrbin at openjdk.org Wed May 28 11:09:59 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 28 May 2025 11:09:59 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 09:08:30 GMT, Kevin Walls wrote: > Hi, looks good. > If we are removing things from jcmd PerfCounter.print output, that could feature in the release note that you have planned. 
Anybody expecting these counters using jcmd PerfCounter.print or other methods, may not know that they are related to StatSampler and would not realise from the title that this is a relevant change. Thank you. I will add that to the release note, along with text on how to calculate it instead using offsets (similar to what I did with the jstat counter) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2915907228 From jsjolen at openjdk.org Wed May 28 11:48:53 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 May 2025 11:48:53 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:09:37 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. >> >> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. > > Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Merge branch 'master' into statsampler-removal > - removed last traces of hrt.ticks > - Merge branch 'master' into statsampler-removal > - feedback fixes > - removed the PerfDataSamplingInterval flag > - calculate timestamp in jstat instead of sampling > - StatSampler + sampling code removed Thank you for the review, @kevinjwalls. Let's integrate it :-). ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24872#pullrequestreview-2874686378 From cnorrbin at openjdk.org Wed May 28 11:51:53 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 28 May 2025 11:51:53 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:09:37 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. >> >> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. > > Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Merge branch 'master' into statsampler-removal > - removed last traces of hrt.ticks > - Merge branch 'master' into statsampler-removal > - feedback fixes > - removed the PerfDataSamplingInterval flag > - calculate timestamp in jstat instead of sampling > - StatSampler + sampling code removed Thank you everyone for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2916013283 From duke at openjdk.org Wed May 28 11:51:53 2025 From: duke at openjdk.org (duke) Date: Wed, 28 May 2025 11:51:53 GMT Subject: RFR: 8241678: Remove PerfData sampling via StatSampler [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:09:37 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. >> >> For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. >> >> The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. > > Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: > > - Merge branch 'master' into statsampler-removal > - removed last traces of hrt.ticks > - Merge branch 'master' into statsampler-removal > - feedback fixes > - removed the PerfDataSamplingInterval flag > - calculate timestamp in jstat instead of sampling > - StatSampler + sampling code removed @caspernorrbin Your change (at version e976ea681ac7990eea9cd82acc1783c47dd7e668) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24872#issuecomment-2916015185 From cnorrbin at openjdk.org Wed May 28 12:03:03 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 28 May 2025 12:03:03 GMT Subject: Integrated: 8241678: Remove PerfData sampling via StatSampler In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 10:38:39 GMT, Casper Norrbin wrote: > Hi everyone, > > This change removes the legacy `PerfData` sampling mechanism implemented through the `StatSampler`, an always-on periodic task that runs every 50ms by default. The sampling feature was originally introduced to collect performance counters and timestamps, but has since seen very little use. > > For G1/ZGC, the only sampled value is a timestamp (`sun.os.hrt.ticks`). For Serial/Parallel, it also samples some heap space counters, but these are already updated after each GC cycle, making the sampling redundant. With sampling removed, the `PerfDataSamplingInterval` flag becomes obsoleted, as it no longer serves any purpose. > > The only thing relying on the sampled timestamps is `jstat`: running `jstat -t` prints an extra column with the time since VM start. To preserve this functionality, we can calculate the timestamps as an offset from the already existing `sun.rt.createVmBeginTime` instead. This pull request has now been integrated.
Changeset: 6ebae6cd Author: Casper Norrbin Committer: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/6ebae6cded49f9b0b0d42899af3303647eca7848 Stats: 864 lines in 25 files changed: 150 ins; 655 del; 59 mod 8241678: Remove PerfData sampling via StatSampler Reviewed-by: jsjolen, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24872 From erik.osterlund at oracle.com Wed May 28 13:30:34 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 28 May 2025 13:30:34 +0000 Subject: GC and pointer masking In-Reply-To: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> References: <9A0FD4F5-6E94-419C-870F-68F37AB632BC@rivosinc.com> Message-ID: <87758ACD-AC19-4BC7-BDB7-1B28AA5721A3@oracle.com> It's worth mentioning that other GCs than ZGC use side tables as part of their barriers that assume you can take an (address >> x) + y to get to the table element location very efficiently. This no longer works if you start encoding metadata bits in the high order bits, so the barriers would have to add software unmasking to remove the bits as part of these calculations, which presents a sort of anti-optimization for GC barriers and hence increases the bar for how useful this would have to be for the GC in order to make it even a net win. As for ZGC, we don't have those issues really, but we have a classification scheme for color bits that is probably not widely known: we talk about persistent color bits vs transient color bits. Persistent bits are bits that stick around in the object address between loading the object reference to when it gets stored somewhere. Transient bits, conversely, are removed once loaded and restored when stored. In the non-generational initial version of ZGC, all bits were persistent bits, implemented with multi-mapped memory. Here, HW address masking could be applied.
But there wasn't really anything in the algorithm that made it necessary for the bits we used to be persistent or transient; either one would have worked really. As for generational ZGC, not only is the use of transient colors a preference because we got good barriers for it, but it has also allowed us to encode *field properties* instead of object properties. This is used by our remembered sets. You can have two fields with pointers to the same object that differ in the remembered set bits, which say something about whether a store has been performed or not. These bits are important, but very impractical to encode as persistent bits as load barriers would then have to let bits for pointers pointing to the same object be different. Hence, encoding them as persistent bits would require all == (acmp etc) operations to dynamically check whether two pointers are "roughly equal" instead of exactly equal. So we ended up needing support for transient bits. And it turned out we didn't really have any colors at all that actually needed to be persistent, so we made them all transient. Therefore, we have thus far not gone down this route of optimizing persistent bits. We might end up needing persistent bits in the future. But as of today, we need transient bits, and we don't need persistent bits. Hope this helps. /Erik > On 23 May 2025, at 15:18, Tony Printezis wrote: > > Hi all, > > Pointer masking is available for some architectures (including RISC-V!). This can allow us to mark the top bits of an object reference with what type of objects it is (young / old / humongous / etc.) without needing to clear those bits explicitly before we use the reference. This can be helpful both in the GC itself but also in the barriers (e.g., efficiently filter out young objects in barriers that are not needed on young objects). > > Has anyone already looked into taking advantage of pointer masking in HotSpot? I tried a couple of searches but I didn't find anything.
If there's been a discussion on this before, can you please point me to it? > > Thanks, > > Tony From tschatzl at openjdk.org Wed May 28 15:52:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 15:52:58 GMT Subject: Integrated: 8357307: VM GC operations should have a public gc_succeeded() In-Reply-To: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> References: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> Message-ID: On Tue, 27 May 2025 15:43:10 GMT, Thomas Schatzl wrote: > Hi all, > > please review this cleanup that changes `VM_GC_Operation` to use `gc_succeeded` instead of `prologue_succeeded()` to indicate that a GC has been executed (once a GC is started, it will always finish, so the only reason that a VM op does not get executed is that we decide in the prologue that there has already been a GC). > > After recent changes the change is/was mostly a renaming of the `prologue_succeeded` method - there is only one case for G1 where additional checks can cause no execution of the GC (e.g. because we started a low-priority concurrent mark and we are already currently marking. No point for doing a GC in that case). > > Testing: tier1-4, gha > > Thanks, > Thomas This pull request has now been integrated.
Changeset: 2e6838a2 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/2e6838a20d52e9fa0a3b7322f2cb548e034b5d83 Stats: 7 lines in 5 files changed: 0 ins; 2 del; 5 mod 8357307: VM GC operations should have a public gc_succeeded() Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/25469 From tschatzl at openjdk.org Wed May 28 15:52:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 May 2025 15:52:57 GMT Subject: RFR: 8357307: VM GC operations should have a public gc_succeeded() In-Reply-To: <5tCdrHPXZm07KMG_kbs7l6Go7Rz15lubXzvqANnqHhw=.2f67549f-0f12-4e01-a033-a0eaa56d1932@github.com> References: <2g2S2zmFNw3CLhCa3IPJ76hG8dHVpmbk9gSzkvTmJNs=.1e720c79-26ce-405d-a330-cb1087cd1bcf@github.com> <5tCdrHPXZm07KMG_kbs7l6Go7Rz15lubXzvqANnqHhw=.2f67549f-0f12-4e01-a033-a0eaa56d1932@github.com> Message-ID: On Wed, 28 May 2025 10:54:05 GMT, Ivan Walulya wrote: >> Hi all, >> >> please review this cleanup that changes `VM_GC_Operation` to use `gc_succeeded` instead of `prologue_succeeded()`to indicate that a GC has been executed (once a GC is started, it will always finish, so the only reason that a VM op does not get executed is that we decide in the prologue that there has already been a GC). >> >> After recent changes the change is/was mostly a renaming of the `prologue_succeeded`method - there is only one case for G1 where additional checks can cause no execution of the GC (e.g. because we started a low-priority concurrent mark and we are already currently marking. No point for doing a GC in that case). >> >> Testing: tier1-4, gha >> >> Thanks, >> Thomas > > Marked as reviewed by iwalulya (Reviewer). 
Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25469#issuecomment-2916827168 From eosterlund at openjdk.org Wed May 28 16:50:55 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 28 May 2025 16:50:55 GMT Subject: RFR: 8357443: ZGC: Optimize old page iteration in remap remembered phase [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 12:45:09 GMT, Stefan Karlsson wrote: >> Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots. >> >> One part of this requires us to visit all old pages. To parallelize that part we have a class that distributes indices to the page table to the GC worker threads (See ZIndexDistributor). >> >> While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused a noticeably worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See https://github.com/openjdk/jdk/compare/master...stefank:jdk:8357443_zgc_optimize_remap_remembered >> >> While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead. >> >> This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So, most likely mostly important for micro benchmarks and small workloads.
>>
>> The below is the average time (ms) of the Concurrent Remap Roots phase from only running `System.gc()` 50 times before and after this PR.
>>
>> 4 GB MaxHeapSize
>>                     Original   Patch
>> Default threads
>>
>> mac:                0.27812    0.0507
>> win:                0.9485     0.10452
>> linux-x64:          0.53858    0.092
>> linux-x64 NUMA:     0.89974    0.15452
>> linux-aarch64:      0.32574    0.15832
>>
>> 4 threads
>>
>> mac:                0.19112    0.04916
>> win:                0.83346    0.08796
>> linux-x64:          0.57692    0.09526
>> linux-x64 NUMA:     1.23684    0.17008
>> linux-aarch64:      0.334      0.21918
>>
>> 1 thread:
>>
>> mac:                0.19678    0.0589
>> win:                1.96496    0.09928
>> linux-x64:          1.00788    0.1381
>> linux-x64 NUMA:     2.77312    0.21134
>> linux-aarch64:      0.63696    0.31286
>>
>> The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap size ...
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
> Apply suggestions from code review
>
> Co-authored-by: Axel Boldt-Christmas

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25345#pullrequestreview-2875785328

From shade at openjdk.org Wed May 28 18:56:34 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 28 May 2025 18:56:34 GMT
Subject: RFR: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003
Message-ID: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com>

SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it.
Additional testing: - [x] Linux x86_64 server fastdebug, `serviceability/sa` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25507&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357999 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25507/head:pull/25507 PR: https://git.openjdk.org/jdk/pull/25507 From ayang at openjdk.org Wed May 28 19:21:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 May 2025 19:21:53 GMT Subject: RFR: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 In-Reply-To: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> References: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> Message-ID: On Wed, 28 May 2025 18:51:09 GMT, Aleksey Shipilev wrote: > SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `serviceability/sa` Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25507#pullrequestreview-2876180639 From iklam at openjdk.org Wed May 28 22:22:53 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 28 May 2025 22:22:53 GMT Subject: RFR: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 In-Reply-To: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> References: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> Message-ID: On Wed, 28 May 2025 18:51:09 GMT, Aleksey Shipilev wrote: > SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `serviceability/sa` Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25507#pullrequestreview-2876606308 From kvn at openjdk.org Wed May 28 22:40:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 May 2025 22:40:50 GMT Subject: RFR: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 In-Reply-To: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> References: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> Message-ID: On Wed, 28 May 2025 18:51:09 GMT, Aleksey Shipilev wrote: > SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `serviceability/sa` Good ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25507#pullrequestreview-2876627275 From sspitsyn at openjdk.org Wed May 28 23:14:50 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 28 May 2025 23:14:50 GMT Subject: RFR: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 In-Reply-To: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> References: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> Message-ID: <11-kYgYw9IrzkCs_Lr9fyjuPBoJayYYHVKb9CuoDNZ8=.ab4bed20-3b10-484f-a090-22452bc22651@github.com> On Wed, 28 May 2025 18:51:09 GMT, Aleksey Shipilev wrote: > SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `serviceability/sa` Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25507#pullrequestreview-2876666642 From duke at openjdk.org Thu May 29 00:40:00 2025 From: duke at openjdk.org (duke) Date: Thu, 29 May 2025 00:40:00 GMT Subject: Withdrawn: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/24090 From shade at openjdk.org Thu May 29 15:10:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 29 May 2025 15:10:01 GMT Subject: Integrated: 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 In-Reply-To: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> References: <_Phg54iSIDv37FdElr9Z7MT7nZg64K037DPPMsd5Qmc=.aad942ca-ba9e-4539-b436-8de34f06c54b@github.com> Message-ID: On Wed, 28 May 2025 18:51:09 GMT, Aleksey Shipilev wrote: > SonarCloud reports an issue since [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) integration: duplicate index in `metadataTypeArray` initialization code. Looks like a simple typo, this PR fixes it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `serviceability/sa` This pull request has now been integrated. Changeset: d8a78302 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d8a783020d247d2c01834db14b44d239ad1f2bf4 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8357999: SA: FileMapInfo.metadataTypeArray initialization issue after JDK-8355003 Reviewed-by: ayang, iklam, kvn, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/25507 From qxing at openjdk.org Fri May 30 02:50:30 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 30 May 2025 02:50:30 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 Message-ID: JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707). A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. 
Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. ------------- Commit messages: - Fix ZGC compilation error on GCC 10.2 Changes: https://git.openjdk.org/jdk/pull/25536/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25536&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358104 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25536.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25536/head:pull/25536 PR: https://git.openjdk.org/jdk/pull/25536 From kbarrett at openjdk.org Fri May 30 07:09:50 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 30 May 2025 07:09:50 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 02:40:48 GMT, Qizheng Xing wrote: > JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707). A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 > > The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. > > This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. The change seems okay, though a bit disappointing that it's needed. The referenced gcc bug seems to have been fixed in gcc11, with backports to gcc10.3 and gcc9.4.
The minimum supported compiler version has always been somewhat approximate, since there isn't regular testing reported for what are often rather old versions. Sometimes we just decide to force an update to the required minimum rather than working around an issue like this. I wonder why that version is still in use? ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25536#pullrequestreview-2880376496 From qxing at openjdk.org Fri May 30 07:40:50 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 30 May 2025 07:40:50 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 07:07:04 GMT, Kim Barrett wrote: >> JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707). A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 >> >> The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. >> >> This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. > > The change seems okay, though a bit disappointing that it's needed. > > The referenced gcc bug seems to have been fixed in gcc11, with backports to > gcc10.3 and gcc9.4. The minimum supported compiler version has always been > somewhat approximate, since there isn't regular testing reported for what are > often rather old versions. Sometimes we just decide to force an update to the > required minimum rather than working around an issue like this.
> > I wonder why that version is still in use? @kimbarrett Thanks for your review! > I wonder why that version is still in use? My development environment is an Alibaba Cloud ECS server, with Alibaba Cloud Linux 3.2104 LTS. The default C++ compiler in this OS happens to be GCC 10.2.1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25536#issuecomment-2921491503 From jsikstro at openjdk.org Fri May 30 08:14:34 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 30 May 2025 08:14:34 GMT Subject: RFR: 8357053: ZGC: Improved utility for ZPageAge [v5] In-Reply-To: References: Message-ID: <1OVBoguoS9JYxlamI1Ag5nbBWgl9YTXvzVD38J7BD9c=.47f93b60-0b67-4786-9706-cec318260c2b@github.com> > Hello, > > This RFE improves utility for converting to/from, iterating over and defining structures that are indexed using the `ZPageAge` type. > > Converting to/from ZPageAge and its underlying type (uint8_t, often just uint) is currently done using `static_cast`. This works fine because sane values are converted in all use cases. However, to make conversion safer (and also more readable), I propose we add a `to_zpageage` and a corresponding `untype` that checks that the conversion is valid. Such conversion methods should be used instead of calling `static_cast`. > > We currently define a value called `ZPageAgeMax`, which is defined as `static_cast(ZPageAge::old)`. The majority of places that use this value actually use `ZPageAgeMax + 1`, which is equivalent to the number of ages. Instead, I propose we define and use a value that represents the number of possible ages, called `ZPageAgeCount`. > > Lastly, to make iterating over ages more accessible, I propose we create an interface of enum iterators of ZPageAge. This will also create a foundation for generating values that require a ZPageAge in the future.
Since the end of an enum iterator is exclusive, I've opted to use the following value as end for the iterators: > > constexpr ZPageAge ZPageAgeLastPlusOne = static_cast<ZPageAge>(ZPageAgeCount); > > > I see us using either this or a sentinel/dummy value at the end of the enum class, but I prefer having a value similar to `ZPageAgeLastPlusOne` over a dummy value. > > Testing: > * Oracle's tier 1-4 > * GHA Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into JDK-8357053_zpageage_utility - Remove redundant access specifier - Include order - Style fix :) - Added operator+/- for ZPageAge - Fix include order in enumIterator.hpp - Use T instead of EnumType - Use ENUMERATOR_RANGE instead of ENUMERATOR_VALUE_RANGE - Copyright years - Simplify untype(ZPageAge age) - ... and 1 more: https://git.openjdk.org/jdk/compare/07f5b762...3243a67a ------------- Changes: https://git.openjdk.org/jdk/pull/25251/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25251&range=04 Stats: 151 lines in 13 files changed: 93 ins; 9 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/25251.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25251/head:pull/25251 PR: https://git.openjdk.org/jdk/pull/25251 From jsikstro at openjdk.org Fri May 30 09:35:51 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 30 May 2025 09:35:51 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 02:40:48 GMT, Qizheng Xing wrote: > JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707).
A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 > > The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. > > This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25536#pullrequestreview-2880739474 From qxing at openjdk.org Fri May 30 09:43:58 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 30 May 2025 09:43:58 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 09:33:36 GMT, Joel Sikström wrote: >> JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707). A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 >> >> The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. >> >> This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. > > Marked as reviewed by jsikstro (Committer). @jsikstro Thanks for your review!
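The initialization change discussed in this thread can be illustrated with a minimal sketch. The `List` and `Cache` classes below are hypothetical stand-ins, not the real `ZList` or `ZMappedCache`; they only reproduce the combination the thread names (deleted copy constructor plus a user-provided destructor) and show the parenthesized value-initialization that the patch description says it switched to.

```cpp
#include <cstddef>

// Hypothetical stand-in for a list type with a deleted copy constructor
// and a user-provided destructor (the combination named in the thread).
class List {
  std::size_t _size;

public:
  List() : _size(0) {}
  List(const List&) = delete;
  ~List() {}

  std::size_t size() const { return _size; }
};

// Hypothetical stand-in for a cache that owns an array of such lists.
class Cache {
  static constexpr std::size_t NumLists = 4;
  List _lists[NumLists];

public:
  // Value-initialization with parentheses: every element is constructed
  // by List's default constructor, so no copy is ever needed.
  // The brace spelling, `_lists{}`, is the form the thread reports as
  // rejected by GCC 10.1 and 10.2 (GCC bug 63707).
  Cache() : _lists() {}

  // Returns true if every list in the array is empty.
  bool all_empty() const {
    for (std::size_t i = 0; i < NumLists; i++) {
      if (_lists[i].size() != 0) {
        return false;
      }
    }
    return true;
  }
};
```

Both spellings are expected to leave every element default-constructed on a conforming compiler, which is why the thread describes them as semantically equivalent.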
------------- PR Comment: https://git.openjdk.org/jdk/pull/25536#issuecomment-2921807205 From qxing at openjdk.org Fri May 30 09:43:58 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 30 May 2025 09:43:58 GMT Subject: Integrated: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 02:40:48 GMT, Qizheng Xing wrote: > JDK-8350441 introduced Mapped Cache for ZGC. However, the constructor of `ZMappedCache` uses brace-initialization for `_size_class_lists`, i.e. a `ZList` array. `ZList` is a class with a deleted copy constructor and an explicit destructor, and when its array is brace-initialized in the constructor, it triggers [GCC bug 63707](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707). A short example to reproduce this bug: https://godbolt.org/z/3397bxc73 > > The bug causes compilation error of ZGC on GCC versions 10.1 to 10.2. Considering that OpenJDK compilation still requires GCC 10 or higher, this should be recorded as a bug. > > This patch uses value-initialization for `_size_class_lists` instead of brace-initialization, which should be semantically equivalent and work on all GCC versions. This pull request has now been integrated. Changeset: a0eb1900 Author: Qizheng Xing Committer: Joel Sikström URL: https://git.openjdk.org/jdk/commit/a0eb1900c91531db26d1086a3b251bce0cf7c141 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8358104: Fix ZGC compilation error on GCC 10.2 Reviewed-by: kbarrett, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/25536 From jsikstro at openjdk.org Fri May 30 09:48:54 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 30 May 2025 09:48:54 GMT Subject: RFR: 8358104: Fix ZGC compilation error on GCC 10.2 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 09:38:14 GMT, Qizheng Xing wrote: >> Marked as reviewed by jsikstro (Committer). > > @jsikstro Thanks for your review! Thanks for the fix @MaxXSoft!
Just FYI, we usually allow PRs to be open for at least 24 hours to make sure reviewers from different time zones have a chance to see a PR, but this is a very small change. I'm not sure it was my place (seeing as I'm only a Committer and not a Reviewer) to say this is trivial, but I'd strongly argue it is. https://openjdk.org/guide/index.html#life-of-a-pr ------------- PR Comment: https://git.openjdk.org/jdk/pull/25536#issuecomment-2921828026 From jsikstro at openjdk.org Fri May 30 11:48:50 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 30 May 2025 11:48:50 GMT Subject: RFR: 8357944: Remove unused CollectedHeap::is_maximal_no_gc In-Reply-To: References: Message-ID: <_ihPWw3QtrzZZg3ZVgMgO6D7-bBCVdYh2-_Wi5erIvk=.eb141df4-35dd-478f-856d-efd3e8a4a243@github.com> On Wed, 28 May 2025 07:17:14 GMT, Albert Mingkun Yang wrote: > Removing effectively dead code/API for all GCs except G1. Looks good. This code is quite old, but I wonder if it's also appropriate to rename `is_maximal_no_gc()` to something more descriptive, maybe `is_expandable()` and reverse the check, which I think looks better at the callers of `is_maximal_no_gc()`. Not sure if that should be done separately or not though, and I don't have strong opinions on G1 code. ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/25482#pullrequestreview-2881111569 From ayang at openjdk.org Fri May 30 11:56:51 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 30 May 2025 11:56:51 GMT Subject: RFR: 8357944: Remove unused CollectedHeap::is_maximal_no_gc In-Reply-To: <_ihPWw3QtrzZZg3ZVgMgO6D7-bBCVdYh2-_Wi5erIvk=.eb141df4-35dd-478f-856d-efd3e8a4a243@github.com> References: <_ihPWw3QtrzZZg3ZVgMgO6D7-bBCVdYh2-_Wi5erIvk=.eb141df4-35dd-478f-856d-efd3e8a4a243@github.com> Message-ID: On Fri, 30 May 2025 11:45:56 GMT, Joel Sikström wrote: > I wonder if it's also appropriate to rename is_maximal_no_gc() to something more descriptive ...
I agree the current name is not very nice. I am leaning towards also changing its return type: `uint num_inactive_regions() const { return _hrm.num_inactive_regions(); }`. However, I feel that discussion can get its own ticket. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25482#issuecomment-2922188313 From rcastanedalo at openjdk.org Fri May 30 12:32:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 30 May 2025 12:32:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <17yjI7ChbobGnY0TM9OWlMizcyOn4mWziUMKNG4F64A=.bc284e23-c5a0-4afd-b5d0-3d3064b0c193@github.com> On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: >> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. > > I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. @theRealAph do you have time to look into this, or should I proceed with the PR in its current form?
The main bulk of the change is orthogonal to this discussion, and we can always revisit this part in a separate RFE if necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2115840786 From aph at openjdk.org Fri May 30 12:50:56 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 May 2025 12:50:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: >> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. > > I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. > @theRealAph do you have time to look into this, or should I proceed with the PR in its current form? The main bulk of the change is orthogonal to this discussion, and we can always revisit this part in a separate RFE if necessary. Sure, go ahead. I would prefer this to be done a little more neatly, but I accept your point that it's perhaps not quite as straightforward as I thought. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2115867606 From aboldtch at openjdk.org Fri May 30 13:36:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 30 May 2025 13:36:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 27 May 2025 07:46:43 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Include address mode test in 'legitimize_address' > - Excluded IR checks for testLoadVolatile on PPC64 Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2881385478 From ysr at openjdk.org Sat May 31 03:00:03 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 31 May 2025 03:00:03 GMT Subject: RFR: 8357471: GenShen: Share collector reserves between young and old [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 19:23:32 GMT, Kelvin Nilsen wrote: >> GenShen independently reserves memory to hold evacuations into young and old generations. We have found that under duress, it is sometimes difficult for mixed evacuations to make progress because the reserves in old are too small and we cannot expand old because young is running so frequently that it does not have the excess memory required to justify expansion of old (and shrinking of young). >> >> This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity.
>> >> The following spreadsheet snapshots highlight the benefits of this change. In control with 6G heap size, we perform large numbers of mixed evacuations, but each mixed evacuation has very low productivity (e.g. one region at a time). This causes excessive delays in reclaiming the garbage from old, which is required to shrink old and expand young. This is why we see the large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment with 6G heap size, there are far fewer mixed cycles, but they are each much more productive. The total number of GC cycles decreases significantly. >> >> ![image](https://github.com/user-attachments/assets/782f7285-2b26-4f3b-ba3e-58465abb2c3a) >> >> With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100. >> >> ![image](https://github.com/user-attachments/assets/54fb5eae-2ae8-4679-ac78-c88bc5c16c2f) >> >> At 8G heap size, the GC is not at all stressed. We see approximately the same numbers of GC cycles, slight degradation of response times at p50-p99, slight improvement in response times at p99.9-p100. >> >> ![image](https://github.com/user-attachments/assets/50a48564-7f32-4c48-80e9-78e9a3a3d63c) >> >> The command line for these comparisons follows: >> >> >> ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jd... 
> > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - respond to reviewer feedback > - Keep gc cycle times with heuristics for the relevant generation src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 183: > 181: } > 182: > 183: heap->young_generation()->set_evacuation_reserve((size_t) (young_evac_bytes * ShenandoahEvacWaste)); So we are using the amount to be evacuated out of young (suitably marked up to account for waste) from the collection set of a specific cycle to predict the same for the next cycle? And similarly for the promotion bytes. This seems reasonable, but how does that compare with using the live data identified in the most recent marking cycle instead? I can imagine that the former is more accurate under steady state assumptions and the latter is an overestimate to the extent that not all live data will be evacuated because it's in mostly live, i.e. densely live, regions. However, it would be interesting to see how they compare and which tracks reality better. Since this is in the nature of a prediction/estimate, one can consider a control algorithm that tries to move the estimate closer based on minimizing some historical deviation between marked vs evacuated. This need not be done here, but can be considered a future enhancement/experiment. src/hotspot/share/gc/shenandoah/heuristics/shenandoahGlobalHeuristics.cpp line 185: > 183: heap->young_generation()->set_evacuation_reserve((size_t) (young_evac_bytes * ShenandoahEvacWaste)); > 184: heap->old_generation()->set_evacuation_reserve((size_t) (old_evac_bytes * ShenandoahOldEvacWaste)); > 185: heap->old_generation()->set_promoted_reserve((size_t) (promo_bytes * ShenandoahPromoEvacWaste)); Note that the census from the most recent mark provides both these bits of information, but doesn't account for other criteria (i.e.
liveness denseness) that go into the exclusion of certain regions (and their objects) from the collection set. Therein lies the rub, but armed with historical numbers of each and reality, one might be able to predict this well (maybe). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2117098162 PR Review Comment: https://git.openjdk.org/jdk/pull/25357#discussion_r2117103371
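The control-algorithm idea raised in the review comment above (pulling the reserve estimate toward reality using the historical gap between marked and evacuated bytes) could be sketched roughly as follows. This is only one possible reading, using an exponentially weighted moving average; `EvacRatioEstimator`, its methods, and the `alpha` parameter are hypothetical and not part of Shenandoah, and the `waste_factor` argument merely mirrors the role that flags such as `ShenandoahEvacWaste` play in the quoted code.

```cpp
#include <cstddef>

// Hypothetical helper, not part of HotSpot: remembers what fraction of the
// marked-live bytes was actually evacuated in past cycles and scales the
// next reserve estimate by that smoothed fraction.
class EvacRatioEstimator {
  double _ratio;  // smoothed evacuated/marked ratio, kept within [0, 1]
  double _alpha;  // weight of the newest sample, expected in (0, 1]

public:
  explicit EvacRatioEstimator(double alpha, double initial_ratio = 1.0)
    : _ratio(initial_ratio), _alpha(alpha) {}

  // Record one completed cycle: bytes marked live versus bytes evacuated.
  void record(std::size_t marked_live_bytes, std::size_t evacuated_bytes) {
    if (marked_live_bytes == 0) {
      return;  // nothing to learn from an empty sample
    }
    double sample = static_cast<double>(evacuated_bytes) /
                    static_cast<double>(marked_live_bytes);
    if (sample > 1.0) {
      sample = 1.0;
    }
    // Move the estimate a fraction alpha of the way toward the new sample.
    _ratio += _alpha * (sample - _ratio);
  }

  // Predict the reserve for the next cycle from freshly marked live bytes,
  // marked up by a waste factor.
  std::size_t predict_reserve(std::size_t marked_live_bytes,
                              double waste_factor) const {
    return static_cast<std::size_t>(
        static_cast<double>(marked_live_bytes) * _ratio * waste_factor);
  }

  double ratio() const { return _ratio; }
};
```

Starting from a ratio of 1.0 reproduces the "live data is an overestimate" behaviour described in the comment; each recorded cycle then shrinks the estimate toward what was really evacuated. Whether this tracks reality better than reusing the previous cycle's collection-set bytes is exactly the open question the comment raises.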