From duke at openjdk.org Thu May 1 04:01:50 2025
From: duke at openjdk.org (duke)
Date: Thu, 1 May 2025 04:01:50 GMT
Subject: Withdrawn: 8351137: ZGC: Improve ZValueStorage alignment support
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 08:34:36 GMT, Axel Boldt-Christmas wrote:

> ZValueStorage only aligns the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages have types whose alignment is less than or equal to the alignment set by their storage.
>
> I wish to improve this so that types with greater alignment than the storage alignment can be used.
>
> The UB caused by using a type with an alignment larger than the storage alignment is something I have seen materialise as returning a bad address (and crashing) on Windows.
>
> As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment. I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two.
>
> Testing:
>  * tier 1 through tier 5 Oracle supported platforms
>  * GHA

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/23887

From duke at openjdk.org Thu May 1 05:41:39 2025
From: duke at openjdk.org (Rui Li)
Date: Thu, 1 May 2025 05:41:39 GMT
Subject: RFR: 8350860: Max GC memory overhead tests
Message-ID:

The G1 GC metadata has increased from JDK8 to the current tip. When upgrading an application from JDK8, users might observe native memory increases. GC is one of the top contributors. Small applications tend to get impacted more significantly. See the sample test in the description of https://bugs.openjdk.org/browse/JDK-8350860: when the heap is 128m, the native memory used by GC can be over 80m. In order to make sure we don't bring dramatic native memory increases while developing G1, I am adding this metadata guardrail test.
The test calculates the native memory based on existing GC usage and provides some headroom. When there is a significant increase, the test would fail and we should look back to see if the added native memory makes sense.

-------------

Commit messages:
 - Remove trailing whitespaces
 - 8350860: Max GC memory overhead tests

Changes: https://git.openjdk.org/jdk/pull/24981/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24981&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350860
  Stats: 174 lines in 1 file changed: 174 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24981.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24981/head:pull/24981

PR: https://git.openjdk.org/jdk/pull/24981

From manc at google.com Thu May 1 07:07:03 2025
From: manc at google.com (Man Cao)
Date: Thu, 1 May 2025 00:07:03 -0700
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To: <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Great progress! Thank you, Ivan. Optimistically, many of these changes could make it to JDK 25.

I'm happy to do some experiments and provide feedback. Does [1] contain all necessary changes? Will that branch be updated as parts of it merge into master?

Some early questions/feedback below.

> We also increase the default GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to suggestions). The existing default causes the heap to shrink too aggressively under the new policy in order to maintain the target GCTimeRatio. A higher default provides a better balance and avoids shrinking heap.

This changes the pause overhead target from ~8% (1/13) to 4% (1/25). Would it make G1 expand the heap more aggressively after incremental collections compared to existing behavior?
Could you share some early/rough performance numbers about 12 vs 24 with the prototype, such as actual heap sizes and throughput differences?

> Additionally, we are removing the heap resizing at the end of the Remark pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio. This resizing of the heap ignores current application behaviour and may lead to pathological cases of repeated concurrent mark cycles:

In the new prototype, does the pathological case happen with the default MinHeapFreeRatio=40 MaxHeapFreeRatio=70 value? Or mainly with smaller user-defined values for MinHeapFreeRatio/MaxHeapFreeRatio?

Re Thomas's comments:

> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.

Making GCTimeRatio manageable sounds like a good idea. Do we plan to do this eventually (why "just for testing purposes")?

> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.

If G1 strives to respect GCTimeRatio as the prototype proposes, our existing use cases probably no longer need to set SoftMaxHeapSize (and maintain a separate algorithm to calculate values for SoftMaxHeapSize). Our use case still needs CurrentMaxHeapSize, but it could be followed up in https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-April/051996.html.

[1] https://github.com/openjdk/jdk/compare/master...walulyai:jdk:G1HeapResizePolicy

-Man

On Tue, Apr 29, 2025 at 4:34 AM Thomas Schatzl wrote:

> Hi Ivan,
>
> thanks for working on this!
>
> Some comments for people (Man, Monica, Kirk) potentially taking this for
> a spin:
>
> On 29.04.25 12:46, Ivan Walulya wrote:
> > As part of our preparations for AHS, we are prototyping changes to the
> > G1 heap resizing policy to improve the effectiveness of the GCTimeRatio
> > [1]. The GCTimeRatio is set to manage the balance between GC time and
> > Application execution time. G1's current implementation of GCTimeRatio
> > appears to have drifted from its intended purpose over time. It may no
> > longer accurately guide heap sizing in response to GC overhead.
> > Therefore, we need to change this mechanism with the goal that G1 better
> > manages heap sizes without the need for additional tuning knobs.
> >
> > The prototype allows both expansion and shrinking of the heap at the end
> > of any GC, as opposed to the current behavior where shrinking is only
> > allowed at Remark or Full GC pauses [2]. We also increase the default
> > GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to
> > suggestions). The existing default causes the heap to shrink too
> > aggressively under the new policy in order to maintain the target
> > GCTimeRatio. A higher default provides a better balance and avoids
> > shrinking heap.
>
> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.
>
> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.
>
> > Additionally, we are removing the heap resizing at the end of the Remark
> > pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio.
> > This resizing of the heap ignores current application behaviour and may
> > lead to pathological cases of repeated concurrent mark cycles:
> >
> >   * we shrink the heap at remark,
> >   * a smaller heap triggers a concurrent marking in the subsequent
> >     GCs as well as expanding the heap
> >   * the concurrent cycle ends in another remark pause where the
> >     cycle restarts.
> >
> > We keep this MinHeapFreeRatio-MaxHeapFreeRatio based resizing logic at
> > the end of Full GC.
>
> The use case for this might be ones similar to CraC to temporarily
> compact the heap as much as possible; however it might be better to have
> explicit control for that (e.g. a jcmd).
>
> Ultimately there may be need to remove it as well for full gcs,
> replacing it with something else.
>
> > As a result of these changes, applications may settle at more
> > appropriate and in some cases smaller heap sizes for a given
> > GCTimeRatio. While this may show as regression in some benchmarks that
> > are sensitive to heap size, it is still improved control over GC
> > behaviour.
> >
> > We are requesting for feedback or testing of these changes before
> > propose to merge them with mainline.
> >
> > Some of the changes that are independent of the GCTimeRatio are already
> > out for review [4, 5], other minor fixes will be split out and pushed
> > independently.
>
> [0] https://github.com/openjdk/jdk/pull/24211
>
> Hth,
>   Thomas
>

From iwalulya at openjdk.org Thu May 1 08:27:32 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 1 May 2025 08:27:32 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request.
> Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>
> Testing: Tier 1-3

Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Albert review
 - Merge branch 'master' into 8355681-find_contiguous_allow_expand
 - init

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24915/files
  - new: https://git.openjdk.org/jdk/pull/24915/files/5e8e4a73..5085e54b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=00-01

  Stats: 13278 lines in 385 files changed: 9400 ins; 1696 del; 2182 mod
  Patch: https://git.openjdk.org/jdk/pull/24915.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24915/head:pull/24915

PR: https://git.openjdk.org/jdk/pull/24915

From iwalulya at openjdk.org Thu May 1 08:27:33 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 1 May 2025 08:27:33 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com>
Message-ID:

On Wed, 30 Apr 2025 13:57:45 GMT, Albert Mingkun Yang wrote:

>> I added this instead of an assert on failing `expand_and_allocate` for humongous objects, but then figured we could just skip the `expand_and_allocate` attempt which is guaranteed to fail.
>
> Not sure what to write in a ticket. Those are just some questions I had while reading the code. Anyway, if this part is not super related to the actual functional change, can it be dealt with in its own PR?

I have removed that check.
Will be done with the clean up of `attempt_allocation_at_safepoint`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2069982724

From thomas.schatzl at oracle.com Thu May 1 09:55:00 2025
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 1 May 2025 11:55:00 +0200
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To:
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Hi Man,

On 01.05.25 09:07, Man Cao wrote:
> Great progress! Thank you, Ivan. Optimistically, many of these changes
> could make it to JDK 25.
>
> I'm happy to do some experiments and provide feedback. Does [1] contain
> all necessary changes?

Afaik yes.

> Will that branch be updated as parts of it merge
> into master?

Ivan can answer that.

>
> Some early questions/feedback below.
>
> > We also increase the default GCTimeRatio from 12 to 24 [3] (we are
> > choosing 24 but open to suggestions). The existing default causes the
> > heap to shrink too aggressively under the new policy in order to
> > maintain the target GCTimeRatio. A higher default provides a better
> > balance and avoids shrinking heap.
>
> This changes the pause overhead target from ~8% (1/13) to 4% (1/25).
> Would it make G1 expand the heap more aggressively after incremental
> collections compared to existing behavior? Could you share some early/
> rough performance numbers about 12 vs 24 with the prototype, such as
> actual heap sizes, throughput differences?

That value has been found to create roughly the same heap sizes at around the same performance (+/- 1-2% throughput) across our set of benchmarks that run out-of-box (iirc). Again, Ivan may chime in here.

Part of this request for feedback is about getting a larger coverage on this aspect.
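
For readers following the numbers in this exchange: the 1/13 and 1/25 figures quoted above correspond to a target GC share of 1 / (1 + GCTimeRatio). The following is a minimal sketch of that arithmetic only. The function name `gc_overhead_target` is made up for illustration and is not a HotSpot function; the sketch does not claim to reproduce how G1 applies the flag internally.

```cpp
#include <cassert>

// Illustrative helper (hypothetical name, not part of HotSpot):
// fraction of total time spent in GC implied by a given GCTimeRatio,
// using the 1 / (1 + ratio) relation quoted in the thread.
constexpr double gc_overhead_target(double gc_time_ratio) {
  return 1.0 / (1.0 + gc_time_ratio);
}

// Current default of 12 -> 1/13, roughly 7.7% (the "~8%" in the thread).
static_assert(gc_overhead_target(12.0) > 0.0769 &&
              gc_overhead_target(12.0) < 0.0770, "1/13");

// Proposed default of 24 -> 1/25, i.e. 4%.
static_assert(gc_overhead_target(24.0) > 0.0399 &&
              gc_overhead_target(24.0) < 0.0401, "1/25");
```

Under this relation, doubling the ratio from 12 to 24 roughly halves the GC time budget, which is why the question of more aggressive heap expansion comes up.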
(The increase in GCTimeRatio has actually been something that is long overdue regardless of this change, since G1's overhead decreased a lot in recent years.)

[...]

> Re Thomas's comments:
>
> > So if one were to make GCTimeRatio manageable (just for testing
> > purposes), and made it a float (for better control), changes to it
> > should reflect on the used heap size in the next few GCs automatically.
>
> Making GCTimeRatio manageable sounds like a good idea. Do we plan to do
> this eventually (why "just for testing purposes")?
>

It's just not implemented in that branch :) Currently we think that GCTimeRatio will eventually become manageable and will likely use floats, as integers are too coarse as divisors. Probably as a follow-up.

There is still the question whether to deprecate it in favor of some GCCPUUsagePercent or whatever it is going to be called to have more direct control.

> > A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> > that only guides IHOP with changes in
> > G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> > now, but there may be issues with this GCTimeRatio based heap sizing
> > that would be interesting to explore.
>
> If G1 strives to respect GCTimeRatio as the prototype proposes, our
> existing use cases probably no longer need to set SoftMaxHeapSize (and
> maintains a separate algorithm to calculate values for SoftMaxHeapSize).

The purpose of this request is also to understand whether SoftMaxHeapSize is still necessary :)

Sizing based on cpu usage may be more inconvenient and less exact at reducing to a particular target heap size (without OOME'ing) than a direct target heap size. (i.e. I can imagine the case where while the threshold is kind of a continuum, for a collector small changes to heap sizes can lead to radical changes in CPU usage, so G1 might flap back and forth all the time).
We also do not have real use cases with real applications where we would temporarily want to keep the heap below a certain value like we think you suggested. Ignoring CraC like use cases where specific functionality would serve that use case better, one current use of SoftMaxHeapSize here is for tuning ZGC performance (since SoftMaxHeapSize is only implemented there), but we have not seen uses for G1 (obviously as it's not implemented there, and one can use G1ReservePercent to some degree).

Note that G1 already has this G1ReservePercent that somewhat already acts like that, so there is a certain overlap that might need some resolving (and its default is too high for large heaps anyway). Or just changed to be adaptive.

SoftMaxHeapSize may also be necessary in some cases where there is more information available from the outside than AHS can ever know.

So it can be worthwhile experimenting with SMHS anyway.

I will update that umbrella CR with new thoughts in the next few days.

Thanks,
   Thomas

From ayang at openjdk.org Thu May 1 09:48:47 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 1 May 2025 09:48:47 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 08:27:32 GMT, Ivan Walulya wrote:

>> Hi,
>>
>> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>>
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains three additional commits since the last revision:
>
>  - Albert review
>  - Merge branch 'master' into 8355681-find_contiguous_allow_expand
>  - init

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2809538391

From tschatzl at openjdk.org Thu May 1 09:58:46 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 1 May 2025 09:58:46 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 08:27:32 GMT, Ivan Walulya wrote:

>> Hi,
>>
>> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>>
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Albert review
>  - Merge branch 'master' into 8355681-find_contiguous_allow_expand
>  - init

Marked as reviewed by tschatzl (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2809548851

From wkemper at openjdk.org Thu May 1 17:43:59 2025
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 1 May 2025 17:43:59 GMT
Subject: Integrated: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled
In-Reply-To:
References:
Message-ID: <3bV-rGkGRHjkUNAEElE0_aSdO8t81oMrd88bjWmZY6Y=.0df6a3b1-29db-4592-aa42-7d3d15684455@github.com>

On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote:

> Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure.

This pull request has now been integrated.

Changeset: 9e26b9fa
Author:    William Kemper
URL:       https://git.openjdk.org/jdk/commit/9e26b9facba09c4d6f516e8032b876c6d9e95e9e
Stats:     24 lines in 1 file changed: 15 ins; 8 del; 1 mod

8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled

Reviewed-by: ysr, kdnilsen

-------------

PR: https://git.openjdk.org/jdk/pull/24888

From mbaesken at openjdk.org Fri May 2 06:39:52 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 2 May 2025 06:39:52 GMT
Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled
In-Reply-To:
References: <_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com>
Message-ID:

On Wed, 30 Apr 2025 15:36:32 GMT, William Kemper wrote:

> have you had a chance to retest after PR#24940 was integrated?

Did not see the issue again after this (of course this is no 'proof' that they will never come back), so I would say looks good !
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2846479688

From iwalulya at openjdk.org Fri May 2 12:56:51 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Fri, 2 May 2025 12:56:51 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 09:46:17 GMT, Albert Mingkun Yang wrote:

>> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - Albert review
>>  - Merge branch 'master' into 8355681-find_contiguous_allow_expand
>>  - init
>
> Marked as reviewed by ayang (Reviewer).

Thanks @albertnetymk and @tschatzl for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24915#issuecomment-2847135344

From iwalulya at openjdk.org Fri May 2 12:56:52 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Fri, 2 May 2025 12:56:52 GMT
Subject: Integrated: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 10:57:48 GMT, Ivan Walulya wrote:

> Hi,
>
> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>
> Testing: Tier 1-3

This pull request has now been integrated.
Changeset: 995d5416
Author:    Ivan Walulya
URL:       https://git.openjdk.org/jdk/commit/995d54161fed657f38753813f55d0591e77a42e3
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation

Reviewed-by: tschatzl, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/24915

From ayang at openjdk.org Fri May 2 18:41:53 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 2 May 2025 18:41:53 GMT
Subject: RFR: 8350621: Code cache stops scheduling GC
In-Reply-To:
References:
Message-ID:

On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote:

> The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`.
>
> This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling.
>
> Unfortunately this can't work properly under certain circumstances.
> For example, if using G1GC, calling `G1CollectedHeap::collect` does not give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).
>
> I have observed this behavior on JVM in version 21 that were migrated recently from java 17.
> Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GC take a long time to happen.
>
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
>
> In order to reproduce this issue, I found a very simple and convenient way:
>
>
> public class CodeCacheMain {
>     public static void main(String[] args) throws InterruptedException {
>         while (true) {
>             Thread.sleep(100);
>         }
>     }
> }
>
>
> Run this simple app with the following JVM flags:
>
>
> -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
>
>
> - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC
> - low `ReservedCodeCacheSize` to put pressure on code cache quickly
> - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction
>
> Itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will:
> - allows us to monitor code cache
> - indirectly generate activity on the code cache, just what we need to reproduce the bug
>
> Some logs related to code cache will show up at some point with GC activity:
>
>
> [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory
>
>
> And then it will stop and we'll end up with the following message:
>
>
> [672.714s][info][codecache ] Code cache is full - disabling compilation
>
>
> L...

I have a question regarding the existing code/logic.


  // In case the GC is concurrent, we make sure only one thread requests the GC.
  if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
    log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0);
    Universe::heap()->collect(GCCause::_codecache_GC_aggressive);
  }


Why making sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently.

Would removing `_unloading_threshold_gc_requested` resolve this problem?

> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.

For ParallelGC, `ParallelScavengeHeap::collect` contains the following to ensure `System.gc` gccause and similar ones guarantee a full-gc.


  if (!GCCause::is_explicit_full_gc(cause)) {
    return;
  }


However, the current logic that a young-gc can cancel a full-gc (`_codecache_GC_aggressive` in this case) also seems surprising.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2847860414

From aboldtch at openjdk.org Mon May 5 07:52:46 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 5 May 2025 07:52:46 GMT
Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2]
In-Reply-To:
References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com>
Message-ID: <_Mn9Z5l3XKaL0wmF0p2Zj4xzonQU1RDJt-AKhufIoaM=.2bdbcb5b-bee9-4b3e-822d-9ff177e4ac54@github.com>

On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote:

>> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen.
>
> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
>  - Merge branch 'master' into tkr-zgc-deoptimize-allocation
>  - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation

lgtm.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814052474

From shade at openjdk.org Mon May 5 09:49:50 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 5 May 2025 09:49:50 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote:

>> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
>>
>> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>>
>> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
>>
>> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move to oops

Looking for more Reviewers, thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2850471947

From iwalulya at openjdk.org Mon May 5 09:50:33 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Mon, 5 May 2025 09:50:33 GMT
Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc.
>
> Testing: Tier 1-3

Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:

 - use align_up_to_region_byte_size
 - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount
 - Thomas Review
 - nit
 - refactor full collection

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24944/files
  - new: https://git.openjdk.org/jdk/pull/24944/files/6ef77f71..4cde5315

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24944&range=01-02

  Stats: 19999 lines in 573 files changed: 15129 ins; 2714 del; 2156 mod
  Patch: https://git.openjdk.org/jdk/pull/24944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24944/head:pull/24944

PR: https://git.openjdk.org/jdk/pull/24944

From jsikstro at openjdk.org Mon May 5 09:50:55 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Mon, 5 May 2025 09:50:55 GMT
Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests
Message-ID:

Hello,

There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp.

To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change.

I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug.
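
As a side note on the failure mode described above: two different class definitions that share a name and live in separate translation units of one binary violate C++'s one-definition rule, and the linker will typically keep just one copy of each implicitly inline member function without any diagnostic. The PR resolves this by renaming. The sketch below shows a different, commonly used guard, an unnamed namespace, applied to made-up helper types. `TestEntry` and `TestEntryCompare` are hypothetical stand-ins and not the actual gtest sources, so this is only an illustration under that assumption.

```cpp
#include <cassert>

// Names declared inside an unnamed namespace are local to this translation
// unit, so another .cpp file may define its own, different TestEntry without
// the two definitions colliding at link time.
namespace {

// Hypothetical test helper, standing in for a per-file test entry type.
struct TestEntry {
  int key;
  explicit TestEntry(int k) : key(k) {}
};

// Hypothetical three-way comparator: returns a negative value, zero, or a
// positive value when a is less than, equal to, or greater than b.
struct TestEntryCompare {
  int operator()(const TestEntry& a, const TestEntry& b) const {
    return (a.key > b.key) - (a.key < b.key);
  }
};

} // namespace
```

Renaming, as done in the PR, keeps the types visible under distinct names; the unnamed namespace instead hides them from other files entirely, which only works when no other file needs them.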

-------------

Commit messages:
 - Also rename ZTestEntryCompare
 - 8356083: ZGC: Duplicate ZTestEntry symbols in gtests

Changes: https://git.openjdk.org/jdk/pull/25029/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25029&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8356083
  Stats: 52 lines in 1 file changed: 0 ins; 0 del; 52 mod
  Patch: https://git.openjdk.org/jdk/pull/25029.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25029/head:pull/25029

PR: https://git.openjdk.org/jdk/pull/25029

From ayang at openjdk.org Mon May 5 10:39:54 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 5 May 2025 10:39:54 GMT
Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap
Message-ID:

Simply removing unnecessary retry logic, because a gc-operation will run to completion, guaranteeing the increment of the corresponding counters.

Test: tier1-3

-------------

Commit messages:
 - remove-systemgc-loop

Changes: https://git.openjdk.org/jdk/pull/25032/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25032&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8356157
  Stats: 33 lines in 2 files changed: 0 ins; 26 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/25032.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25032/head:pull/25032

PR: https://git.openjdk.org/jdk/pull/25032

From aboldtch at openjdk.org Mon May 5 11:21:46 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 5 May 2025 11:21:46 GMT
Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote:

> Hello,
>
> There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp.
> > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTre.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25029#pullrequestreview-2814492407 From tschatzl at openjdk.org Mon May 5 11:34:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 May 2025 11:34:52 GMT Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikstr?m wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTre.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTre.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25029#pullrequestreview-2814518370 From tschatzl at openjdk.org Mon May 5 11:34:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 May 2025 11:34:55 GMT Subject: RFR: 8355756: G1HeapSizingPolicy::full_collection_resize_amount should consider allocation size [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. 
>> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Maybe update that suggested comment (sorry, missed pointing that out earlier), but good. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 484: > 482: // compacting collection, leaving no dead wood. > 483: // - if allocation_word_size is set, then this allocation size will > 484: // be accounted for in case shrinking of the heap happens. Suggestion: // - allocation_word_size is the size allocation that caused this collection. // To be considered when resizing the heap at the end of the full collection. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2814512672 PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2073273737 From aboldtch at openjdk.org Mon May 5 12:18:45 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 12:18:45 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:29:34 GMT, Quan Anh Mai wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. 
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even been released yet, so I think it is more efficient to make a better fix than to make a quicker one. I think @merykitty's solution, with two different relocations based on whether we support APX or not, could work, and we would only emit the after relocation and the nop when `VM_Version::supports_apx_f()` is true. On the other hand, maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code.
Something along the lines of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }

As for the solution of having the relocation point at the entry: while relocations were not designed to be used this way, it looks like it works (at least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to a relocation). But changing invariants is scary, and it is probably better to evaluate this as part of the [JDK-8355341](https://bugs.openjdk.org/browse/JDK-8355341) RFE.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2850807205

From coleenp at openjdk.org Mon May 5 12:26:26 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 5 May 2025 12:26:26 GMT
Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom
Message-ID:

Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for.
Tested with tier5-7 with vmTestbase tests that use this package.
------------- Commit messages: - 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Changes: https://git.openjdk.org/jdk/pull/25034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330022 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25034/head:pull/25034 PR: https://git.openjdk.org/jdk/pull/25034 From eosterlund at openjdk.org Mon May 5 13:14:53 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 5 May 2025 13:14:53 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Good stuff. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814769629

From iwalulya at openjdk.org Mon May 5 14:02:54 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Mon, 5 May 2025 14:02:54 GMT
Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms
Message-ID:

Hi,

Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms == -Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms.

This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035).

Testing: gha, manual testing as below:

Mainline:

[3.740s][info ][gc,init ] Heap Min Capacity: 150G
[3.740s][info ][gc,init ] Heap Initial Capacity: 150G
[3.740s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B
.
.
[9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms

With patch (no shrinking when -Xms == -Xmx):

[3.753s][info ][gc,init ] Heap Min Capacity: 150G
[3.753s][info ][gc,init ] Heap Initial Capacity: 150G
[3.753s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms

With patch (shrinking when -Xms != -Xmx):

[3.755s][info ][gc,init ] Heap Min Capacity: 153568M
[3.755s][info ][gc,init ] Heap Initial Capacity: 153568M
[3.755s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions)
.
.
[8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms ------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/25036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308854 Stats: 16 lines in 1 file changed: 11 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25036/head:pull/25036 PR: https://git.openjdk.org/jdk/pull/25036 From duke at openjdk.org Mon May 5 16:03:59 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 16:03:59 GMT Subject: Withdrawn: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:59:44 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: > > ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) > > The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/22678 From kvn at openjdk.org Mon May 5 16:14:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 16:14:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <1TLtkRe2ydHcPB5lnREFbmF4hlQ4rOBHyNXbplFujM0=.427f9764-dda9-41e4-a228-95f47426cf25@github.com> On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Looks fine to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2815333573 From shade at openjdk.org Mon May 5 16:55:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:55:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. 
>> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Thank you! I'll wait a bit if @kimbarrett is able to confirm this matches the idea he had back in JBS comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2851636080 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24957#issuecomment-2851729694 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: Integrated: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation In-Reply-To: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:46:51 GMT, Tom Rodriguez wrote: > JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. This pull request has now been integrated. Changeset: cc34135f Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/cc34135fff7650ad44c910dca0fd47e9cbd56b68 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Reviewed-by: aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24957 From tschatzl at openjdk.org Tue May 6 08:17:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:17:20 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap In-Reply-To: References: Message-ID: <9hvwdTVHqbOVeikfixaFuzpbVpRW4Lxc0rdVbTBb7yE=.4bc00f1b-933d-4920-946e-93e3d06411a4@github.com> On Mon, 5 May 2025 10:36:11 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary retrying logic because an gc-operation will run-to-completion, guaranteeing the increment of corresponding counters. > > Test: tier1-3 lgtm ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25032#pullrequestreview-2817313107 From tschatzl at openjdk.org Tue May 6 08:19:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:19:17 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx, G1 still uncommits the archive regions thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix, we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). > > Testing: gha, manual testing as below: > > Mainline: > > > [3.740s][info ][gc,init ] Heap Min Capacity: 150G > [3.740s][info ][gc,init ] Heap Initial Capacity: 150G > [3.740s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B > . > . > [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms > > With patch (No shrinking when -Xms == -Xms): > > > [3.753s][info ][gc,init ] Heap Min Capacity: 150G > [3.753s][info ][gc,init ] Heap Initial Capacity: 150G > [3.753s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms > > With patch (Shrinking when -Xms != -Xms): > > > [3.755s][info ][gc,init ] Heap Min Capacity: 153568M > [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M > [3.755s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions) > . 
> . > [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25036#pullrequestreview-2817316832 From tschatzl at openjdk.org Tue May 6 08:21:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:21:17 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2817322299 From sjohanss at openjdk.org Tue May 6 08:58:31 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 6 May 2025 08:58:31 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v2] In-Reply-To: References: Message-ID: <-65uM_iyhoKOhBmmcPjSJHkeab7WGzclKGNkfWGcl_c=.b17052b2-4212-44ad-b302-1eb52293bc49@github.com> > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism to work as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount used memory for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). 
Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:
>
>     bool update_allocation_history = used > 0.5 * capacity;
>
> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is, and the sizing heuristic will work as expected.
>
> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause.
>
> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-CPU, and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and with many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page allocator instead.
>
> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly.
>
> **Testing**
> * Functional testing tier1-tier7
> * Performance testing in A...
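
To make the capacity idea in the quoted description concrete, here is a small sketch of "average over the last 10 used values" together with the sampling condition it feeds. It illustrates the described approach only and is not the code in the PR. The names `RecentAverage` and `should_sample` are hypothetical, and HotSpot has its own sequence utilities that the actual change may use instead.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: average of the most recent N samples (N = 10 here,
// matching the "last 10 values" mentioned in the description).
class RecentAverage {
private:
  static constexpr size_t N = 10;
  std::array<size_t, N> _samples{};
  size_t _count = 0; // number of valid samples, at most N
  size_t _next = 0;  // slot that the next sample overwrites

public:
  void add(size_t value) {
    _samples[_next] = value;
    _next = (_next + 1) % N;
    if (_count < N) {
      _count++;
    }
  }

  // Returns 0 until the first sample has been added.
  size_t average() const {
    if (_count == 0) {
      return 0;
    }
    size_t sum = 0;
    for (size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// The sampling condition quoted above: history is only updated when the
// reported usage exceeds half of the reported capacity.
bool should_sample(size_t used, size_t capacity) {
  return used > 0.5 * capacity;
}
```

With the whole heap as capacity, `should_sample` is almost never true for realistic TLAB usage. With a capacity derived from recent usage, it triggers whenever usage is in the same range as before.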
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Default TLAB size of 8k, avoid 0 updates and reasonable starting values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/0c1f6eed..76c79f5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=00-01 Stats: 17 lines in 4 files changed: 13 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From jsikstro at openjdk.org Tue May 6 09:53:21 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 6 May 2025 09:53:21 GMT Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikstr?m wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTre.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTre.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. Thank you for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25029#issuecomment-2853936758 From jsikstro at openjdk.org Tue May 6 09:53:21 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 6 May 2025 09:53:21 GMT Subject: Integrated: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:43:50 GMT, Joel Sikstr?m wrote: > Hello, > > There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTre.cpp is used over the one in test_zList.cpp. > > To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTre.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change. > > I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug. This pull request has now been integrated. Changeset: ecfaf354 Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/ecfaf354d761bc7034ea8783f4428157ea450207 Stats: 52 lines in 1 file changed: 0 ins; 0 del; 52 mod 8356083: ZGC: Duplicate ZTestEntry symbols in gtests Reviewed-by: aboldtch, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/25029 From jbhateja at openjdk.org Tue May 6 09:55:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 09:55:21 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. 
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2]
>
> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
>
> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction.
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873
>
>
> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs.

Hi @xmas92,

Your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341, since it's a bigger change that also requires Graal buy-in. There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. the OR instruction, for which we bookkeep the relocation address from the end of the instruction, and which is the last instruction in the pointer coloring primitive.

For this bug fix, your suggestion looks fine to me.
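
As a rough illustration of why the prefix length matters when the patch offset is counted from the instruction start, here is a standalone sketch. It is not HotSpot code. It assumes that `SHL r64, imm8` is laid out as prefix, opcode, ModRM, imm8, that the legacy REX prefix is one byte, and that the APX REX2 prefix is two bytes starting with the escape byte 0xD5. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: first byte of the two-byte APX REX2 prefix.
constexpr uint8_t REX2_ESCAPE = 0xD5;

// Offset of the imm8 operand from the start of "SHL r64, imm8".
// Assumed layouts (illustration only):
//   legacy: REX.W (1 byte)  opcode  ModRM  imm8  -> offset 3
//   APX:    REX2  (2 bytes) opcode  ModRM  imm8  -> offset 4
size_t shl_imm8_offset(const uint8_t* insn) {
  assert(insn != nullptr);
  const bool has_rex2 = insn[0] == REX2_ESCAPE;
  return has_rex2 ? 4 : 3;
}

// Overwrites only the shift amount, leaving prefix, opcode and ModRM intact.
void patch_shl_imm8(uint8_t* insn, uint8_t value) {
  insn[shl_imm8_offset(insn)] = value;
}
```

Using the fixed offset 3 on the REX2 form would overwrite the ModRM byte rather than the immediate in this model. This is the same class of corruption as the one described above, and the `+ 1` adjustment in the suggested change compensates for it.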
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2853945841 From jbhateja at openjdk.org Tue May 6 10:21:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:21:54 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24919/files - new: https://git.openjdk.org/jdk/pull/24919/files/1f9c84c8..fc3b61e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00-01 Stats: 25 lines in 4 files changed: 11 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From rcastanedalo at openjdk.org Tue May 6 16:42:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 16:42:31 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). ------------- Commit messages: - Format - Remove extra line - Further clarify zLoadP candidate predicate and no-preceding-lea assertion - Rename machine node property to ins_is_late_expanded_null_check_candidate for clarity, and make it a total function - Update copyright year - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - ... 
and 15 more: https://git.openjdk.org/jdk/compare/e2ae50d8...dc5aa4fc Changes: https://git.openjdk.org/jdk/pull/25066/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 385 lines in 15 files changed: 338 ins; 37 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From matsaave at openjdk.org Tue May 6 18:05:17 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 6 May 2025 18:05:17 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. LGTM! ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2819191469 From kvn at openjdk.org Tue May 6 18:10:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:10:20 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
> > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Why the attribute is not set for `zLoadP` on x64? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2819201282 From rcastanedalo at openjdk.org Tue May 6 19:00:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 19:00:18 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 18:07:17 GMT, Vladimir Kozlov wrote: > Why the attribute is not set for `zLoadP` on x64? `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2855603683 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. Thanks for reviewing, Thomas and Matias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25034#issuecomment-2855609574 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: Integrated: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. This pull request has now been integrated. 
Changeset: 4977588d Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/4977588d5e3424282f40209590737a487747095d Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Co-authored-by: David Leopoldseder Reviewed-by: tschatzl, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/25034 From aboldtch at openjdk.org Wed May 7 06:15:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 7 May 2025 06:15:14 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. 
To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding As I cannot test this on APX-enabled hardware, I will leave the testing and verifying that this approach works up to you. But the change looks good, and it maintains the original behaviour for non-APX-enabled hardware. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2820461864 From jbhateja at openjdk.org Wed May 7 06:19:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 06:19:17 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Hi @TobiHartmann , @eme64 , can you kindly run this version through your test infra. This is an APX-specific issue. I have verified its correctness using SDE, both following tests are now passing. 
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/c2/irTests/gc ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857197887 From thartmann at openjdk.org Wed May 7 07:48:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 07:48:16 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1gGtDEUALoWyrLQwwRD9bo2wb55O5Lh2DTnWTXQ8Oe8=.45ef5737-2ea6-4179-a998-79d8d51aca13@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Sure, I'll run it through testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857462391 From sjohanss at openjdk.org Wed May 7 10:41:57 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 7 May 2025 10:41:57 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v3] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden.
Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned was previously reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also ensures that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A...
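The capacity estimate and the sampling condition described above can be sketched roughly as follows. This is a hedged illustration, not the code from the pull request: the class name `TlabUsageHistory`, its methods, and the helper `should_update_allocation_history` are hypothetical, and the PR may average, snapshot, and store its samples differently. Only the window of 10 values and the `used > 0.5 * capacity` condition come from the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Keeps the last N samples of "bytes used for TLABs" and reports their average
// as the capacity estimate. Illustrative only (hypothetical names).
class TlabUsageHistory {
  static constexpr std::size_t N = 10;
  std::array<std::size_t, N> _samples{};
  std::size_t _count = 0; // number of valid samples, at most N
  std::size_t _next = 0;  // slot to overwrite next

public:
  void record(std::size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % N;
    if (_count < N) {
      _count++;
    }
  }

  // Average of the recorded samples; 0 if nothing has been recorded yet.
  std::size_t capacity_estimate() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// The sampling condition quoted in the description.
bool should_update_allocation_history(std::size_t used, std::size_t capacity) {
  return used > 0.5 * capacity;
}
```

With the whole heap reported as capacity, `used` rarely exceeds half of it, so the history is almost never updated; with a capacity close to what TLABs actually consume, the condition holds far more often.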
Stefan Johansson has updated the pull request incrementally with three additional commits since the last revision: - Problemlist heap sampling test - Keep all TLAB tracking in TLABUsage - Revert initial value for TLABUsage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/76c79f5c..f361fc5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01-02 Stats: 97 lines in 9 files changed: 33 ins; 37 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From coleenp at openjdk.org Wed May 7 20:33:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 May 2025 20:33:56 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops This is a cleaner way to do this. I believe it's what we discussed with Kim. He can confirm. Some questions and comments and a small nit. src/hotspot/share/compiler/compileBroker.cpp line 1697: > 1695: JavaThread* thread = JavaThread::current(); > 1696: > 1697: methodHandle method(thread, task->method()); I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > 33: #include "oops/weakHandle.inline.hpp" > 34: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { This should initialize method in the ctor initializer list. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > 49: // Method holder class cannot be unloaded. > 50: return nullptr; > 51: } This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. 
src/hotspot/share/runtime/vmStructs.cpp line 1266: > 1264: declare_toplevel_type(CDSFileMapRegion) \ > 1265: declare_toplevel_type(UpcallStub::FrameData) \ > 1266: declare_toplevel_type(UnloadableMethodHandle) \ So are these left for the async profiler? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2823027214 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078430169 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078443576 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078379288 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078446115 From aboldtch at openjdk.org Thu May 8 05:25:01 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 05:25:01 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Message-ID: [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
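One way such an O(1) leftmost and rightmost lookup could be layered on top of an ordered tree is to cache both ends in a wrapper and refresh them only when a cached end is removed. The sketch below is an assumption-laden illustration, not the code from the pull request: `MinMaxCachedTree` is a hypothetical name, and `std::set` merely stands in for the shared tree (its own `begin()` is already constant time, so the caching here is purely demonstrative).

```cpp
#include <cassert>
#include <set>

// Wraps an ordered tree and caches pointers to its smallest and largest keys,
// so both ends can be read without walking the tree. Hypothetical sketch.
class MinMaxCachedTree {
  std::set<int> _tree;
  const int* _min = nullptr;
  const int* _max = nullptr;

  // Recompute both cached ends after one of them was removed.
  void refresh() {
    _min = _tree.empty() ? nullptr : &*_tree.begin();
    _max = _tree.empty() ? nullptr : &*_tree.rbegin();
  }

public:
  void insert(int key) {
    // Elements of std::set have stable addresses until they are erased.
    const int* node = &*_tree.insert(key).first;
    if (_min == nullptr || *node < *_min) {
      _min = node;
    }
    if (_max == nullptr || *node > *_max) {
      _max = node;
    }
  }

  void remove(int key) {
    // Decide before erasing, since the cached pointers dangle afterwards.
    const bool was_end = (_min != nullptr && key == *_min) ||
                         (_max != nullptr && key == *_max);
    if (_tree.erase(key) == 0) {
      return; // key was not present
    }
    if (was_end) {
      refresh();
    }
  }

  // Both return nullptr when the tree is empty.
  const int* leftmost() const { return _min; }
  const int* rightmost() const { return _max; }
};
```

Only insertions of a new extreme key and removals of a cached end touch the cache; every other operation leaves it unchanged.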
------------- Commit messages: - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Changes: https://git.openjdk.org/jdk/pull/25112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356455 Stats: 2158 lines in 5 files changed: 97 ins; 2026 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From eosterlund at openjdk.org Thu May 8 07:55:50 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 8 May 2025 07:55:50 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824168024 From jsikstro at openjdk.org Thu May 8 09:14:52 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 8 May 2025 09:14:52 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824384128 From sjohanss at openjdk.org Thu May 8 10:06:41 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 8 May 2025 10:06:41 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
> > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned was previously reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead.
> > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Handle inc and dec in alloc/undo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/f361fc5d..ba7cb673 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02-03 Stats: 60 lines in 6 files changed: 37 ins; 14 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From epeter at openjdk.org Thu May 8 11:29:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). @roberto Thanks a lot for taking the time to explain how implicit null checks work, and giving me some background for the PR :) Below, I have mostly code style / naming suggestions, that you are welcome to use as inspiration. But you do not have to apply any of them, it is totally up to you :) I'm definitely not an expert here, but your approach seems reasonable to me. The opt-in annotation `ins_is_late_expanded_null_check_candidate` makes sure we only do the optimization when we are sure it is ok. 
It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. But I suppose that cannot happen, because the GC only moves the pointer, so if the old pointer was non-null, the new pointer must be non-null as well. Maybe that was so trivial that you did not even understand my question there ? But it could be helpful to write that down somewhere, just to make sure people are aware of this. I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130: > 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)), > 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); For context: 132 /* Sometimes we get misaligned loads and stores, usually from Unsafe 133 accesses, and these can exceed the offset range. */ 134 Address legitimize_address(const Address &a, int size, Register scratch) { 135 if (a.getMode() == Address::base_plus_offset) { 136 if (! 
Address::offset_ok_for_immed(a.offset(), exact_log2(size))) { 137 block_comment("legitimize_address {"); 138 lea(scratch, a); 139 block_comment("} legitimize_address"); 140 return Address(scratch); 141 } 142 } 143 return a; 144 } I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you. src/hotspot/share/opto/block.hpp line 468: > 466: > 467: // If necessary, hoist orphan node n into the end of block b. > 468: void maybe_hoist_into(Node* n, Block* b); Hmm. It is "if necessary" or "if possible"? I wonder if we could come up with a name that is a little longer and expresses this condition? src/hotspot/share/opto/lcm.cpp line 79: > 77: } > 78: > 79: void PhaseCFG::move_into(Node* n, Block* b) { Suggestion: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { src/hotspot/share/opto/lcm.cpp line 89: > 87: if (!out->is_MachProj()) { > 88: continue; > 89: } What about the `MachTemp`? Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name. src/hotspot/share/opto/lcm.cpp line 105: > 103: "need for recursive hoisting not expected"); > 104: move_into(n, b); > 105: } Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Suggestion to make it a bit more clear: Suggestion: // We want to ensure that n happens at b or before, i.e. at a block that dominates b. void PhaseCFG::ensure_node_is_at_block_or_before(Node* n, Block* b) { Block* current = get_block_for_node(n); if (current->dominates(b)) { return; // n already happens before b, do nothing. } // We only expect nodes without further inputs, like MachTemp or load Base. 
assert(n->req() == 0 || (n->req() == 1 && n->in(0) == (Node*)C->root()), "need for recursive hoisting not expected"); assert(b->dominates(current), "precondition: can only move n to b if b dominates n"); move_node_and_its_projections_to_block(n, b); } I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... Do you have a better idea? src/hotspot/share/opto/lcm.cpp line 356: > 354: if (mach->in(j)->is_MachTemp()) { > 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); > 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. Suggestion: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. // MachTemp have no inputs themselves and are only there to reserve a scratch // register for the GC barrier of the memory operation. That was what you told me in our offline meeting, I thought it was helpful context information. src/hotspot/share/opto/lcm.cpp line 428: > 426: maybe_hoist_into(val->in(i), block); > 427: } > 428: move_into(val, block); Suggestion: // Inputs of val may already be early enough, but if not move them together with val. ensure_node_is_at_block_or_before(val->in(i), block); } move_node_and_its_projections_to_block(val, block); src/hotspot/share/opto/lcm.cpp line 437: > 435: if (n == nullptr || !n->is_MachTemp()) { > 436: continue; > 437: } Do you want to check that all other nodes already dominate `block`? src/hotspot/share/opto/lcm.cpp line 439: > 437: } > 438: maybe_hoist_into(n, block); > 439: } It seems to me this is definitely new code, ensuring that we move the `MachTemp`. 
We did not do that before, at least not here. Correct? src/hotspot/share/opto/lcm.cpp line 441: > 439: map_node_to_block(n, block); > 440: } > 441: } This now happens in `move_into`, right? src/hotspot/share/opto/machnode.hpp line 391: > 389: > 390: // Whether this node is expanded during code emission into a sequence of > 391: // instructions and the first instruction can perform an implicit null check. You may want to put a warning / reasoning here, in case there are multiple loads. You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think? test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > 49: * @requires vm.gc.Z > 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z > 51: */ Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. Of course, you would have to probably add `applyIf` to the `@IR` rules. test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119: > 117: testLoad(o); > 118: } catch (NullPointerException e) { nullPointerException = true; } > 119: Asserts.assertTrue(nullPointerException); Suggestion: try { testLoad(o); throw new RuntimeException("Should have thrown NullPointerException"); } catch (NullPointerException e) { /* expected */} Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of? 
But totally optional, as your approach works anyway :) test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > 138: // G1 and ZGC stores cannot be currently used to implement implicit null > 139: // checks, because they expand into multiple memory access instructions that > 140: // are not necessarily located at the initial instruction start address. Very random idea, no idea if it is any good: Why not do the implicit null-check with a fake Load? No idea on the implications here. I suppose it would be extra code, but at least not branching code? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2824535603 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079357655 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079437197 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079476518 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079430920 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079473986 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079420601 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079480978 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079486097 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079509053 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079488019 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079491319 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079493683 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079500275 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079505342 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC 
reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:21:14 GMT, Emanuel Peter wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. 
>> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > src/hotspot/share/opto/block.hpp line 468: > >> 466: >> 467: // If necessary, hoist orphan node n into the end of block b. >> 468: void maybe_hoist_into(Node* n, Block* b); > > Hmm. It is "if necessary" or "if possible"? > I wonder if we could come up with a name that is a little longer and expresses this condition? Ah no, I'm starting to understand that it is rather a `if necessary`... > src/hotspot/share/opto/lcm.cpp line 428: > >> 426: maybe_hoist_into(val->in(i), block); >> 427: } >> 428: move_into(val, block); > > Suggestion: > > // Inputs of val may already be early enough, but if not move them together with val. > ensure_node_is_at_block_or_before(val->in(i), block); > } > move_node_and_its_projections_to_block(val, block); It's a little hard to see here: did you just refactor this code, or make any changes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079450181 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079507708 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:29:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/block.hpp line 468: >> >>> 466: >>> 467: // If necessary, hoist orphan node n into the end of block b. >>> 468: void maybe_hoist_into(Node* n, Block* b); >> >> Hmm. It is "if necessary" or "if possible"? >> I wonder if we could come up with a name that is a little longer and expresses this condition? > > Ah no, I'm starting to understand that it is rather a `if necessary`... 
See further comments at `maybe_hoist_into` and my suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079512983 From thartmann at openjdk.org Thu May 8 12:17:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 12:17:57 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <7XtX737NV9bjyQWKxZK0rjNzQ1ye2IpbsuWTtI8Rh1s=.7e6bb289-50a1-45e2-906a-44348848a281@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2862849381 From shade at openjdk.org Thu May 8 12:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:39:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:30:00 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/runtime/vmStructs.cpp line 1266: > >> 1264: declare_toplevel_type(CDSFileMapRegion) \ >> 1265: declare_toplevel_type(UpcallStub::FrameData) \ >> 1266: declare_toplevel_type(UnloadableMethodHandle) \ > > So are these left for the async profiler? Yes, see https://github.com/async-profiler/async-profiler/issues/1260 that is filed already. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079634276 From shade at openjdk.org Thu May 8 12:42:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:42:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:28:10 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > >> 33: #include "oops/weakHandle.inline.hpp" >> 34: >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > > This should initialize method in the ctor initializer list. Maybe, but the field is not `const`, so there seem to be no point? We also assign after assert checks `method` for us. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079637960 From shade at openjdk.org Thu May 8 12:50:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:50:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 19:54:04 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > >> 49: // Method holder class cannot be unloaded. >> 50: return nullptr; >> 51: } > > This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. Right? 
Wasteful to even go through all this dance for compiling JDK methods :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079651140 From aboldtch at openjdk.org Thu May 8 13:01:07 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:01:07 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References: Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks.
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Use private inheritance - Separate tree logic to own class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25112/files - new: https://git.openjdk.org/jdk/pull/25112/files/4bc5cf09..3c3e22bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00-01 Stats: 253 lines in 2 files changed: 122 ins; 93 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From aboldtch at openjdk.org Thu May 8 13:03:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:03:53 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straightforward, and the O(1) leftmost and rightmost node lookup, which ZIntrusiveRBTree implements but IntrusiveRBTree does not, is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. @stefank had some comments about having too much logic inlined. So I abstracted the extra tree logic into its own inner class. Currently re-running tests.
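The idea of layering constant-time leftmost and rightmost lookup on top of an ordered tree can be sketched as below. This is a hedged illustration only: `CachedEndsTree` is a hypothetical name, and `std::set` merely stands in for the shared red-black tree (it already offers constant-time `begin()`, so the wrapper is redundant here and serves only to show the bookkeeping). The actual class in the pull request may be structured quite differently.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical wrapper that caches the two extreme nodes of an ordered tree.
// std::set is only a stand-in for a tree without O(1) end lookup.
class CachedEndsTree {
  std::set<int> _tree;
  std::set<int>::const_iterator _leftmost;   // smallest key, or end() when empty
  std::set<int>::const_iterator _rightmost;  // largest key, or end() when empty

public:
  CachedEndsTree() : _leftmost(_tree.end()), _rightmost(_tree.end()) {}

  void insert(int key) {
    auto [it, inserted] = _tree.insert(key);
    if (!inserted) {
      return;  // duplicate key, cached ends are unchanged
    }
    // A new node can only become an end if it lies beyond the cached end.
    if (_leftmost == _tree.end() || key < *_leftmost) {
      _leftmost = it;
    }
    if (_rightmost == _tree.end() || key > *_rightmost) {
      _rightmost = it;
    }
  }

  void remove(int key) {
    auto it = _tree.find(key);
    if (it == _tree.end()) {
      return;
    }
    // When an end node goes away, its in-order neighbour takes over.
    if (it == _leftmost) {
      _leftmost = std::next(it);  // successor, or end() if this was the last node
    }
    if (it == _rightmost) {
      _rightmost = (it == _tree.begin()) ? _tree.end() : std::prev(it);
    }
    _tree.erase(it);
  }

  bool empty() const { return _tree.empty(); }

  // Both lookups are O(1); the caller must ensure the tree is not empty.
  int leftmost() const { return *_leftmost; }
  int rightmost() const { return *_rightmost; }
};
```

Keeping this bookkeeping in a separate class, rather than inlined at each call site, is in the spirit of the "Separate tree logic to own class" commit mentioned above.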
------------- PR Comment: https://git.openjdk.org/jdk/pull/25112#issuecomment-2862969347 From shade at openjdk.org Thu May 8 14:33:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 14:33:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> On Wed, 7 May 2025 20:18:29 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/compiler/compileBroker.cpp line 1697: > >> 1695: JavaThread* thread = JavaThread::current(); >> 1696: >> 1697: methodHandle method(thread, task->method()); > > I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. Ah, that reminds me, thanks. I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079838894 From kvn at openjdk.org Thu May 8 15:14:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:14:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <40ZOuLCtxa6ytFKxGHY5mHY_SI_e1AxrXSUrpmNB9Lk=.17f141ca-5b1e-4ead-8416-86f5b7382598@github.com> On Tue, 6 May 2025 18:57:14 GMT, Roberto Castañeda Lozano wrote: > > Why the attribute is not set for `zLoadP` on x64? > > `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question? Somehow I missed this change. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2863416833 From kvn at openjdk.org Thu May 8 15:24:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:24:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). src/hotspot/share/opto/lcm.cpp line 95: > 93: } > 94: > 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079942627 From stefank at openjdk.org Thu May 8 16:06:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 May 2025 16:06:04 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 10:06:41 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> ``` >> bool update_allocation_history = used > 0.5 * capacity; >> ``` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected.
>> >> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. >> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Handle inc and dec in alloc/undo I like this change. I've added a few comments below. src/hotspot/share/gc/z/zTLABUsage.cpp line 32: > 30: _used_history() {} > 31: > 32: Suggestion: src/hotspot/share/gc/z/zTLABUsage.cpp line 39: > 37: void ZTLABUsage::decrease_used(size_t size) { > 38: precond(size <= _used); > 39: Atomic::sub(&_used, size, memory_order_relaxed); Suggestion: precond(size <= _used); Atomic::sub(&_used, size, memory_order_relaxed); src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > 41: > 42: void ZTLABUsage::reset() { > 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); Does this work instead?
Suggestion: const size_t current_used = Atomic::xchg(&_used, 0u); src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > 49: > 50: // Save the old values for logging > 51: const size_t old_used = used(); It's not immediately obvious what `_used` is compared to `used()` Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2825630207 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009139 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009572 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080010741 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080017958 From sviswanathan at openjdk.org Thu May 8 22:20:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 8 May 2025 22:20:52 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. 
[2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2826479403 From Monica.Beckwith at microsoft.com Thu May 8 22:47:25 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Thu, 8 May 2025 22:47:25 +0000 Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype Message-ID: Hi all, Thanks to everyone for the ongoing AHS discussions across 8236073, 8238686/87, and umbrella JDK-8353716. 
From the Microsoft side, we have been reviewing logs from a range of prod-like use cases across the broader MSFT environment, including first-party Java services (both Azure-hosted and non-Azure), as well as OSS-based deployments (Cassandra, Kafka, etc). We've also been benchmarking with various combinations (ReservePercent, GCTimeRatio, periodic GC, etc) and exploring early models to help gauge expected shrink/grow behavior under service conditions. These observations have shaped our perspective and contributions to upstream design discussions. Here's where we currently stand:

------------------------------------------------------------------------
1. SoftMaxHeapSize semantics and placement
------------------------------------------------------------------------
We continue to support the current SoftMax proposal as a **soft upper bound** on heap usage, one that the GC controller respects, but may temporarily exceed if necessary. Our analysis of logs shows that an effective SoftMax, even when static, would help reduce RSS under light traffic without requiring aggressive full GCs. We also plan to evaluate the controller changes under PR #24211 once they're merged, and we'd like to keep the option of a `jcmd GC.set_soft_max` interface, consistent with ZGC and future container signals (e.g. memory.high).

------------------------------------------------------------------------
2. GCTimeRatio as a feedback driver
------------------------------------------------------------------------
We support the move to a higher default value for `GCTimeRatio` as it aligns well with throughput goals in our measured workloads, including SPECjbb2015, DBs, and Spring-based services. We plan to continue stepped testing across representative service patterns. We'd also support exposing an alias like `-XX:GCCPUPercent` to improve ergonomics for operators.

------------------------------------------------------------------------
3. Reserve floor and shrink control
------------------------------------------------------------------------
We strongly recommend retaining `G1ReservePercent` as a configurable minimum, particularly in low-latency scenarios or when allocation bursts are expected immediately after idle phases. We'd also be open to exploring future adaptive variants of the reserve floor as the AHS loop matures.

------------------------------------------------------------------------
4. Periodic GC fallback and field heuristics
------------------------------------------------------------------------
Until AHS-driven shrink behavior is well understood and widely adopted, we recommend retaining a periodic GC safety net, especially for services with extended idle phases. As AHS matures, we'll continue to evaluate whether this fallback remains necessary in production.

------------------------------------------------------------------------
5. Role of externally-supplied limits
------------------------------------------------------------------------
Internally, we've discussed how AHS should behave in managed container environments such as AKS. In most cases we expect the JVM to operate within cgroup-defined memory.max and possibly memory.high bounds. We don't currently envision supporting non-cgroup (custom/embedded) environments on day one. We also believe that memory.high or RSS-based constraints could eventually serve as complementary signals for guiding heap elasticity, especially for AKS customers. These use cases are still exploratory, but we hope they can be accommodated within the direction of AHS without adding undue complexity to the core loop.

------------------------------------------------------------------------
6. Design notes and alignment
------------------------------------------------------------------------
For reference, our current AHS evaluation and alignment write-up (including control flow diagrams and tuning strategy) is here:

    https://github.com/microsoft/openjdk-workstreams/tree/main/G1-AHS

We'll continue to update that as PRs land and more data becomes available. We welcome any feedback on the write-up or our alignment approach and would be happy to incorporate community input via PRs. We are also open to hosting the write-up within an OpenJDK project repo if that's deemed appropriate.

Thanks again to everyone driving this effort forward; happy to continue refining as the pieces come together.

Best regards,
Monica

From jbhateja at openjdk.org Fri May 9 05:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 05:31:57 GMT Subject: Integrated: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
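The offset problem described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `EncodedInsn`, `encode_shift`, and the patch helpers are hypothetical names, the byte values are placeholders, and none of this is HotSpot's relocation code. It models an instruction as a variable number of prefix bytes followed by opcode, ModRM, and a one-byte immediate that is patched later, and it does not try to reproduce which exact byte was clobbered in the real failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of an encoded "shift by imm8" instruction: prefix bytes, then an
// opcode byte, a ModRM byte, and a trailing one-byte immediate to be patched.
struct EncodedInsn {
  std::vector<std::uint8_t> bytes;
};

// Build the toy encoding with `prefix_len` prefix bytes (e.g. 1 for a
// REX-style prefix, 2 for a REX2-style prefix). Byte values are illustrative.
inline EncodedInsn encode_shift(std::size_t prefix_len) {
  EncodedInsn insn;
  insn.bytes.assign(prefix_len, 0x00);  // placeholder prefix bytes
  insn.bytes.push_back(0xC1);           // opcode (illustrative)
  insn.bytes.push_back(0xE0);           // ModRM (illustrative)
  insn.bytes.push_back(0x00);           // imm8 placeholder, patched later
  return insn;
}

// Offset of the immediate measured from the start: depends on prefix length.
inline std::size_t imm_offset_from_start(std::size_t prefix_len) {
  return prefix_len + 2;
}

// Offset of the immediate measured back from the end: independent of prefix.
inline std::size_t imm_offset_from_end() {
  return 1;
}

inline void patch_from_start(EncodedInsn& insn, std::size_t offset,
                             std::uint8_t value) {
  assert(offset < insn.bytes.size());
  insn.bytes[offset] = value;
}

inline void patch_from_end(EncodedInsn& insn, std::size_t offset_from_end,
                           std::uint8_t value) {
  assert(offset_from_end >= 1 && offset_from_end <= insn.bytes.size());
  insn.bytes[insn.bytes.size() - offset_from_end] = value;
}
```

In this model, a start-relative offset computed for a one-byte prefix lands on the wrong byte once the prefix grows to two bytes, while an end-relative offset keeps pointing at the immediate regardless of prefix length.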
> > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. This pull request has now been integrated. Changeset: 53ad4b2a Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/53ad4b2ad2664e5056c113543dfaa26647d6ce26 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Co-authored-by: Axel Boldt-Christmas Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24919 From sjohanss at openjdk.org Fri May 9 06:07:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 06:07:53 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. 
Right now, the default deployment in containers with resource limits in place (common for Kubernetes deployments), where a single process runs in the container, isn't well catered for when an application just uses the default configuration. Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've filed a CSR for this as well, for which I'm looking for reviewers too, and I intend to write a release note about this change as it has some risk associated with it, although the escape hatch is pretty simple: set `-XX:MaxRAMPercentage=25.0` to go back to the old behavior. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking into this, Severin. Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. Have you done any testing with containerized apps?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865246427 From sjohanss at openjdk.org Fri May 9 07:52:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 07:52:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: <2uwu7EoW1H6F6v0FlZsop7jiQhePYWnXNzePf_4pQBc=.52f2dde4-dadc-4b07-af0b-8fd52f0765f0@github.com> On Thu, 8 May 2025 15:57:19 GMT, Stefan Karlsson wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Handle inc and dec in alloc/undo > > src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > >> 41: >> 42: void ZTLABUsage::reset() { >> 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); > > Does this work instead? > Suggestion: > > const size_t current_used = Atomic::xchg(&_used, 0u); No, `0ul` works on Linux, but Windows fails with that. > src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > >> 49: >> 50: // Save the old values for logging >> 51: const size_t old_used = used(); > > It's not immediately obvious what `_used` is compared to `used()` Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`. Talked a bit about this offline, will add some comments and rename `used()` and `capacity()` to `tlab_used()` and `tlab_capacity()` to make it a bit more clear that they are not directly connected and also better match the `ZHeap` interface. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081127733 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081130690 From sjohanss at openjdk.org Fri May 9 08:17:13 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 08:17:13 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. 
> > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > ``` > bool update_allocation_history = used > 0.5 * capacity; > ``` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead.
We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: StefanK review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/ba7cb673..2f5742fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03-04 Stats: 22 lines in 3 files changed: 4 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From stefank at openjdk.org Fri May 9 08:29:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 May 2025 08:29:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 08:17:13 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
>> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> ``` >> bool update_allocation_history = used > 0.5 * capacity; >> ``` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead.
>> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > StefanK review Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2827450137 From aboldtch at openjdk.org Fri May 9 09:31:59 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 9 May 2025 09:31:59 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 08:17:13 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs.
Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> ``` >> bool update_allocation_history = used > 0.5 * capacity; >> ``` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. >> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > StefanK review lgtm. ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2827676721 From sgehwolf at openjdk.org Fri May 9 10:06:50 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 9 May 2025 10:06:50 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com> On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. Right now, the default deployment in containers with resource limits in place (common for Kubernetes deployments), where a single process runs in the container, isn't well catered for when an application just uses the default configuration. Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've filed a CSR for this as well, for which I'm looking for reviewers too, and I intend to write a release note about this change as it has some risk associated with it, although the escape hatch is pretty simple: set `-XX:MaxRAMPercentage=25.0` to go back to the old behavior. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking at this, Stefan. > Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments.
I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. Currently there is only the generic `is_containerized()` API which has been documented in the bug that fixed that: [JDK-8261242](https://bugs.openjdk.org/browse/JDK-8261242?focusedId=14685743&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14685743) So yes, this would update the RAM percentage for a) an unprivileged container (no limits), b) some other container tech which sets the cgroup CPU limit, for example. The JVM currently only looks at memory/cpu limits for privileged containers and takes that into consideration for `is_containerized()`. If there is consensus, we could add an API that returns true if only a memory limit is present. That doesn't exist yet, though. Happy to propose something going in that direction. The infra is already there. > I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. It would return true for any non-privileged container. I can see that this might be a concern. > Have you done any testing with containerized apps? I have done some basic testing so far, but would be happy to do more. What specific testing would you be interested in?
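For a sense of scale, the percentages discussed in this thread can be turned into heap sizes with a small calculation. This is only a sketch: `max_heap_bytes` is a hypothetical helper, and it assumes the default maximum heap is simply the detected memory limit multiplied by `MaxRAMPercentage`, ignoring alignment and any other ergonomics the JVM applies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the maximum heap size implied by a memory limit and a
// MaxRAMPercentage-style setting. Assumption: heap = limit * percentage / 100,
// with no alignment or other ergonomic adjustments.
inline std::uint64_t max_heap_bytes(std::uint64_t memory_limit_bytes,
                                    double max_ram_percentage) {
  assert(max_ram_percentage > 0.0 && max_ram_percentage <= 100.0);
  return static_cast<std::uint64_t>(
      static_cast<double>(memory_limit_bytes) * (max_ram_percentage / 100.0));
}
```

Under that assumption, a container limited to 2 GiB gets a 512 MiB heap at 25%, 1 GiB at 50%, and 1.5 GiB at 75%.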
------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865954385 From shade at openjdk.org Fri May 9 11:23:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 11:23:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> References: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> Message-ID: <_8y_DYl9Q4P1scTtA_J8ilWw_GP0kdSL37bAmYb4dEM=.ea34a76f-0236-459f-b99c-a8d6129c3a67@github.com> On Thu, 8 May 2025 14:29:56 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1697: >> >>> 1695: JavaThread* thread = JavaThread::current(); >>> 1696: >>> 1697: methodHandle method(thread, task->method()); >> >> I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. > > Ah, that reminds me, thanks. > > I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! > > Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: > > > methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); > > > ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. I understand now. There are TOCTOU-s under concurrent `block_unloading`. 
The most egregious one is here: `is_unloaded` checks in two steps: `!_weak_handle.is_empty() && _weak_handle.peek() == nullptr;`. So when `block_unloading` comes in concurrently and resets the weak handle to empty (since we have a strong handle now), it is possible that the first predicate has already evaluated to `true`, but evaluating the second predicate calls `peek` on an empty `_weak_handle`, oops. We could technically claim that `UnloadableMethodHandle` is not thread-safe, but that does not solve current compiler uses, and it is very unsatisfactory for a utility class. I'll look into ways to make it resilient under concurrent updates. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2081467353 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v12] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - Move to oops - Improve get_method_blocker - Simplify a bit - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 12 more: https://git.openjdk.org/jdk/compare/ad07426f...1cdbed2b ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11 Stats: 393 lines in 11 files changed: 331 ins; 25 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones.
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops So... Following up on one forgotten `methodHandle` removal (https://github.com/openjdk/jdk/pull/24018#discussion_r2081467353) got me into a rabbit hole of making the new utility class thread-safe. Otherwise, there are TOCTOU issues when checking the `(Weak)Handle` status, which gets us in trouble real quick. This normally happens in current tests when an external thread goes into `CompileBroker::wait_for_compilation()` and a compiler thread starts moving the `UMH` state for compilation. Relying on un-synchronized `(Weak)Handle` state is not nice either. The answer to all these problems is to track the `UMH` state more accurately, and thus trust `WeakHandle` only sporadically. This is now done in a new commit. This also allows for more explicit state checks. And, this allows clearly catching when we try to access `method()` after `release()` -- that, surprisingly, happens for `hot_method()`, which is not always re-initialized. Chasing this bug also made my head hurt a bit about double-negating `!is_unloaded` checks. It is technically a safety check, so I renamed methods to reflect that: `is_safe`, `make_always_safe`. I will schedule weekend tests for this PR on various machines to see if more bugs fall out once I shake that particular tree.
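For readers following along, the idea of tracking the state explicitly instead of querying the weak handle in two dependent steps can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: `TrackedHandle`, `State`, `payload()` and `simulate_unload()` are hypothetical names, a plain pointer stands in for `Method*`, and an atomic flag stands in for whatever a real `WeakHandle` would report. Only the names `is_safe`, `make_always_safe` and `release` are borrowed from the message above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch; NOT the HotSpot UnloadableMethodHandle implementation.
// The state lives in one atomic word, so each query reads it exactly once
// instead of evaluating two predicates that can be invalidated in between.
enum class State : int { Empty, Weak, Strong, Released };

class TrackedHandle {
  std::atomic<State> _state;
  const void* _payload;               // stands in for Method*
  std::atomic<bool> _referent_alive;  // stands in for the weak referent's liveness

public:
  explicit TrackedHandle(const void* payload)
    : _state(payload != nullptr ? State::Weak : State::Empty),
      _payload(payload),
      _referent_alive(payload != nullptr) {}

  // Simulation helper (assumption): pretends the GC cleared the weak referent.
  // A strongly held referent cannot be unloaded, so this is a no-op then.
  void simulate_unload() {
    if (_state.load(std::memory_order_acquire) == State::Weak) {
      _referent_alive.store(false, std::memory_order_release);
    }
  }

  // Single read of the state word decides which check applies.
  bool is_safe() const {
    switch (_state.load(std::memory_order_acquire)) {
      case State::Strong: return true;
      case State::Weak:   return _referent_alive.load(std::memory_order_acquire);
      default:            return false;  // Empty or Released
    }
  }

  // Try to move Weak -> Strong with a CAS; concurrent callers agree on the
  // outcome because only one transition out of Weak can win.
  bool make_always_safe() {
    State expected = State::Weak;
    if (_referent_alive.load(std::memory_order_acquire)) {
      _state.compare_exchange_strong(expected, State::Strong);
    }
    return _state.load(std::memory_order_acquire) == State::Strong;
  }

  // After release, every query reports "not safe" rather than touching
  // a handle that no longer exists.
  void release() { _state.store(State::Released, std::memory_order_release); }

  // A real implementation might assert here; the sketch returns nullptr.
  const void* payload() const { return is_safe() ? _payload : nullptr; }
};
```

In this simplified model the liveness flag and the state word are still two separate atomics, so it does not reproduce how a real VM resolves a weak reference to a strong one; it only shows how an explicit `Released` state makes use-after-release detectable.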
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2867309949 From ayang at openjdk.org Sun May 11 16:42:50 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 11 May 2025 16:42:50 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). > > Testing: gha, manual testing as below: > > Mainline: > > > [3.740s][info ][gc,init ] Heap Min Capacity: 150G > [3.740s][info ][gc,init ] Heap Initial Capacity: 150G > [3.740s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B > . > . > [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms > > With patch (No shrinking when -Xms == -Xmx): > > > [3.753s][info ][gc,init ] Heap Min Capacity: 150G > [3.753s][info ][gc,init ] Heap Initial Capacity: 150G > [3.753s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms > > With patch (Shrinking when -Xms != -Xmx): > > > [3.755s][info ][gc,init ] Heap Min Capacity: 153568M > [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M > [3.755s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions).
Total size: 33554432B (1 Regions) > . > . > [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25036#pullrequestreview-2831444743 From gli at openjdk.org Sun May 11 19:33:58 2025 From: gli at openjdk.org (Guoxiong Li) Date: Sun, 11 May 2025 19:33:58 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Fri, 2 May 2025 10:23:25 GMT, Albert Mingkun Yang wrote: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. 
> > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 I have reviewed about 1/3 of the code so far, but I want to save my thoughts, so I am submitting them now. Sorry for the noise if it bothers you. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 343: > 341: if (is_gc_overhead_limit_reached()) { > 342: return nullptr; > 343: } It seems the parameter `gc_overhead_limit_was_exceeded` and the field `MemAllocator::Allocation::_overhead_limit_exceeded` are not used by all GCs now. Should we keep the parameter and set it to `true` under the condition `is_gc_overhead_limit_reached()`? For example: if (op.prologue_succeeded()) { assert(is_in_or_null(op.result()), "result not in heap"); if (is_gc_overhead_limit_reached()) { *gc_overhead_limit_was_exceeded = true; return nullptr; } return op.result(); } Otherwise, we should remove the parameter and the field in another PR. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 825: > 823: // If MinHeapFreeRatio is at its default value; shrink cautiously. Otherwise, users expect prompt shrinking.
> 824: if (FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0) { > 825: if (desired_capacity < current_capacity) { I was quite curious about the condition `MinHeapFreeRatio == 0`, and then I found the following code in `parallelArguments.cpp`. Might it be better to use `UseAdaptiveSizePolicy && FLAG_IS_DEFAULT(MinHeapFreeRatio)` here instead of `FLAG_IS_DEFAULT(MinHeapFreeRatio) && MinHeapFreeRatio == 0`? if (UseAdaptiveSizePolicy) { // We don't want to limit adaptive heap sizing's freedom to adjust the heap // unless the user actually sets these flags. if (FLAG_IS_DEFAULT(MinHeapFreeRatio)) { FLAG_SET_DEFAULT(MinHeapFreeRatio, 0); } if (FLAG_IS_DEFAULT(MaxHeapFreeRatio)) { FLAG_SET_DEFAULT(MaxHeapFreeRatio, 100); } } src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 862: > 860: resize_old_gen_after_full_gc(); > 861: young_gen()->resize_after_full_gc(); > 862: } The `PSYoungGen` has its methods `resize_after_full_gc` and `resize_after_young_gc`. I think this design is good. What about moving the method `resize_old_gen_after_full_gc` (and the related method `calculate_desired_old_gen_capacity`) to `PSOldGen` and renaming it to `resize_after_full_gc`? src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 141: > 139: // Invoked at gc-pause-end > 140: void gc_epilogue(bool full); > 141: It is strange that Parallel GC didn't have its prologue and epilogue before. And currently, the concrete work categories (such as increasing the GC count) of the prologue and epilogue are not unified across the GCs. It seems to be an issue left over by history, so it needs more investigation in the future.
src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.cpp line 45: > 43: _avg_promoted(new AdaptivePaddedNoZeroDevAverage(AdaptiveSizePolicyWeight, PromotedPadding)), > 44: _space_alignment(space_alignment), > 45: _young_gen_size_increment_supplement(YoungGenerationSizeSupplement) {} There are typos in `gc_globals.hpp` (shown below): `YoungedGenerationSizeIncrement` and `YoungedGenerationSizeSupplement`. They should be fixed in another PR. product(uint, YoungGenerationSizeIncrement, 20, \ "Adaptive size percentage change in young generation") \ range(0, 100) \ \ product(uint, YoungGenerationSizeSupplement, 80, \ "Supplement to YoungedGenerationSizeIncrement used at startup") \ // <--- here range(0, 100) \ \ product(uintx, YoungGenerationSizeSupplementDecay, 8, \ "Decay factor to YoungedGenerationSizeSupplement") \ // <--- here range(1, max_uintx) \ src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1104: > 1102: heap->post_full_gc_dump(&_gc_timer); > 1103: > 1104: size_policy->record_gc_pause_end_instant(); What about moving this invocation into `major_collection_end`? Just like the `record_gc_pause_start_instant` and `major_collection_begin`. src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 179: > 177: _gc_distance_timer.reset(); > 178: _gc_distance_timer.start(); > 179: } The method name `record_gc_pause_end_instant` is about `gc pause`, but the code is about `gc distance`. Might we need a clearer name? src/hotspot/share/gc/shared/adaptiveSizePolicy.hpp line 184: > 182: _gc_distance_timer.stop(); > 183: _gc_distance_seconds_seq.add(_gc_distance_timer.seconds()); > 184: } The method name `record_gc_pause_start_instant` is about `gc pause`, but the code is about `gc distance`. Might we need a clearer name? ------------- Changes requested by gli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25000#pullrequestreview-2831414868 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083540517 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083573645 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083574866 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083578247 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083595212 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083596481 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083582694 PR Review Comment: https://git.openjdk.org/jdk/pull/25000#discussion_r2083581870