From duke at openjdk.org Thu May 1 04:01:50 2025 From: duke at openjdk.org (duke) Date: Thu, 1 May 2025 04:01:50 GMT Subject: Withdrawn: 8351137: ZGC: Improve ZValueStorage alignment support In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:36 GMT, Axel Boldt-Christmas wrote: > ZValueStorage only align the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages all have types which have an alignment less than or equal to the alignment set by its storage. > > I wish to improve this so that types with greater alignment than the storage alignment can be used. > > The UB caused by using a type larger than the storage alignment is something I have seen materialise as returning bad address (and crashing) on Windows. > > As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment, I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two. > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23887 From duke at openjdk.org Thu May 1 05:41:39 2025 From: duke at openjdk.org (Rui Li) Date: Thu, 1 May 2025 05:41:39 GMT Subject: RFR: 8350860: Max GC memory overhead tests Message-ID: The G1 GC metadata has increased from JDK8 to the current tip. When upgrading JDK for an application from JDK8, applications might observe native memory increases. GC is one of the top contributors. Small applications tend to get impacted more significantly. See sample test in description in https://bugs.openjdk.org/browse/JDK-8350860, when heap is 128m, the native memory used by gc can be over 80m. In order to make sure we don't bring dramatic native memory increase while developing G1, adding this metadata guardrail test. 
The test calculates the expected native memory from existing GC usage and adds some headroom. When there is a significant increase, the test fails and we should check whether the added native memory makes sense.

-------------

Commit messages:
 - Remove trailing whitespaces
 - 8350860: Max GC memory overhead tests

Changes: https://git.openjdk.org/jdk/pull/24981/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24981&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350860
  Stats: 174 lines in 1 file changed: 174 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24981.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24981/head:pull/24981

PR: https://git.openjdk.org/jdk/pull/24981

From manc at google.com Thu May 1 07:07:03 2025
From: manc at google.com (Man Cao)
Date: Thu, 1 May 2025 00:07:03 -0700
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To: <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Great progress! Thank you, Ivan. Optimistically, many of these changes could make it to JDK 25.

I'm happy to do some experiments and provide feedback. Does [1] contain all necessary changes? Will that branch be updated as parts of it merge into master?

Some early questions/feedback below.

> We also increase the default GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to suggestions). The existing default causes the heap to shrink too aggressively under the new policy in order to maintain the target GCTimeRatio. A higher default provides a better balance and avoids shrinking heap.

This changes the pause overhead target from ~8% (1/13) to 4% (1/25). Would it make G1 expand the heap more aggressively after incremental collections compared to existing behavior?
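For reference, the 1/13 and 1/25 figures above follow from reading GCTimeRatio=N as a target GC share of 1/(1+N) of total time. A minimal sketch of that arithmetic (the class and method names are made up for illustration and are not HotSpot code):

```java
public class GcTimeRatioSketch {
    // Target fraction of total time spent in GC for a given GCTimeRatio,
    // using the 1 / (1 + ratio) reading implied by the 1/13 and 1/25 figures.
    static double gcOverheadFraction(double gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // Current default (12) versus the default proposed by the prototype (24).
        for (double ratio : new double[] {12, 24}) {
            System.out.printf("GCTimeRatio=%.0f -> target GC overhead %.1f%%%n",
                    ratio, 100.0 * gcOverheadFraction(ratio));
        }
    }
}
```

This prints roughly 7.7% for 12 and 4.0% for 24. It also shows why integer values are coarse at the low end: moving from 1 to 2 changes the target from 50% to about 33%.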
Could you share some early/rough performance numbers about 12 vs 24 with the prototype, such as actual heap sizes, throughput differences?

> Additionally, we are removing the heap resizing at the end of the Remark pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio. This resizing of the heap ignores current application behaviour and may lead to pathological cases of repeated concurrent mark cycles:

In the new prototype, does the pathological case happen with the default MinHeapFreeRatio=40 MaxHeapFreeRatio=70 values? Or mainly with smaller user-defined values for MinHeapFreeRatio/MaxHeapFreeRatio?

Re Thomas's comments:

> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.

Making GCTimeRatio manageable sounds like a good idea. Do we plan to do this eventually (why "just for testing purposes")?

> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.

If G1 strives to respect GCTimeRatio as the prototype proposes, our existing use cases probably no longer need to set SoftMaxHeapSize (and maintain a separate algorithm to calculate values for SoftMaxHeapSize).
Our use case still needs CurrentMaxHeapSize, but it could be followed up in https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-April/051996.html.

[1] https://github.com/openjdk/jdk/compare/master...walulyai:jdk:G1HeapResizePolicy

-Man

On Tue, Apr 29, 2025 at 4:34 AM Thomas Schatzl wrote:

> Hi Ivan,
>
> thanks for working on this!
>
> Some comments for people (Man, Monica, Kirk) potentially taking this for
> a spin:
>
> On 29.04.25 12:46, Ivan Walulya wrote:
> > As part of our preparations for AHS, we are prototyping changes to the
> > G1 heap resizing policy to improve the effectiveness of the GCTimeRatio
> > [1]. The GCTimeRatio is set to manage the balance between GC time and
> > Application execution time. G1's current implementation of GCTimeRatio
> > appears to have drifted from its intended purpose over time. It may no
> > longer accurately guide heap sizing in response to GC overhead.
> > Therefore, we need to change this mechanism with the goal that G1 better
> > manages heap sizes without the need for additional tuning knobs.
> >
> > The prototype allows both expansion and shrinking of the heap at the end
> > of any GC, as opposed to the current behavior where shrinking is only
> > allowed at Remark or Full GC pauses [2]. We also increase the default
> > GCTimeRatio from 12 to 24 [3] (we are choosing 24 but open to
> > suggestions). The existing default causes the heap to shrink too
> > aggressively under the new policy in order to maintain the target
> > GCTimeRatio. A higher default provides a better balance and avoids
> > shrinking heap.
>
> So if one were to make GCTimeRatio manageable (just for testing
> purposes), and made it a float (for better control), changes to it
> should reflect on the used heap size in the next few GCs automatically.
>
> A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> that only guides IHOP with changes in
> G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> now, but there may be issues with this GCTimeRatio based heap sizing
> that would be interesting to explore.
>
> > Additionally, we are removing the heap resizing at the end of the Remark
> > pause which was based on MinHeapFreeRatio and MaxHeapFreeRatio.
This > > resizing of the heap ignores current application behaviour and may lead > > to pathological cases of repeated concurrent mark cycles: > > > > * we shrink the heap at remark, > > * a smaller heap triggers a concurrent marking in the subsequent > > GCs as well as expanding the heap > > * the concurrent cycle ends in another remark pause where the > > cycle restarts. > > > > > > We keep this MinHeapFreeRatio-MaxHeapFreeRatio based resizing logic at > > the end of Full GC. > > The use case for this might be ones similar to CraC to temporarily > compact the heap as much as possible; however it might be better to have > explicit control for that (e.g. a jcmd). > > Ultimately there may be need to remove it as well for full gcs, > replacing it with something else. > > As a result of these changes, applications may settle at more > > appropriate and in some cases smaller heap sizes for a given > > GCTimeRatio. While this may show as regression in some benchmarks that > > are sensitive to heap size, it is still improved control over GC > behaviour. > > > > We are requesting for feedback or testing of these changes before > > propose to merge them with mainline. > > > > Some of the changes that are independent of the GCTimeRatio are already > > out for review [4, 5], other minor fixes will be split out and pushed > > independently. > > > [0] https://github.com/openjdk/jdk/pull/24211 > > Hth, > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iwalulya at openjdk.org Thu May 1 08:27:32 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 1 May 2025 08:27:32 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. 
Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. > > Testing: Tier 1-3 Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Albert review - Merge branch 'master' into 8355681-find_contiguous_allow_expand - init ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24915/files - new: https://git.openjdk.org/jdk/pull/24915/files/5e8e4a73..5085e54b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24915&range=00-01 Stats: 13278 lines in 385 files changed: 9400 ins; 1696 del; 2182 mod Patch: https://git.openjdk.org/jdk/pull/24915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24915/head:pull/24915 PR: https://git.openjdk.org/jdk/pull/24915 From iwalulya at openjdk.org Thu May 1 08:27:33 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 1 May 2025 08:27:33 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2] In-Reply-To: References: <8OFJ2lP5ECUqK6bh56ThD1jUJfXGb6UHXh0rrD6XptU=.4ad9e344-dffe-4ed8-8188-ea470fb4cb4a@github.com>

Message-ID:

On Wed, 30 Apr 2025 13:57:45 GMT, Albert Mingkun Yang wrote:

>> I added this instead of an assert on failing `expand_and_allocate` for humongous objects, but then figured we could just skip the `expand_and_allocate` attempt which is guaranteed to fail.
>
> Not sure what to write in a ticket. Those are just some questions I had while reading the code. Anyway, if this part is not super related to the actual functional change, can it be dealt with in its own PR?

I have removed that check. Will be done with the clean up of `attempt_allocation_at_safepoint`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24915#discussion_r2069982724

From thomas.schatzl at oracle.com Thu May 1 09:55:00 2025
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 1 May 2025 11:55:00 +0200
Subject: Request for Feedback and Testing on G1 Heap Resizing Prototype
In-Reply-To:
References: <6B0649C0-8188-47AB-8EA1-B4A48172898C@oracle.com> <91b4d64f-261c-4355-b6d3-279af4583b1a@oracle.com>
Message-ID:

Hi Man,

On 01.05.25 09:07, Man Cao wrote:
> Great progress! Thank you, Ivan. Optimistically, many of these changes
> could make it to JDK 25.
>
> I'm happy to do some experiments and provide feedback. Does [1] contain
> all necessary changes?

Afaik yes.

> Will that branch be updated as parts of it merge
> into master?

Ivan can answer that.

>
> Some early questions/feedback below.
>
> > We also increase the default GCTimeRatio from 12 to 24 [3] (we are
> > choosing 24 but open to suggestions). The existing default causes the
> > heap to shrink too aggressively under the new policy in order to
> > maintain the target GCTimeRatio. A higher default provides a better
> > balance and avoids shrinking heap.
>
> This changes the pause overhead target from ~8% (1/13) to 4% (1/25).
> Would it make G1 expand the heap more aggressively after incremental
> collections compared to existing behavior?
> Could you share some early/rough performance numbers about 12 vs 24
> with the prototype, such as actual heap sizes, throughput differences?

That value has been found to create roughly the same heap sizes at around the same performance +/- 1-2% throughput across our set of benchmarks that run out-of-box (iirc). Again, Ivan may chime in here.

Part of this request for feedback is about getting a larger coverage on this aspect.

(The increase in GCTimeRatio has actually been something that is long overdue regardless of this change, since G1's overhead decreased a lot in recent years.)

[...]

> Re Thomas's comments:
>
> > So if one were to make GCTimeRatio manageable (just for testing
> > purposes), and made it a float (for better control), changes to it
> > should reflect on the used heap size in the next few GCs automatically.
>
> Making GCTimeRatio manageable sounds like a good idea. Do we plan to do
> this eventually (why "just for testing purposes")?
>

It's just not implemented in that branch :)

Currently we think that GCTimeRatio will eventually become manageable, likely using floats, as integers are too coarse as divisors. Probably as a follow-up.

There is still the question whether to deprecate it in favor of some GCCPUUsagePercent or whatever it is going to be called to have more direct control.

> > A SoftMaxHeapSize implementation based on the discussion in the PR [0]
> > that only guides IHOP with changes in
> > G1AdaptiveIHOPControl::actual_target_threshold() should be effective
> > now, but there may be issues with this GCTimeRatio based heap sizing
> > that would be interesting to explore.
>
> If G1 strives to respect GCTimeRatio as the prototype proposes, our
> existing use cases probably no longer need to set SoftMaxHeapSize (and
> maintain a separate algorithm to calculate values for SoftMaxHeapSize).
The purpose of this request is also to understand whether SoftMaxHeapSize is still necessary :)

Sizing based on CPU usage may be more inconvenient and less exact at reducing to a particular target heap size (without OOME'ing) than a direct target heap size. (I.e. I can imagine the case where, while the threshold is kind of a continuum, for a collector small changes to heap sizes can lead to radical changes in CPU usage, so G1 might flap back and forth all the time.)

We also do not have real use cases with real applications where we would temporarily want to keep the heap below a certain value like we think you suggested.

Ignoring CraC-like use cases where specific functionality would serve that use case better, one current use of SoftMaxHeapSize here is for tuning ZGC performance (since SoftMaxHeapSize is only implemented there), but we have not seen uses for G1 (obviously as it's not implemented there, and one can use G1ReservePercent to some degree).

Note that G1 already has this G1ReservePercent that somewhat already acts like that, so there is a certain overlap that might need some resolving (and its default is too high for large heaps anyway). Or it could just be changed to be adaptive.

SoftMaxHeapSize may also be necessary in some cases where there is more information available from the outside than AHS can ever know.

So it can be worthwhile experimenting with SMHS anyway. I will update that umbrella CR with new thoughts in the next few days.

Thanks,
  Thomas

From ayang at openjdk.org Thu May 1 09:48:47 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 1 May 2025 09:48:47 GMT
Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2]
In-Reply-To:
References:

Message-ID:

On Thu, 1 May 2025 08:27:32 GMT, Ivan Walulya wrote:

>> Hi,
>>
>> Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request.
>>
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Albert review
>  - Merge branch 'master' into 8355681-find_contiguous_allow_expand
>  - init

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24915#pullrequestreview-2809548851

From wkemper at openjdk.org Thu May 1 17:43:59 2025
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 1 May 2025 17:43:59 GMT
Subject: Integrated: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled
In-Reply-To:
References:
Message-ID: <3bV-rGkGRHjkUNAEElE0_aSdO8t81oMrd88bjWmZY6Y=.0df6a3b1-29db-4592-aa42-7d3d15684455@github.com>

On Fri, 25 Apr 2025 20:40:09 GMT, William Kemper wrote:

> Add a test case for `-XX:+UseCompactObjectHeaders`, increase pressure on old generation. I ran the test (which includes a compact object headers case now) fifty times without failure.

This pull request has now been integrated.
Changeset: 9e26b9fa Author: William Kemper URL: https://git.openjdk.org/jdk/commit/9e26b9facba09c4d6f516e8032b876c6d9e95e9e Stats: 24 lines in 1 file changed: 15 ins; 8 del; 1 mod 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/24888 From mbaesken at openjdk.org Fri May 2 06:39:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 2 May 2025 06:39:52 GMT Subject: RFR: 8355372: GenShen: Test gc/shenandoah/generational/TestOldGrowthTriggers.java fails with UseCompactObjectHeaders enabled In-Reply-To: References:

<_6MD1OrkbiBPcjVkKGXvlH4xGplX11i7L_FAYKXZls8=.1a8d7276-7eac-443c-aa74-a45a3ef65e17@github.com> Message-ID: On Wed, 30 Apr 2025 15:36:32 GMT, William Kemper wrote: > have you had a chance to retest after PR#24940 was integrated? Did not see the issue again after this (of course this is no 'proof' that they will never come back), so I would say looks good ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24888#issuecomment-2846479688 From iwalulya at openjdk.org Fri May 2 12:56:51 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 2 May 2025 12:56:51 GMT Subject: RFR: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation [v2] In-Reply-To: References:

Message-ID: On Thu, 1 May 2025 09:46:17 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Albert review >> - Merge branch 'master' into 8355681-find_contiguous_allow_expand >> - init > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24915#issuecomment-2847135344 From iwalulya at openjdk.org Fri May 2 12:56:52 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 2 May 2025 12:56:52 GMT Subject: Integrated: 8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 10:57:48 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to account for free regions when checking if we have enough regions to satisfy an allocation request. Currently, we have that a `_hrm.expand_and_allocate_humongous` call fails where an `expand_and_allocate` call succeeds for the same allocation request. > > Testing: Tier 1-3 This pull request has now been integrated. 
Changeset: 995d5416
Author: Ivan Walulya
URL: https://git.openjdk.org/jdk/commit/995d54161fed657f38753813f55d0591e77a42e3
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8355681: G1HeapRegionManager::find_contiguous_allow_expand ignores free regions when checking regions available for allocation

Reviewed-by: tschatzl, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/24915

From ayang at openjdk.org Fri May 2 18:41:53 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 2 May 2025 18:41:53 GMT
Subject: RFR: 8350621: Code cache stops scheduling GC
In-Reply-To:
References:
Message-ID:

On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote:

> The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`.
>
> This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling.
>
> Unfortunately this can't work properly under certain circumstances.
> For example, if using G1GC, calling `G1CollectedHeap::collect` does not give the guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)).
>
> I have observed this behavior on JVMs in version 21 that were migrated recently from Java 17.
> Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GCs take a long time to happen.
>
> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
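The failure mode described in the quoted report can be modeled outside the JVM. The sketch below is a deliberately simplified illustration, not HotSpot code: `gcRequested` stands in for `_unloading_threshold_gc_requested`, and the `collect` method is a hypothetical collector that silently drops a request when a collection is already running, so the completion callback that resets the flag never fires for that request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StuckGcRequestSketch {
    // Stand-in for the _unloading_threshold_gc_requested flag.
    static final AtomicBoolean gcRequested = new AtomicBoolean(false);
    static int collectionsRun = 0;

    // Hypothetical collector: drops the request if a collection is already running.
    static boolean collect(boolean alreadyRunning) {
        if (alreadyRunning) {
            return false; // request ignored, so no completion callback for it
        }
        collectionsRun++;
        onGcMarkingCycleFinish();
        return true;
    }

    // Stand-in for the callback that re-enables GC scheduling.
    static void onGcMarkingCycleFinish() {
        gcRequested.set(false);
    }

    // Only the caller that flips the flag from false to true issues a request.
    static boolean requestGcIfNeeded(boolean alreadyRunning) {
        if (gcRequested.compareAndSet(false, true)) {
            collect(alreadyRunning);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(requestGcIfNeeded(false)); // request issued, flag reset by the callback
        System.out.println(requestGcIfNeeded(true));  // request issued but dropped, flag stays set
        System.out.println(requestGcIfNeeded(false)); // no request: the flag is stuck
        System.out.println("collections run: " + collectionsRun);
    }
}
```

In this model, every later caller sees the flag still set and never requests a collection again, which mirrors the "stops scheduling GC" symptom. The real conditions under which HotSpot fails to reset the flag may be more subtle than a single dropped request.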
>
> In order to reproduce this issue, I found a very simple and convenient way:
>
>
> public class CodeCacheMain {
>     public static void main(String[] args) throws InterruptedException {
>         while (true) {
>             Thread.sleep(100);
>         }
>     }
> }
>
>
> Run this simple app with the following JVM flags:
>
>
> -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15
>
>
> - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC
> - low `ReservedCodeCacheSize` to put pressure on code cache quickly
> - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction
>
> By itself, the program will hardly put pressure on the code cache, but the good news is that it is sufficient to attach a jconsole to it, which will:
> - allow us to monitor code cache
> - indirectly generate activity on the code cache, just what we need to reproduce the bug
>
> Some logs related to code cache will show up at some point with GC activity:
>
>
> [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory
>
>
> And then it will stop and we'll end up with the following message:
>
>
> [672.714s][info][codecache ] Code cache is full - disabling compilation
>
>
> L...

I have a question regarding the existing code/logic.

    // In case the GC is concurrent, we make sure only one thread requests the GC.
    if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
      log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0);
      Universe::heap()->collect(GCCause::_codecache_GC_aggressive);
    }

Why make sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently.

Would removing `_unloading_threshold_gc_requested` resolve this problem?

> I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
For ParallelGC, `ParallelScavengeHeap::collect` contains the following to ensure `System.gc` gccause and similar ones guarantee a full-gc. if (!GCCause::is_explicit_full_gc(cause)) { return; } However, the current logic that a young-gc can cancel a full-gc (`_codecache_GC_aggressive` in this case) also seems surprising. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2847860414 From aboldtch at openjdk.org Mon May 5 07:52:46 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 07:52:46 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: <_Mn9Z5l3XKaL0wmF0p2Zj4xzonQU1RDJt-AKhufIoaM=.2bdbcb5b-bee9-4b3e-822d-9ff177e4ac54@github.com> On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814052474 From shade at openjdk.org Mon May 5 09:49:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 09:49:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Mon, 5 May 2025 09:50:33 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to account for pending allocations when deciding how much to shrink the heap after a full gc. Otherwise, we shrink the heap only to trigger an expansion to satisfy the allocation request that triggered the full gc. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - use align_up_to_region_byte_size > - Merge remote-tracking branch 'upstream/master' into full_collection_resize_amount > - Thomas Review > - nit > - refactor full collection Maybe update that suggested comment (sorry, missed pointing that out earlier), but good. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 484: > 482: // compacting collection, leaving no dead wood. > 483: // - if allocation_word_size is set, then this allocation size will > 484: // be accounted for in case shrinking of the heap happens. Suggestion: // - allocation_word_size is the size allocation that caused this collection. // To be considered when resizing the heap at the end of the full collection. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24944#pullrequestreview-2814512672 PR Review Comment: https://git.openjdk.org/jdk/pull/24944#discussion_r2073273737 From aboldtch at openjdk.org Mon May 5 12:18:45 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 12:18:45 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References:

Message-ID: On Wed, 30 Apr 2025 02:29:34 GMT, Quan Anh Mai wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even released yet so I think it is more efficient to make a better fix than to make a quicker one. 
I think @merykitty's solution with two different relocations based on whether we support APX or not. And only emit the after and nop when `VM_Version::supports_apx_f()` is true.

On the other hand maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code. Something along the lines of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }

As for the solution to have the relocation point at the entry: while they were not designed to be used this way, it looks like it works. (At least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to a relocation.) But changing invariants is scary, and it is probably better to evaluate this as a part of the [JDK-8355341](https://bugs.openjdk.org/browse/JDK-8355341) RFE.
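The two ideas discussed in this thread (patching relative to the end of the instruction, or adjusting a start-relative offset for the longer prefix) can be illustrated with a toy byte-array model. This is only a sketch under assumptions: the byte values are placeholders rather than real x86 encodings, and the layout "prefix bytes, opcode, ModRM, 8-bit immediate" is a simplification of the SHL/R case described above.

```java
import java.util.Arrays;

public class PrefixAgnosticPatchSketch {
    // Patch at a fixed offset from the instruction start (breaks when the prefix grows).
    static byte[] patchFromStart(byte[] insn, int offsetFromStart, byte value) {
        byte[] out = insn.clone();
        out[offsetFromStart] = value;
        return out;
    }

    // Patch the trailing immediate byte, located relative to the instruction end.
    static byte[] patchFromEnd(byte[] insn, byte value) {
        byte[] out = insn.clone();
        out[out.length - 1] = value;
        return out;
    }

    public static void main(String[] args) {
        // Placeholder layouts: [prefix..., opcode, modrm, imm8]. Not real encodings.
        byte[] onePrefixByte = {0x01, 0x10, 0x20, 0x00};
        byte[] twoPrefixBytes = {0x01, 0x02, 0x10, 0x20, 0x00};
        byte imm = 42;

        // Offset 3 is the immediate only for the one-byte-prefix form.
        System.out.println(Arrays.toString(patchFromStart(onePrefixByte, 3, imm)));
        // With a two-byte prefix the same offset overwrites a non-immediate byte.
        System.out.println(Arrays.toString(patchFromStart(twoPrefixBytes, 3, imm)));
        // Patching from the end hits the immediate in both forms.
        System.out.println(Arrays.toString(patchFromEnd(onePrefixByte, imm)));
        System.out.println(Arrays.toString(patchFromEnd(twoPrefixBytes, imm)));
    }
}
```

The diff above corresponds to keeping the start-relative offset and adding one when the longer prefix is present. The end-relative variant avoids that check, at the cost of needing the relocation to identify the instruction end.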
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2850807205 From coleenp at openjdk.org Mon May 5 12:26:26 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 5 May 2025 12:26:26 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Message-ID: Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. Tested with tier5-7 with vmTestbase tests that use this package. ------------- Commit messages: - 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Changes: https://git.openjdk.org/jdk/pull/25034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330022 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25034/head:pull/25034 PR: https://git.openjdk.org/jdk/pull/25034 From eosterlund at openjdk.org Mon May 5 13:14:53 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 5 May 2025 13:14:53 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision:
>
>  - Merge branch 'master' into tkr-zgc-deoptimize-allocation
>  - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation

Good stuff.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24957#pullrequestreview-2814769629

From iwalulya at openjdk.org Mon May 5 14:02:54 2025
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Mon, 5 May 2025 14:02:54 GMT
Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms
Message-ID:

Hi,

Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms.

This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035).

Testing: gha, manual testing as below:

Mainline:

[3.740s][info ][gc,init ] Heap Min Capacity: 150G
[3.740s][info ][gc,init ] Heap Initial Capacity: 150G
[3.740s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B
.
.
[9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms

With patch (No shrinking when -Xms == -Xmx):

[3.753s][info ][gc,init ] Heap Min Capacity: 150G
[3.753s][info ][gc,init ] Heap Initial Capacity: 150G
[3.753s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms

With patch (Shrinking when -Xms != -Xmx):

[3.755s][info ][gc,init ] Heap Min Capacity: 153568M
[3.755s][info ][gc,init ] Heap Initial Capacity: 153568M
[3.755s][info ][gc,init ] Heap Max Capacity: 150G
.
.
[3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions) . . [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms ------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/25036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308854 Stats: 16 lines in 1 file changed: 11 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25036.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25036/head:pull/25036 PR: https://git.openjdk.org/jdk/pull/25036 From duke at openjdk.org Mon May 5 16:03:59 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 16:03:59 GMT Subject: Withdrawn: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:59:44 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: > > ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) > > The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22678 From kvn at openjdk.org Mon May 5 16:14:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 16:14:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: <1TLtkRe2ydHcPB5lnREFbmF4hlQ4rOBHyNXbplFujM0=.427f9764-dda9-41e4-a228-95f47426cf25@github.com> On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Looks fine to me too. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2815333573 From shade at openjdk.org Mon May 5 16:55:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:55:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Thank you! I'll wait a bit if @kimbarrett is able to confirm this matches the idea he had back in JBS comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2851636080 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: RFR: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation [v2] In-Reply-To: References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:58:36 GMT, Tom Rodriguez wrote: >> JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into tkr-zgc-deoptimize-allocation > - 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24957#issuecomment-2851729694 From never at openjdk.org Mon May 5 17:28:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 May 2025 17:28:54 GMT Subject: Integrated: 8343158: [JVMCI] ZGC should deoptimize on old gen allocation In-Reply-To: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> References: <_Ulddj20AKoEmxWDeQckA_Rqp6LKln43acHxFFqZuKY=.30bd040b-7b46-43a5-8312-e9dbeec37ad2@github.com> Message-ID: On Tue, 29 Apr 2025 23:46:51 GMT, Tom Rodriguez wrote: > JVMCI also needs the special handling that ZGC performs for C2 for slow path allocations that are performed in old gen. This pull request has now been integrated. 
Changeset: cc34135f Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/cc34135fff7650ad44c910dca0fd47e9cbd56b68 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8343158: [JVMCI] ZGC should deoptimize on old gen allocation Reviewed-by: aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24957 From tschatzl at openjdk.org Tue May 6 08:17:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:17:20 GMT Subject: RFR: 8356157: Remove retry loop in collect of SerialHeap and ParallelScavengeHeap In-Reply-To: References: Message-ID: <9hvwdTVHqbOVeikfixaFuzpbVpRW4Lxc0rdVbTBb7yE=.4bc00f1b-933d-4920-946e-93e3d06411a4@github.com> On Mon, 5 May 2025 10:36:11 GMT, Albert Mingkun Yang wrote: > Simple removing unnecessary retrying logic because an gc-operation will run-to-completion, guaranteeing the increment of corresponding counters. > > Test: tier1-3 lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25032#pullrequestreview-2817313107 From tschatzl at openjdk.org Tue May 6 08:19:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 6 May 2025 08:19:17 GMT Subject: RFR: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx, G1 still uncommits the archive regions thus shrinking the heap below -Xms. In this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix, we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). 
>
> Testing: gha, manual testing as below:
>
> Mainline:
>
> [3.740s][info ][gc,init ] Heap Min Capacity: 150G
> [3.740s][info ][gc,init ] Heap Initial Capacity: 150G
> [3.740s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B
> .
> .
> [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms
>
> With patch (No shrinking when -Xms == -Xmx):
>
> [3.753s][info ][gc,init ] Heap Min Capacity: 150G
> [3.753s][info ][gc,init ] Heap Initial Capacity: 150G
> [3.753s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms
>
> With patch (Shrinking when -Xms != -Xmx):
>
> [3.755s][info ][gc,init ] Heap Min Capacity: 153568M
> [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M
> [3.755s][info ][gc,init ] Heap Max Capacity: 150G
> .
> .
> [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions)
> .
> .
> [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/25036#pullrequestreview-2817316832

From tschatzl at openjdk.org Tue May 6 08:21:17 2025
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 6 May 2025 08:21:17 GMT
Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote:

> Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for.
> Tested with tier5-7 with vmTestbase tests that use this package.

Marked as reviewed by tschatzl (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2817322299

From sjohanss at openjdk.org Tue May 6 08:58:31 2025
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Tue, 6 May 2025 08:58:31 GMT
Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v2]
In-Reply-To:
References:
Message-ID: <-65uM_iyhoKOhBmmcPjSJHkeab7WGzclKGNkfWGcl_c=.b17052b2-4212-44ad-b302-1eb52293bc49@github.com>

> Please review this change to improve TLAB handling in ZGC.
>
> **Summary**
> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
>
> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:
>
>     bool update_allocation_history = used > 0.5 * capacity;
>
> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected.
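As a rough illustration of the averaging idea described above (a sketch under assumptions, not the code in the PR): `TlabUsedHistory` and `should_update_allocation_history` are hypothetical names, the window of 10 samples is taken from the description, and the real change may weight or seed its samples differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: remembers the last N samples of TLAB usage and reports
// their plain average, which would stand in for tlab_capacity().
template <size_t N>
class TlabUsedHistory {
  std::array<size_t, N> _samples{};
  size_t _count = 0;  // number of valid samples, at most N
  size_t _next = 0;   // slot that the next sample overwrites

public:
  void add(size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % N;
    if (_count < N) {
      _count++;
    }
  }

  size_t average() const {
    if (_count == 0) {
      return 0;
    }
    size_t sum = 0;
    for (size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// The sampling condition quoted in the description.
inline bool should_update_allocation_history(size_t used, size_t capacity) {
  return used > 0.5 * capacity;
}
```

With the heap capacity as `capacity`, `used` rarely exceeds half of it, so the condition is almost never true. With an average of recent usage as `capacity`, a typical cycle has a realistic chance of passing the check.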
>
> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause.
>
> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-CPU and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page allocator instead.
>
> This change also fixes so that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly.
>
> **Testing**
> * Functional testing tier1-tier7
> * Performance testing in A...
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Default TLAB size of 8k, avoid 0 updates and reasonable starting values

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24814/files
  - new: https://git.openjdk.org/jdk/pull/24814/files/0c1f6eed..76c79f5c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=00-01

  Stats: 17 lines in 4 files changed: 13 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24814.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814

PR: https://git.openjdk.org/jdk/pull/24814

From jsikstro at openjdk.org Tue May 6 09:53:21 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Tue, 6 May 2025 09:53:21 GMT
Subject: RFR: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote:

> Hello,
>
> There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp.
>
> To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change.
>
> I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug.

Thank you for the reviews!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25029#issuecomment-2853936758

From jsikstro at openjdk.org Tue May 6 09:53:21 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Tue, 6 May 2025 09:53:21 GMT
Subject: Integrated: 8356083: ZGC: Duplicate ZTestEntry symbols in gtests
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 09:43:50 GMT, Joel Sikström wrote:

> Hello,
>
> There are duplicate definitions of ZTestEntry (one in test_zList.cpp and one in test_zIntrusiveRBTree.cpp). This results in a crash when running the ZList tests on slowdebug, where the ZTestEntry symbol from test_zIntrusiveRBTree.cpp is used over the one in test_zList.cpp.
>
> To remove the collision, I've renamed ZTestEntry in test_zIntrusiveRBTree.cpp to ZRBTestEntry, and ZTestEntryCompare to ZRBTestEntryCompare to reflect this change.
>
> I've verified that the gtests run and pass by running them locally on release, fastdebug and slowdebug.

This pull request has now been integrated.

Changeset: ecfaf354
Author: Joel Sikström
URL: https://git.openjdk.org/jdk/commit/ecfaf354d761bc7034ea8783f4428157ea450207
Stats: 52 lines in 1 file changed: 0 ins; 0 del; 52 mod

8356083: ZGC: Duplicate ZTestEntry symbols in gtests

Reviewed-by: aboldtch, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/25029

From jbhateja at openjdk.org Tue May 6 09:55:21 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 09:55:21 GMT
Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding
In-Reply-To:
References:
Message-ID:

On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote:

> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664
>
> ZGC bookkeeps multiple placeholders in barrier code snippets through relocations; these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > Member Hi @xmas92, Your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341 since its a bigger change and also include graal byein. There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. OR instruction for which we bookkeep the relocation address from the end of the instruction, and it's the last instruction in the pointer coloring primitive. For this bug fix, your suggestion looks fine to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2853945841 From jbhateja at openjdk.org Tue May 6 10:21:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:21:54 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24919/files - new: https://git.openjdk.org/jdk/pull/24919/files/1f9c84c8..fc3b61e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00-01 Stats: 25 lines in 4 files changed: 11 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From rcastanedalo at openjdk.org Tue May 6 16:42:31 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 6 May 2025 16:42:31 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). 
This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). ------------- Commit messages: - Format - Remove extra line - Further clarify zLoadP candidate predicate and no-preceding-lea assertion - Rename machine node property to ins_is_late_expanded_null_check_candidate for clarity, and make it a total function - Update copyright year - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - ... 
and 15 more: https://git.openjdk.org/jdk/compare/e2ae50d8...dc5aa4fc Changes: https://git.openjdk.org/jdk/pull/25066/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 385 lines in 15 files changed: 338 ins; 37 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From matsaave at openjdk.org Tue May 6 18:05:17 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 6 May 2025 18:05:17 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. LGTM! ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25034#pullrequestreview-2819191469 From kvn at openjdk.org Tue May 6 18:10:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:10:20 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. 
> > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Why the attribute is not set for `zLoadP` on x64? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2819201282 From rcastanedalo at openjdk.org Tue May 6 19:00:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 19:00:18 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 18:07:17 GMT, Vladimir Kozlov wrote: > Why the attribute is not set for `zLoadP` on x64? `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2855603683 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: RFR: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. Thanks for reviewing, Thomas and Matias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25034#issuecomment-2855609574 From coleenp at openjdk.org Tue May 6 19:04:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 19:04:19 GMT Subject: Integrated: 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:21:36 GMT, Coleen Phillimore wrote: > Apply patch suggested by David Leopoldseder for checking the ultimate cause for OOM, which is what the test is looking for. > Tested with tier5-7 with vmTestbase tests that use this package. This pull request has now been integrated. 
Changeset: 4977588d Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/4977588d5e3424282f40209590737a487747095d Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod 8330022: Failure test/hotspot/jtreg/vmTestbase/nsk/sysdict/share/BTreeTest.java: Could not initialize class java.util.concurrent.ThreadLocalRandom Co-authored-by: David Leopoldseder Reviewed-by: tschatzl, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/25034 From aboldtch at openjdk.org Wed May 7 06:15:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 7 May 2025 06:15:14 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References:

Message-ID: <1gGtDEUALoWyrLQwwRD9bo2wb55O5Lh2DTnWTXQ8Oe8=.45ef5737-2ea6-4179-a998-79d8d51aca13@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Sure, I'll run it through testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857462391 From sjohanss at openjdk.org Wed May 7 10:41:57 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 7 May 2025 10:41:57 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v3] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism to work as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount used memory for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but ZGC does not have any generation sizes there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this, more or less disables the sizing heuristic since we only sample the usage when this holds: > > bool update_allocation_history = used > 0.5 * capacity; > ``` > > So we need to come up with a better value to return as capacity, we could use the amount of free memory, but this is also an over estimation of what will actually be used. 
The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. > > This change also ensures that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... 
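For readers following the thread, here is a minimal sketch of the idea described above: report the mean of the last 10 TLAB usage samples as the TLAB capacity, so that the quoted `used > 0.5 * capacity` condition can hold again. This is not the PR's code. The class name `TlabCapacityEstimate`, its methods, and the helper `should_update_allocation_history` are hypothetical names made up for illustration, and the sketch ignores concurrency and the mark-start snapshot entirely.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not the PR's TLABUsage class): remembers the last
// kWindow per-cycle TLAB usage samples and reports their mean as "capacity".
class TlabCapacityEstimate {
public:
  static constexpr std::size_t kWindow = 10;

  // Record the bytes used for TLABs during one young collection cycle.
  // Once the window is full, the oldest sample is overwritten.
  void add_sample(std::size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % kWindow;
    if (_count < kWindow) {
      _count++;
    }
  }

  // Mean of the recorded samples; 0 until the first sample arrives.
  std::size_t capacity() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }

private:
  std::array<std::size_t, kWindow> _samples{};
  std::size_t _next = 0;
  std::size_t _count = 0;
};

// The sampling condition quoted in the description, written as a function.
inline bool should_update_allocation_history(std::size_t used, std::size_t capacity) {
  return static_cast<double>(used) > 0.5 * static_cast<double>(capacity);
}
```

With the whole heap reported as capacity, `used` rarely exceeds half of it, so the history is almost never updated. With a capacity derived from recent usage, the condition is met whenever usage is in the same range as the recent average.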
Stefan Johansson has updated the pull request incrementally with three additional commits since the last revision: - Problemlist heap sampling test - Keep all TLAB tracking in TLABUsage - Revert initial value for TLABUsage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/76c79f5c..f361fc5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=01-02 Stats: 97 lines in 9 files changed: 33 ins; 37 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From coleenp at openjdk.org Wed May 7 20:33:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 May 2025 20:33:56 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops This is a cleaner way to do this. I believe it's what we discussed with Kim. He can confirm. 
Some questions and comments and a small nit. src/hotspot/share/compiler/compileBroker.cpp line 1697: > 1695: JavaThread* thread = JavaThread::current(); > 1696: > 1697: methodHandle method(thread, task->method()); I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > 33: #include "oops/weakHandle.inline.hpp" > 34: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { This should initialize method in the ctor initializer list. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > 49: // Method holder class cannot be unloaded. > 50: return nullptr; > 51: } This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. src/hotspot/share/runtime/vmStructs.cpp line 1266: > 1264: declare_toplevel_type(CDSFileMapRegion) \ > 1265: declare_toplevel_type(UpcallStub::FrameData) \ > 1266: declare_toplevel_type(UnloadableMethodHandle) \ So are these left for the async profiler? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2823027214 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078430169 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078443576 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078379288 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078446115 From aboldtch at openjdk.org Thu May 8 05:25:01 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 05:25:01 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Message-ID: [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. 
Now that a shared data structure implementation is available, use that instead. The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. ------------- Commit messages: - 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree Changes: https://git.openjdk.org/jdk/pull/25112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356455 Stats: 2158 lines in 5 files changed: 97 ins; 2026 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From eosterlund at openjdk.org Thu May 8 07:55:50 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 8 May 2025 07:55:50 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824168024 From jsikstro at openjdk.org Thu May 8 09:14:52 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 8 May 2025 09:14:52 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. Marked as reviewed by jsikstro (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2824384128 From sjohanss at openjdk.org Thu May 8 10:06:41 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 8 May 2025 10:06:41 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism to work as expected. 
> > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic since we only sample the usage when this holds: > > bool update_allocation_history = used > 0.5 * capacity; > ``` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead. 
> > This change also ensures that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Handle inc and dec in alloc/undo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/f361fc5d..ba7cb673 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=02-03 Stats: 60 lines in 6 files changed: 37 ins; 14 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From epeter at openjdk.org Thu May 8 11:29:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). @roberto Thanks a lot for taking the time to explain how implicit null checks work, and giving me some background for the PR :) Below, I have mostly code style / naming suggestions, that you are welcome to use as inspiration. But you do not have to apply any of them, it is totally up to you :) I'm definitely not an expert here, but your approach seems reasonable to me. The opt-in annotation `ins_is_late_expanded_null_check_candidate` makes sure we only do the optimization when we are sure it is ok. 
It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. But I suppose that cannot happen, because the GC only moves the pointer, so if the old pointer was non-null, the new pointer must be non-null as well. Maybe that was so trivial that you did not even understand my question there ? But it could be helpful to write that down somewhere, just to make sure people are aware of this. I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130: > 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)), > 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); For context: 132 /* Sometimes we get misaligned loads and stores, usually from Unsafe 133 accesses, and these can exceed the offset range. */ 134 Address legitimize_address(const Address &a, int size, Register scratch) { 135 if (a.getMode() == Address::base_plus_offset) { 136 if (! 
Address::offset_ok_for_immed(a.offset(), exact_log2(size))) { 137 block_comment("legitimize_address {"); 138 lea(scratch, a); 139 block_comment("} legitimize_address"); 140 return Address(scratch); 141 } 142 } 143 return a; 144 } I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you. src/hotspot/share/opto/block.hpp line 468: > 466: > 467: // If necessary, hoist orphan node n into the end of block b. > 468: void maybe_hoist_into(Node* n, Block* b); Hmm. It is "if necessary" or "if possible"? I wonder if we could come up with a name that is a little longer and expresses this condition? src/hotspot/share/opto/lcm.cpp line 79: > 77: } > 78: > 79: void PhaseCFG::move_into(Node* n, Block* b) { Suggestion: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { src/hotspot/share/opto/lcm.cpp line 89: > 87: if (!out->is_MachProj()) { > 88: continue; > 89: } What about the `MachTemp`? Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name. src/hotspot/share/opto/lcm.cpp line 105: > 103: "need for recursive hoisting not expected"); > 104: move_into(n, b); > 105: } Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Suggestion to make it a bit more clear: Suggestion: // We want to ensure that n happens at b or before, i.e. at a block that dominates b. void PhaseCFG::ensure_node_is_at_block_or_before(Node* n, Block* b) { Block* current = get_block_for_node(n); if (current->dominates(b)) { return; // n already happens before b, do nothing. } // We only expect nodes without further inputs, like MachTemp or load Base. 
assert(n->req() == 0 || (n->req() == 1 && n->in(0) == (Node*)C->root()), "need for recursive hoisting not expected"); assert(b->dominates(current), "precondition: can only move n to b if b dominates n"); move_node_and_its_projections_to_block(n, b); } I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... Do you have a better idea? src/hotspot/share/opto/lcm.cpp line 356: > 354: if (mach->in(j)->is_MachTemp()) { > 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); > 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. Suggestion: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. // MachTemp have no inputs themselves and are only there to reserve a scratch // register for the GC barrier of the memory operation. That was what you told me in our offline meeting, I thought it was helpful context information. src/hotspot/share/opto/lcm.cpp line 428: > 426: maybe_hoist_into(val->in(i), block); > 427: } > 428: move_into(val, block); Suggestion: // Inputs of val may already be early enough, but if not move them together with val. ensure_node_is_at_block_or_before(val->in(i), block); } move_node_and_its_projections_to_block(val, block); src/hotspot/share/opto/lcm.cpp line 437: > 435: if (n == nullptr || !n->is_MachTemp()) { > 436: continue; > 437: } Do you want to check that all other nodes already dominate `block`? src/hotspot/share/opto/lcm.cpp line 439: > 437: } > 438: maybe_hoist_into(n, block); > 439: } It seems to me this is definitely new code, ensuring that we move the `MachTemp`. 
We did not do that before, at least not here. Correct? src/hotspot/share/opto/lcm.cpp line 441: > 439: map_node_to_block(n, block); > 440: } > 441: } This now happens in `move_into`, right? src/hotspot/share/opto/machnode.hpp line 391: > 389: > 390: // Whether this node is expanded during code emission into a sequence of > 391: // instructions and the first instruction can perform an implicit null check. You may want to put a warning / reasoning here, in case there are multiple loads. You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think? test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > 49: * @requires vm.gc.Z > 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z > 51: */ Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. Of course, you would have to probably add `applyIf` to the `@IR` rules. test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119: > 117: testLoad(o); > 118: } catch (NullPointerException e) { nullPointerException = true; } > 119: Asserts.assertTrue(nullPointerException); Suggestion: try { testLoad(o); throw new RuntimeException("Should have thrown NullPointerException"); } catch (NullPointerException e) { /* expected */} Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of? 
But totally optional, as your approach works anyway :) test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > 138: // G1 and ZGC stores cannot be currently used to implement implicit null > 139: // checks, because they expand into multiple memory access instructions that > 140: // are not necessarily located at the initial instruction start address. Very random idea, no idea if it is any good: Why not do the implicit null-check with a fake Load? No idea on the implications here. I suppose it would be extra code, but at least not branching code? ------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2824535603 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079357655 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079437197 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079476518 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079430920 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079473986 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079420601 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079480978 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079486097 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079509053 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079488019 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079491319 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079493683 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079500275 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079505342 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC 
reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:21:14 GMT, Emanuel Peter wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. 
>> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > src/hotspot/share/opto/block.hpp line 468: > >> 466: >> 467: // If necessary, hoist orphan node n into the end of block b. >> 468: void maybe_hoist_into(Node* n, Block* b); > > Hmm. It is "if necessary" or "if possible"? > I wonder if we could come up with a name that is a little longer and expresses this condition? Ah no, I'm starting to understand that it is rather a `if necessary`... > src/hotspot/share/opto/lcm.cpp line 428: > >> 426: maybe_hoist_into(val->in(i), block); >> 427: } >> 428: move_into(val, block); > > Suggestion: > > // Inputs of val may already be early enough, but if not move them together with val. > ensure_node_is_at_block_or_before(val->in(i), block); > } > move_node_and_its_projections_to_block(val, block); It's a little hard to see here: did you just refactor this code, or make any changes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079450181 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079507708 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>

Message-ID: On Thu, 8 May 2025 10:29:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/block.hpp line 468: >> >>> 466: >>> 467: // If necessary, hoist orphan node n into the end of block b. >>> 468: void maybe_hoist_into(Node* n, Block* b); >> >> Hmm. It is "if necessary" or "if possible"? >> I wonder if we could come up with a name that is a little longer and expresses this condition? > > Ah no, I'm starting to understand that it is rather a `if necessary`... See further comments at `maybe_hoist_into` and my suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079512983 From thartmann at openjdk.org Thu May 8 12:17:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 12:17:57 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References:

Message-ID: <7XtX737NV9bjyQWKxZK0rjNzQ1ye2IpbsuWTtI8Rh1s=.7e6bb289-50a1-45e2-906a-44348848a281@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2862849381 From shade at openjdk.org Thu May 8 12:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:39:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Wed, 7 May 2025 20:30:00 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/runtime/vmStructs.cpp line 1266: > >> 1264: declare_toplevel_type(CDSFileMapRegion) \ >> 1265: declare_toplevel_type(UpcallStub::FrameData) \ >> 1266: declare_toplevel_type(UnloadableMethodHandle) \ > > So are these left for the async profiler? Yes, see https://github.com/async-profiler/async-profiler/issues/1260 that is filed already. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079634276 From shade at openjdk.org Thu May 8 12:42:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:42:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Wed, 7 May 2025 20:28:10 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > >> 33: #include "oops/weakHandle.inline.hpp" >> 34: >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > > This should initialize method in the ctor initializer list. Maybe, but the field is not `const`, so there seems to be no point? We also assign only after the assert has checked `method` for us. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079637960 From shade at openjdk.org Thu May 8 12:50:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:50:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Wed, 7 May 2025 19:54:04 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > >> 49: // Method holder class cannot be unloaded. >> 50: return nullptr; >> 51: } > > This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. Right? Wasteful to even go through all this dance for compiling JDK methods :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079651140 From aboldtch at openjdk.org Thu May 8 13:01:07 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:01:07 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References: Message-ID: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. 
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Use private inheritance - Separate tree logic to own class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25112/files - new: https://git.openjdk.org/jdk/pull/25112/files/4bc5cf09..3c3e22bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25112&range=00-01 Stats: 253 lines in 2 files changed: 122 ins; 93 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/25112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25112/head:pull/25112 PR: https://git.openjdk.org/jdk/pull/25112 From aboldtch at openjdk.org Thu May 8 13:03:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 8 May 2025 13:03:53 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree In-Reply-To: References: Message-ID: On Thu, 8 May 2025 05:21:20 GMT, Axel Boldt-Christmas wrote: > [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. > > The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. > > Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. > > Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. @stefank had some comments about having too much logic inlined, so I abstracted the extra tree logic into its own inner class. Currently re-running tests.
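The O(1) leftmost and rightmost lookup mentioned in the description can be layered on top of a tree that does not provide it by caching the two extremes and adjusting them on insert and erase. The following is a minimal sketch of that idea only: `TreeWithCachedExtremes` is a hypothetical name, and `std::set<int>` stands in for the shared tree purely to keep the example self-contained. It is not the code in this PR.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical wrapper: caches iterators to the smallest and largest element
// so that both can be read without walking the tree.
class TreeWithCachedExtremes {
  typedef std::set<int> Tree;
  Tree _tree;
  Tree::iterator _leftmost = _tree.end();
  Tree::iterator _rightmost = _tree.end();

public:
  TreeWithCachedExtremes() = default;
  // The cached iterators point into _tree, so copying is disabled.
  TreeWithCachedExtremes(const TreeWithCachedExtremes&) = delete;
  TreeWithCachedExtremes& operator=(const TreeWithCachedExtremes&) = delete;

  void insert(int key) {
    const bool was_empty = _tree.empty();
    auto res = _tree.insert(key);
    if (!res.second) {
      return; // Key already present, extremes unchanged.
    }
    if (was_empty || key < *_leftmost) {
      _leftmost = res.first;
    }
    if (was_empty || key > *_rightmost) {
      _rightmost = res.first;
    }
  }

  void erase(int key) {
    Tree::iterator it = _tree.find(key);
    if (it == _tree.end()) {
      return;
    }
    // Step a cached extreme off the node before it is removed.
    if (it == _leftmost) {
      _leftmost = std::next(it);
    }
    if (it == _rightmost) {
      _rightmost = (it == _tree.begin()) ? _tree.end() : std::prev(it);
    }
    _tree.erase(it);
  }

  bool empty() const { return _tree.empty(); }
  int leftmost() const { assert(!empty()); return *_leftmost; }
  int rightmost() const { assert(!empty()); return *_rightmost; }
};
```

Only the successor or predecessor step in `erase` touches the tree structure; reading either extreme is a plain dereference of the cached iterator.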
------------- PR Comment: https://git.openjdk.org/jdk/pull/25112#issuecomment-2862969347 From shade at openjdk.org Thu May 8 14:33:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 14:33:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> On Wed, 7 May 2025 20:18:29 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/compiler/compileBroker.cpp line 1697: > >> 1695: JavaThread* thread = JavaThread::current(); >> 1696: >> 1697: methodHandle method(thread, task->method()); > > I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. Ah, that reminds me, thanks. I removed this because I caught the method in an unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to crash on it intermittently. On this code path, I think we might plausibly be waiting on an unloaded compile task, and we "only" wait for the notification that the task got purged from the queue. Handleizing a broken `Method*` is awkward, to say the least! Then again, I am not sure if removing this handle is safe enough. So, out of an abundance of caution, we can handleize `Method*` after checking the task status. But now that I do this: `methodHandle method(thread, task->is_unloaded() ? nullptr : task->method());` ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by the `is_unloaded` check before it. Something is off, I'll investigate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079838894 From kvn at openjdk.org Thu May 8 15:14:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:14:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>

Message-ID: <40ZOuLCtxa6ytFKxGHY5mHY_SI_e1AxrXSUrpmNB9Lk=.17f141ca-5b1e-4ead-8416-86f5b7382598@github.com> On Tue, 6 May 2025 18:57:14 GMT, Roberto Castañeda Lozano wrote: > > Why the attribute is not set for `zLoadP` on x64? > > `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question? Somehow I missed this change. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2863416833 From kvn at openjdk.org Thu May 8 15:24:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:24:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). src/hotspot/share/opto/lcm.cpp line 95: > 93: } > 94: > 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079942627 From stefank at openjdk.org Thu May 8 16:06:04 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 May 2025 16:06:04 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References:

Message-ID: On Thu, 8 May 2025 10:06:41 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism to work as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount used memory for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but ZGC does not have any generation sizes there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this, more or less disables the sizing heuristic since we only sample the usage when this holds: >> >> bool update_allocation_history = used > 0.5 * capacity; >> ``` >> >> So we need to come up with a better value to return as capacity, we could use the amount of free memory, but this is also an over estimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before used in the sizing heuristic. So to be able to use consisten values, we need to snapshot the usage in the mark start pause for the young generation and use those value for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed. 
Before this change, TLAB used was tracked per-cpu and the way it was implemented let to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affect the pause time. The new code tracks the Eden usage in the page-allocator instead. >> >> This change also fixes to that the maximum TLAB size returned from ZGC is in words not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Handle inc and dec in alloc/undo I like this change. I've added a few comments below. src/hotspot/share/gc/z/zTLABUsage.cpp line 32: > 30: _used_history() {} > 31: > 32: Suggestion: src/hotspot/share/gc/z/zTLABUsage.cpp line 39: > 37: void ZTLABUsage::decrease_used(size_t size) { > 38: precond(size <= _used); > 39: Atomic::sub(&_used, size, memory_order_relaxed); Suggestion: precond(size <= _used); Atomic::sub(&_used, size, memory_order_relaxed); src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > 41: > 42: void ZTLABUsage::reset() { > 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); Does this work instead? Suggestion: const size_t current_used = Atomic::xchg(&_used, 0u); src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > 49: > 50: // Save the old values for logging > 51: const size_t old_used = used(); It's not immediately obvious how `_used` differs from `used()`. Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`?
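To make the distinction between the live counter and the reported values concrete, here is a minimal sketch of the scheme the PR description outlines: a live counter that is snapshotted and cleared at a reset point, and a capacity reported as the average of the last 10 snapshots. The class name `TlabUsageSketch`, the accessor names, and the use of `std::atomic` in place of HotSpot's `Atomic` are assumptions made for illustration; this is not the actual `ZTLABUsage` code.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the usage tracker discussed above.
class TlabUsageSketch {
  static constexpr std::size_t HistorySize = 10; // "last 10 values" per the description

  std::atomic<std::size_t> _used{0};             // Live counter, updated on alloc/undo
  std::size_t _used_snapshot = 0;                // Value captured at the last reset()
  std::array<std::size_t, HistorySize> _history{};
  std::size_t _samples = 0;                      // Number of valid history entries
  std::size_t _next = 0;                         // Next history slot to overwrite

public:
  void increase_used(std::size_t size) {
    _used.fetch_add(size, std::memory_order_relaxed);
  }

  void decrease_used(std::size_t size) {
    assert(size <= _used.load(std::memory_order_relaxed));
    _used.fetch_sub(size, std::memory_order_relaxed);
  }

  // Assumed to be called at a single well-defined point (for example a pause),
  // so only the exchange itself needs to be atomic.
  void reset() {
    const std::size_t current_used = _used.exchange(0);
    _used_snapshot = current_used;
    _history[_next] = current_used;
    _next = (_next + 1) % HistorySize;
    if (_samples < HistorySize) {
      _samples++;
    }
  }

  // Reported usage: the snapshot, not the live counter.
  std::size_t tlab_used() const { return _used_snapshot; }

  // Reported capacity: average of the recorded snapshots.
  std::size_t tlab_capacity() const {
    if (_samples == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _samples; i++) {
      sum += _history[i];
    }
    return sum / _samples;
  }
};
```

In this sketch `tlab_used()` deliberately does not return `_used`: it returns the value frozen at the last `reset()`, which is the kind of mismatch the naming comment above is concerned with.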
------------- PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2825630207 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009139 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009572 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080010741 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080017958 From sviswanathan at openjdk.org Thu May 8 22:20:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 8 May 2025 22:20:52 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References:

Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2826479403 From Monica.Beckwith at microsoft.com Thu May 8 22:47:25 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Thu, 8 May 2025 22:47:25 +0000 Subject: G1 AHS + Request for Feedback and Testing on G1 Heap Resizing Prototype Message-ID: Hi all, Thanks to everyone for the ongoing AHS discussions across 8236073, 8238686/87, and umbrella JDK-8353716. From the Microsoft side, we have been reviewing logs from a range of prod-like use cases across the broader MSFT environment, including first-party Java services (both Azure-hosted and non-Azure), as well as OSS-based deployments (Cassandra, Kafka, etc). We've also been benchmarking with various combinations (ReservePercent, GCTimeRatio, periodic GC, etc) and exploring early models to help gauge expected shrink/grow behavior under service conditions. These observations have shaped our perspective and contributions to upstream design discussions. Here's where we currently stand: ------------------------------------------------------------------------ 1. SoftMaxHeapSize semantics and placement ------------------------------------------------------------------------ We continue to support the current SoftMax proposal as a **soft upper bound** on heap usage, one that the GC controller respects, but may temporarily exceed if necessary. Our analysis of logs shows that an effective SoftMax, even when static, would help reduce RSS under light traffic without requiring aggressive full GCs. We also plan to evaluate the controller changes under PR #24211 once they're merged, and we'd like to keep the option of a `jcmd GC.set_soft_max` interface, consistent with ZGC and future container signals (e.g.
memory.high). ------------------------------------------------------------------------ 2. GCTimeRatio as a feedback driver ------------------------------------------------------------------------ We support the move to a higher default value for `GCTimeRatio` as it aligns well with throughput goals in our measured workloads, including SPECjbb2015, DBs, and Spring-based services. We plan to continue stepped testing across representative service patterns. We'd also support exposing an alias like `-XX:GCCPUPercent` to improve ergonomics for operators. ------------------------------------------------------------------------ 3. Reserve floor and shrink control ------------------------------------------------------------------------ We strongly recommend retaining `G1ReservePercent` as a configurable minimum, particularly in low-latency scenarios or when allocation bursts are expected immediately after idle phases. We'd also be open to exploring future adaptive variants of the reserve floor as the AHS loop matures. ------------------------------------------------------------------------ 4. Periodic GC fallback and field heuristics ------------------------------------------------------------------------ Until AHS-driven shrink behavior is well understood and widely adopted, we recommend retaining a periodic GC safety net, especially for services with extended idle phases. As AHS matures, we'll continue to evaluate whether this fallback remains necessary in production. ------------------------------------------------------------------------ 5. Role of externally-supplied limits ------------------------------------------------------------------------ Internally, we've discussed how AHS should behave in managed container environments such as AKS. In most cases we expect the JVM to operate within cgroup-defined memory.max and possibly memory.high bounds. We don't currently envision supporting non-cgroup (custom/embedded) environments on day one.
We also believe that memory.high or RSS-based constraints could eventually serve as complementary signals for guiding heap elasticity, especially for AKS customers. These use cases are still exploratory, but we hope they can be accommodated within the direction of AHS without adding undue complexity to the core loop. ------------------------------------------------------------------------ 6. Design notes and alignment ------------------------------------------------------------------------ For reference, our current AHS evaluation and alignment write-up (including control flow diagrams and tuning strategy) is here: https://github.com/microsoft/openjdk-workstreams/tree/main/G1-AHS We'll continue to update that as PRs land and more data becomes available. We welcome any feedback on the write-up or our alignment approach and would be happy to incorporate community input via PRs. We are also open to hosting the write-up within an OpenJDK project repo if that's deemed appropriate. Thanks again to everyone driving this effort forward; happy to continue refining as the pieces come together. Best regards, Monica From jbhateja at openjdk.org Fri May 9 05:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 05:31:57 GMT Subject: Integrated: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1].
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. This pull request has now been integrated. 
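The difference between the two ways of locating the patched byte can be shown with a small sketch. It assumes a shift-by-immediate instruction whose last byte is the immediate to patch, and compares a form with a one-byte prefix against a form with a two-byte prefix. The byte values and helper names are illustrative assumptions chosen for the example, not the HotSpot relocation code or an authoritative encoding table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative encodings only: prefix byte(s), opcode, ModRM, imm8.
inline std::vector<uint8_t> shift_with_short_prefix() {
  return {0x48, 0xC1, 0xE0, 0x00};        // one prefix byte
}

inline std::vector<uint8_t> shift_with_long_prefix() {
  return {0xD5, 0x18, 0xC1, 0xE0, 0x00};  // two prefix bytes
}

// Patch a byte at a fixed distance from the start of the instruction.
// The target moves whenever the prefix length changes.
inline void patch_from_start(std::vector<uint8_t>& insn, std::size_t offset, uint8_t imm) {
  assert(offset < insn.size());
  insn[offset] = imm;
}

// Patch a byte at a fixed distance back from the end of the instruction.
// A trailing immediate stays at the same distance for any prefix length.
inline void patch_from_end(std::vector<uint8_t>& insn, std::size_t back, uint8_t imm) {
  assert(back >= 1 && back <= insn.size());
  insn[insn.size() - back] = imm;
}
```

With a start-relative offset of 3, which is correct for the short form, the long form has its ModRM byte overwritten and its immediate left untouched. With an end-relative distance of 1, both forms are patched in the right place.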
Changeset: 53ad4b2a Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/53ad4b2ad2664e5056c113543dfaa26647d6ce26 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Co-authored-by: Axel Boldt-Christmas Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24919 From sjohanss at openjdk.org Fri May 9 06:07:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 06:07:53 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. Right now the default deployment in containers with resource limits in place (common for Kubernetes deployments) where a single process runs in the container isn't well catered for today for an application that just uses the default configuration. Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've filed a CSR for this as well for which I'm looking for reviewers too and intend to write a release note as well about this change as it has some risk associated with it, although the escape hatch is pretty simple: set `-XX:MaxRAMPercentage=25.0` to go back to the old behavour. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking into this Severin. 
Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. Have you done any testing with containerized apps? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865246427 From sjohanss at openjdk.org Fri May 9 07:52:53 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 07:52:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4] In-Reply-To: References:

Message-ID: <2uwu7EoW1H6F6v0FlZsop7jiQhePYWnXNzePf_4pQBc=.52f2dde4-dadc-4b07-af0b-8fd52f0765f0@github.com> On Thu, 8 May 2025 15:57:19 GMT, Stefan Karlsson wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Handle inc and dec in alloc/undo > > src/hotspot/share/gc/z/zTLABUsage.cpp line 43: > >> 41: >> 42: void ZTLABUsage::reset() { >> 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0); > > Does this work instead? > Suggestion: > > const size_t current_used = Atomic::xchg(&_used, 0u); No, `0ul` works on Linux, but Windows fails with that. > src/hotspot/share/gc/z/zTLABUsage.cpp line 51: > >> 49: >> 50: // Save the old values for logging >> 51: const size_t old_used = used(); > > It's not immediately obvious what `_used` is compared to `used()` Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`. Talked a bit about this offline, will add some comments and rename `used()` and `capacity()` to `tlab_used()` and `tlab_capacity()` to make it a bit more clear that they are not directly connected and also better match the `ZHeap` interface. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081127733 PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2081130690 From sjohanss at openjdk.org Fri May 9 08:17:13 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 May 2025 08:17:13 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References: Message-ID: > Please review this change to improve TLAB handling in ZGC. > > **Summary** > In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. 
This is normally handled by the shared TLAB sizing heuristic, but a few things in ZGC have prevented this mechanism from working as expected. > > The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: > > `bool update_allocation_history = used > 0.5 * capacity;` > > So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. > > Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. > > How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-CPU and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and with many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page allocator instead.
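As a rough illustration of the capacity estimate described above, the following is a minimal sketch, not the ZGC code. The class `UsedHistory` and the function `should_update_allocation_history` are hypothetical names introduced here; only the condition `used > 0.5 * capacity` and the window of 10 samples come from the description. The sketch assumes a plain arithmetic mean over a ring buffer; the actual patch may average differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the ZGC implementation: keeps the last N samples
// of "bytes actually used for TLABs" and reports their mean as the capacity.
template <std::size_t N>
class UsedHistory {
  std::array<std::size_t, N> _samples{};
  std::size_t _count = 0;  // number of valid samples, at most N
  std::size_t _next = 0;   // slot overwritten by the next sample

public:
  void add(std::size_t used) {
    _samples[_next] = used;
    _next = (_next + 1) % N;
    if (_count < N) {
      _count++;
    }
  }

  // Mean of the recorded samples; 0 until the first sample arrives.
  std::size_t average() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// The sampling condition quoted in the description.
inline bool should_update_allocation_history(std::size_t used, std::size_t capacity) {
  return used > 0.5 * capacity;
}
```

With a capacity equal to the whole heap, `should_update_allocation_history` is almost never true for realistic TLAB usage; feeding it `UsedHistory<10>::average()` instead keeps the two values in the same range, which is the point the description makes.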
> > This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. > > **Testing** > * Functional testing tier1-tier7 > * Performance testing in A... Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: StefanK review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24814/files - new: https://git.openjdk.org/jdk/pull/24814/files/ba7cb673..2f5742fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=03-04 Stats: 22 lines in 3 files changed: 4 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814 PR: https://git.openjdk.org/jdk/pull/24814 From stefank at openjdk.org Fri May 9 08:29:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 May 2025 08:29:53 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References:

Message-ID: On Fri, 9 May 2025 08:17:13 GMT, Stefan Johansson wrote: >> Please review this change to improve TLAB handling in ZGC. >> >> **Summary** >> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will render significant waste. This is normally handled by the shared TLAB sizing heuristic, but a few things in ZGC have prevented this mechanism from working as expected. >> >> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds: >> >> `bool update_allocation_history = used > 0.5 * capacity;` >> >> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected. >> >> Another problem in this area is that, since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause. >> >> How we track the TLAB used value is also changed.
Before this change, TLAB used was tracked per-CPU and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and with many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page allocator instead. >> >> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly. >> >> **Testing** >> * Functional testin... > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > StefanK review lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2827676721 From sgehwolf at openjdk.org Fri May 9 10:06:50 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 9 May 2025 10:06:50 GMT Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads In-Reply-To: References: Message-ID: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com> On Wed, 7 May 2025 09:29:16 GMT, Severin Gehwolf wrote: > Please take a look at this proposal to fix the "Java needs so much memory" perception in containers. The idea would be to bump the default `MaxRAMPercentage` to a higher value. The patch proposes 75%, but we could just as well use 50% if people feel more comfortable about it. Right now the default deployment in containers with resource limits in place (common for Kubernetes deployments), where a single process runs in the container, isn't well catered for when an application just uses the default configuration.
Only 25% of the container memory will be used for the Java heap, arguably wasting much of the remaining memory that has been granted to the container by a memory limit (that the JVM would detect and use as physical memory). > > I've also filed a CSR for this, for which I'm looking for reviewers, and I intend to write a release note about this change, as it has some risk associated with it. The escape hatch is pretty simple, though: set `-XX:MaxRAMPercentage=25.0` to go back to the old behaviour. > > Testing: > - [x] GHA - tier 1 (windows failures seem infra related) > - [x] hotspot and jdk container tests on cg v2 and cg v1 including the two new tests. > > Thoughts? Opinions? Thanks for looking at this, Stefan. > Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources? One use-case that was raised was containerized applications on Linux. Currently there is only the generic `is_containerized()` API which has been documented in the bug that fixed that: [JDK-8261242](https://bugs.openjdk.org/browse/JDK-8261242?focusedId=14685743&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14685743) So yes, this would update the RAM percentage for a) an unprivileged container (no limits), b) some other container tech which sets, for example, the cgroup CPU limit. The JVM currently only looks at memory/cpu limits for privileged containers and takes that into consideration for `is_containerized()`. If there is consensus, we could add an API that returns true if only a memory limit is present. That doesn't exist yet, though. Happy to propose something going in that direction. The infra is already there.
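For reference, the effect of the default on a memory-limited container can be sketched with simple arithmetic. This is a simplified model and an assumption, not HotSpot code: it treats the default maximum heap as the detected memory limit scaled by `MaxRAMPercentage` and ignores every other ergonomic the JVM applies. The function name `default_max_heap_bytes` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not HotSpot code): the default maximum heap
// is the detected memory limit scaled by MaxRAMPercentage. Other JVM
// ergonomics that may adjust the result are deliberately ignored here.
inline std::uint64_t default_max_heap_bytes(std::uint64_t memory_limit_bytes,
                                            double max_ram_percentage) {
  return static_cast<std::uint64_t>(memory_limit_bytes * (max_ram_percentage / 100.0));
}
```

Under this model a container limited to 1 GiB gets a 256 MiB default maximum heap at 25% and a 768 MiB one at the proposed 75%, which is the gap the proposal is about.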
> I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. It would return true for any non-privileged container. I can see that this might be a concern. > Have you done any testing with containerized apps? I have done some basic testing so far, but would be happy to do more. What specific testing would you be interested in? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2865954385 From shade at openjdk.org Fri May 9 11:23:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 11:23:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> References:

<7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> Message-ID: <_8y_DYl9Q4P1scTtA_J8ilWw_GP0kdSL37bAmYb4dEM=.ea34a76f-0236-459f-b99c-a8d6129c3a67@github.com> On Thu, 8 May 2025 14:29:56 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1697: >> >>> 1695: JavaThread* thread = JavaThread::current(); >>> 1696: >>> 1697: methodHandle method(thread, task->method()); >> >> I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. > > Ah, that reminds me, thanks. > > I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! > > Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: > > > methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); > > > ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. I understand now. There are TOCTOU-s under concurrent `block_unloading`. The most egregious one is here: `is_unloaded` checks in two steps: `!_weak_handle.is_empty() && _weak_handle.peek() == nullptr;`. So when `block_unloading` comes in concurrently and resets weak to empty (since we have strong handle now), it might be possible that first predicate is still `true`, but evaluation of second predicate calls `peek` on empty `_weak_handle`, oops. 
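The two-step check described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the `UnloadableMethodHandle` or `WeakHandle` code: `ToyWeakHandle`, `Cell`, `release()`, and both `is_unloaded_*` functions are hypothetical names, and the handle is modelled as an atomic pointer to a storage cell that becomes `nullptr` when the handle is emptied.

```cpp
#include <atomic>
#include <cassert>

// A storage cell holding the referent; nullptr once the referent is cleared.
using Cell = std::atomic<void*>;

// Hypothetical stand-in for a weak handle: `_slot` points at a cell, or is
// nullptr when the handle is empty (released).
class ToyWeakHandle {
  std::atomic<Cell*> _slot;

public:
  explicit ToyWeakHandle(Cell* slot) : _slot(slot) {}

  bool is_empty() const {
    return _slot.load(std::memory_order_acquire) == nullptr;
  }

  // Precondition: the handle is not empty. Dereferences the slot.
  void* peek() const {
    return _slot.load(std::memory_order_acquire)->load(std::memory_order_acquire);
  }

  // What a concurrent block_unloading()-like step would do to the weak side.
  void release() { _slot.store(nullptr, std::memory_order_release); }

  // Racy: release() may run between is_empty() and peek(), in which case
  // peek() dereferences a null slot.
  bool is_unloaded_racy() const {
    return !is_empty() && peek() == nullptr;
  }

  // Reads the slot once and makes both decisions on that single snapshot.
  bool is_unloaded_snapshot() const {
    Cell* slot = _slot.load(std::memory_order_acquire);
    return slot != nullptr && slot->load(std::memory_order_acquire) == nullptr;
  }
};
```

Reading the slot once only removes the check-then-use window in this toy model. It assumes the cell itself stays valid after `release()`, which a real fix would also have to guarantee, for example with locking or a state machine around the handle.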
We could technically claim that `UnloadableMethodHandle` is not thread-safe, but it does not solve current compiler uses, and it is very unsatisfactory for the utility class. I'll look into ways to make it resilient under concurrent updates. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2081467353 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v12] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s.
This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - Move to oops - Improve get_method_blocker - Simplify a bit - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... and 12 more: https://git.openjdk.org/jdk/compare/ad07426f...1cdbed2b ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11 Stats: 393 lines in 11 files changed: 331 ins; 25 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References:

Message-ID: On Sun, 11 May 2025 16:40:15 GMT, Albert Mingkun Yang wrote: >> Hi, >> >> Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. With this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. >> >> This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). >> >> Testing: gha, manual testing as below: >> >> Mainline: >> >> >> [3.740s][info ][gc,init ] Heap Min Capacity: 150G >> [3.740s][info ][gc,init ] Heap Initial Capacity: 150G >> [3.740s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B >> . >> . >> [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms >> >> With patch (No shrinking when -Xms == -Xmx): >> >> >> [3.753s][info ][gc,init ] Heap Min Capacity: 150G >> [3.753s][info ][gc,init ] Heap Initial Capacity: 150G >> [3.753s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms >> >> With patch (Shrinking when -Xms != -Xmx): >> >> >> [3.755s][info ][gc,init ] Heap Min Capacity: 153568M >> [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M >> [3.755s][info ][gc,init ] Heap Max Capacity: 150G >> . >> . >> [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B (1 Regions) >> . >> . >> [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews!
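The rule in the quoted description, do not uncommit archive regions when that would take the heap below -Xms, can be sketched as a single predicate. This is an illustrative sketch, not the G1 code: the function name and parameters are hypothetical, and the committed size used in the example below is an assumption chosen to be consistent with the sizes in the logs (150G is 153600M, and 33554432B is 32M).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate, not the G1 implementation: uncommitting is allowed
// only if the heap stays at or above the configured minimum capacity (-Xms).
inline bool can_uncommit_without_going_below_min(std::uint64_t committed_bytes,
                                                 std::uint64_t uncommit_bytes,
                                                 std::uint64_t min_capacity_bytes) {
  return committed_bytes >= uncommit_bytes &&
         committed_bytes - uncommit_bytes >= min_capacity_bytes;
}
```

Assuming 153600M is committed when the 32M of archive regions are freed, the predicate is false for a 150G minimum (the heap would drop to 153568M) and true for a 153568M minimum, matching the two "With patch" runs above.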
------------- PR Comment: https://git.openjdk.org/jdk/pull/25036#issuecomment-2871357566 From iwalulya at openjdk.org Mon May 12 08:18:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 12 May 2025 08:18:59 GMT Subject: Integrated: 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:29:02 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to prevent G1 from shrinking the heap below -Xms when deallocating CDS archive regions. This issue is particularly noticeable when -Xms==-Xmx: G1 still uncommits the archive regions, thus shrinking the heap below -Xms. With this change, G1 does not uncommit the archive regions in cases where doing so would shrink the heap below the configured -Xms. > > This is a temporary fix; we expect a more complete solution to be delivered under [JDK-8326035](https://bugs.openjdk.org/browse/JDK-8326035). > > Testing: gha, manual testing as below: > > Mainline: > > > [3.740s][info ][gc,init ] Heap Min Capacity: 150G > [3.740s][info ][gc,init ] Heap Initial Capacity: 150G > [3.740s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.749s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions). Total size: 33554432B > . > . > [9.000s][info ][gc ] GC(0) Pause Full (System.gc()) 10728M->140M(153568M) 119.887ms > > With patch (No shrinking when -Xms == -Xmx): > > > [3.753s][info ][gc,init ] Heap Min Capacity: 150G > [3.753s][info ][gc,init ] Heap Initial Capacity: 150G > [3.753s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [8.773s][info ][gc ] GC(0) Pause Full (System.gc()) 10687M->140M(153600M) 117.901ms > > With patch (Shrinking when -Xms != -Xmx): > > > [3.755s][info ][gc,init ] Heap Min Capacity: 153568M > [3.755s][info ][gc,init ] Heap Initial Capacity: 153568M > [3.755s][info ][gc,init ] Heap Max Capacity: 150G > . > . > [3.764s][debug][gc,ergo,heap] Attempt heap shrinking (CDS archive regions).
Total size: 33554432B (1 Regions) > . > . > [8.919s][info ][gc ] GC(0) Pause Full (System.gc()) 10692M->140M(153568M) 125.810ms This pull request has now been integrated. Changeset: a3afc9f7 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/a3afc9f7ceba24ab607141426bb0a2693e6d37ca Stats: 16 lines in 1 file changed: 11 ins; 1 del; 4 mod 8308854: G1 archive region allocation may expand/shrink the heap above/below -Xms Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/25036 From shade at openjdk.org Mon May 12 13:11:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 13:11:17 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v13] In-Reply-To: References: Message-ID: <2ydVKTAbomGLgJTwl-1jRBxgF4MRz0h-2CQmr9yHTxg=.0e094037-94b2-4627-92ef-01946fed014b@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
> > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - More thorough locking and redefinition escape hatch - Fix build failures: add more headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/1cdbed2b..ce737c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11-12 Stats: 114 lines in 7 files changed: 58 ins; 20 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon May 12 14:15:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:15:16 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v14] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
> > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix release builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/ce737c5a..f239c221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon May 12 14:33:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:33:40 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v15] In-Reply-To: References: Message-ID: <2LlyHKO14TOr7qVXQbyjy4ZWrGo8fCo3muVoa6VlFzc=.50816f66-e90b-4bb6-b953-64f6a675d664@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`.
This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/f239c221..33e545ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13-14 Stats: 26 lines in 3 files changed: 14 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From rcastanedalo at openjdk.org Mon May 12 14:48:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 May 2025 14:48:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References:
<7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks.
> > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Thanks for looking at this PR, Emanuel! > It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity. > In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax):

0x00007514c47d6aa0: movq 0x10(%rsi), %rax ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe
0x00007514c47d6aa4: shrq $0xd, %rax ; uncolor, destroys the OOP loaded in %rax
0x00007514c47d6aa8: ja 0x36 ; jump to barrier stub (slow path)

(...)

0x00007514c47d6abe: trigger uncommon trap (null_check)

(...)

barrier stub (slow path):
0x00007514c47d6ae4: movq 0x10(%rsi), %rax ; re-load OOP that was destroyed by uncoloring
(...) ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
0x00007514c47d6b09: jmp -0x5d ; go back to main code section

Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload. > I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? I am not sure I understand the question, could you perhaps re-formulate it using some example to make it more concrete? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2872870543 From eosterlund at openjdk.org Mon May 12 21:53:54 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 12 May 2025 21:53:54 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References:

Message-ID: On Thu, 8 May 2025 13:01:07 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. >> >> The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. >> >> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. >> >> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Use private inheritance > - Separate tree logic to own class Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2834716108 From kdnilsen at openjdk.org Mon May 12 22:40:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:40:33 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table [v2] In-Reply-To: References: Message-ID: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> > Remove unneeded log messages related to processing of the remembered set write card table. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Use log_develop_debug message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24809/files - new: https://git.openjdk.org/jdk/pull/24809/files/c1f65632..9cb54f3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24809&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24809&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24809.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24809/head:pull/24809 PR: https://git.openjdk.org/jdk/pull/24809 From kdnilsen at openjdk.org Mon May 12 22:40:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:40:33 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. Replaced original messages with log_develop_debug() messages. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24809#issuecomment-2874353597 From kdnilsen at openjdk.org Mon May 12 22:43:26 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 22:43:26 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Message-ID: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Two changes: 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. 
In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). ------------- Commit messages: - Fix white apce - available() returns Sentinel if under construction - Log full gc region transfers outside heaplock - Make ShenFreeSet::available() race free - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 29 more: https://git.openjdk.org/jdk/compare/92730945...6353f1f7 Changes: https://git.openjdk.org/jdk/pull/25165/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356667 Stats: 95 lines in 10 files changed: 72 ins; 9 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From wkemper at openjdk.org Mon May 12 23:09:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:09:53 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Fri, 9 May 2025 23:45:50 GMT, Kelvin Nilsen wrote: > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. 
The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). Minor nits. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: > 233: > 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. > 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { This name confuses me: `available_in_under_lock`. Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 242: > 240: } > 241: > 242: ShenandoahGenerationalHeap::TransferResult result;; Extra `;` here. ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25165#pullrequestreview-2834801227 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085650563 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085648309 From wkemper at openjdk.org Mon May 12 23:10:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:10:52 GMT Subject: RFR: 8355340: GenShen: Remove unneeded log messages related to remembered set write table [v2] In-Reply-To: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> References: <0rJUri0R4B1p5Vf_3tzRegWxn3T6r7046gKXUJbeYV8=.af67166b-45bc-4888-82ec-c69fbdb5c6af@github.com> Message-ID: <9ZAGrz5s4dOvL3oaBpQ014-qW272MewsPGOK9JordRc=.c21440fc-129e-4bfd-b5b4-3ad55459264d@github.com> On Mon, 12 May 2025 22:40:33 GMT, Kelvin Nilsen wrote: >> Remove unneeded log messages related to processing of the remembered set write card table. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Use log_develop_debug message Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24809#pullrequestreview-2834806479 From kdnilsen at openjdk.org Mon May 12 23:16:58 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:16:58 GMT Subject: Integrated: 8355340: GenShen: Remove unneeded log messages related to remembered set write table In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 00:54:45 GMT, Kelvin Nilsen wrote: > Remove unneeded log messages related to processing of the remembered set write card table. This pull request has now been integrated. 
Changeset: c23469df Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/c23469df162498e30119f43bc3d1effa15574a42 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8355340: GenShen: Remove unneeded log messages related to remembered set write table Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24809 From kdnilsen at openjdk.org Mon May 12 23:17:51 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:17:51 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Mon, 12 May 2025 23:05:43 GMT, William Kemper wrote: >> Two changes: >> >> 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) >> 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: > >> 233: >> 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. >> 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { > > This name confuses me: `available_in_under_lock`. Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. Thanks for review and comments. 
I'll change the name. It follows a pattern that is admittedly very confusing... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085656619 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> > Two changes: > > 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) > 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). 
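The two changes in the quoted description can be pictured with a small standalone sketch. This is not the Shenandoah code: `FreeSetSketch`, `kRebuilding`, `begin_rebuild`, `finish_rebuild`, `allocate` and `should_trigger_gc` are hypothetical names chosen for illustration, and a `std::mutex` stands in for the heap lock. It only shows the idea of computing `capacity - used` while the lock is held and reporting `SIZE_MAX` while a rebuild is in progress.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a free set; not the HotSpot class.
class FreeSetSketch {
public:
  // Sentinel reported while the free set is being rebuilt (assumed name).
  static constexpr std::size_t kRebuilding = SIZE_MAX;

  explicit FreeSetSketch(std::size_t capacity) : _capacity(capacity) {}

  void begin_rebuild() {
    std::lock_guard<std::mutex> guard(_lock);
    _rebuilding = true;
  }

  void finish_rebuild(std::size_t capacity, std::size_t used) {
    std::lock_guard<std::mutex> guard(_lock);
    _capacity = capacity;
    _used = used;
    _rebuilding = false;
  }

  void allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> guard(_lock);
    _used += bytes;
  }

  // Change 1: both values are read under the same lock, so the difference
  // is taken from a consistent pair.
  // Change 2: during a rebuild, return the sentinel instead of a stale value.
  std::size_t available() const {
    std::lock_guard<std::mutex> guard(_lock);
    if (_rebuilding) {
      return kRebuilding;
    }
    return _capacity - _used;
  }

private:
  mutable std::mutex _lock;  // stands in for the heap lock
  std::size_t _capacity;
  std::size_t _used = 0;
  bool _rebuilding = false;
};

// Hypothetical trigger check: SIZE_MAX is never below any threshold,
// so the sentinel cannot by itself start a GC.
inline bool should_trigger_gc(std::size_t available, std::size_t min_free) {
  return available < min_free;
}
```

Under these assumptions, a caller that polls `available()` during a rebuild sees `SIZE_MAX`, `should_trigger_gc` returns false, and the next poll after `finish_rebuild` sees the real value.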
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25165/files - new: https://git.openjdk.org/jdk/pull/25165/files/6353f1f7..ffe1113e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25165&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25165/head:pull/25165 PR: https://git.openjdk.org/jdk/pull/25165 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com>

Message-ID: On Mon, 12 May 2025 23:14:59 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 235: >> >>> 233: >>> 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. >>> 235: inline size_t available_in_under_lock(ShenandoahFreeSetPartitionId which_partition) const { >> >> This name confuses me: `available_in_under_lock`. Should it be called `available_without_lock` or `available_no_lock`? Or, switch it with `available_in` (which asserts that the heap lock is held). I see that it takes the lock, but this is only to make the assertion. > > Thanks for review and comments. I'll change the name. It follows a pattern that is admittedly very confusing... changing to available_in_not_locked() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085660231 From kdnilsen at openjdk.org Mon May 12 23:22:33 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 12 May 2025 23:22:33 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> Message-ID: On Mon, 12 May 2025 23:02:26 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer comments > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 242: > >> 240: } >> 241: >> 242: ShenandoahGenerationalHeap::TransferResult result;; > > Extra `;` here. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085660144 From wkemper at openjdk.org Mon May 12 23:31:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 May 2025 23:31:54 GMT Subject: RFR: 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() [v2] In-Reply-To: <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> References: <13TqKn0YFISVnkWRhFsnR-e-T7_CL2orLWHIiDMc0qM=.ad4f4b45-6342-4453-84fe-acd3900c4040@github.com> <9O1jQ5rn-sWaFz3hO-5tn4CCiDbyh1Q5E1fXDTM_Tco=.5efc1191-0c87-47d1-b219-737249bfe63d@github.com> Message-ID: On Mon, 12 May 2025 23:22:33 GMT, Kelvin Nilsen wrote: >> Two changes: >> >> 1. Compute the difference between capacity and used while holding the heap lock (so that these two values do not change in incompatible ways while the difference is being computed) >> 2. Return a "harmless" Sentinel value in case available() is consulted while we are in the process of rebuilding the freeset. The SIZE_MAX value is considered harmless in that it will not cause immediate trigger of a new GC. In typical usage, the control or regulator thread will check available() again in another millisecond, at which time a more accurate accounting of available will be provided (if the freeset reconstruction has been completed). > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer comments Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 234: > 232: } > 233: > 234: // Acquire heap lock and return available_in, assuming heap lock is not acquired by the caller. Sorry - can we change this comment too? This method does _not_ acquire the lock in release builds. Comment could mention that it acquires the lock only for the correctness of the assertion? 
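The pattern the comment above asks to document, a lock taken only so that a debug assertion sees consistent values, might look roughly like the following. This is a sketch under stated assumptions, not the code under review: `PartitionSketch` and its members are hypothetical, a `std::mutex` stands in for the heap lock, `NDEBUG` stands in for HotSpot's debug/release split, and the cached `_available` value is an assumption made so that the release-build read is well defined.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical partition bookkeeping; not the HotSpot class.
class PartitionSketch {
public:
  explicit PartitionSketch(std::size_t capacity)
      : _capacity(capacity), _used(0), _available(capacity) {}

  // Writers update all three values while holding the lock.
  void allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> guard(_lock);
    _used += bytes;
    _available.store(_capacity - _used, std::memory_order_relaxed);
  }

  // Called without the lock held. Release builds only read the cached value.
  // Debug builds additionally take the lock, but only so that the assertion
  // compares a consistent snapshot of capacity, used and available.
  std::size_t available_in_not_locked() const {
#ifndef NDEBUG
    {
      std::lock_guard<std::mutex> guard(_lock);
      assert(_available.load(std::memory_order_relaxed) == _capacity - _used);
    }
#endif
    return _available.load(std::memory_order_relaxed);
  }

private:
  mutable std::mutex _lock;  // stands in for the heap lock
  std::size_t _capacity;
  std::size_t _used;
  std::atomic<std::size_t> _available;  // cached capacity - used (assumed)
};
```

A comment on such a method would then say that the lock is acquired in debug builds only, for the correctness of the assertion, rather than that the method "acquires the heap lock".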
------------- PR Review: https://git.openjdk.org/jdk/pull/25165#pullrequestreview-2834876929 PR Review Comment: https://git.openjdk.org/jdk/pull/25165#discussion_r2085680099 From stefank at openjdk.org Tue May 13 04:31:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 13 May 2025 04:31:56 GMT Subject: RFR: 8356455: ZGC: Replace ZIntrusiveRBTree with IntrusiveRBTree [v2] In-Reply-To: References:

Message-ID: On Thu, 8 May 2025 13:01:07 GMT, Axel Boldt-Christmas wrote: >> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was implemented before IntrusiveRBTree was introduced, and as such implemented its own intrusive red-black tree. Now that a shared data structure implementation is available, use that instead. >> >> The switch is straight forward, and the O(1) left and right most node lookup which ZIntrusiveRBTree implements that IntrusiveRBTree does not is trivial to implement on top of the tree. >> >> Initial performance evaluation shows no difference between the two implementations. And the functional testing passes. >> >> Tested Oracle Supported platforms, Oracle tier1-8 ZGC testing tasks. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Use private inheritance > - Separate tree logic to own class Thanks for doing this cleanup. I have a few nits below. src/hotspot/share/gc/z/zMappedCache.cpp line 308: > 306: > 307: // Replace in size-class lists > 308: _tree.replace(old_node, new_node, cursor); This code was changed from: // Replace in tree _tree.replace(entry->node_addr(), cursor); // Replace in size-class lists to: // Replace in size-class lists _tree.replace(old_node, new_node, cursor); It seems like something went wrong with the comments. src/hotspot/share/gc/z/zMappedCache.cpp line 672: > 670: // use is_empty_error_reporter_safe and size_error_reporter_safe on the size > 671: // class lists. > 672: const size_t entry_count = _tree.size(); There doesn't seem to be an `Atomic::load` or `volatile` to make sure that we honor the comment about reading only once. src/hotspot/share/gc/z/zMappedCache.hpp line 32: > 30: #include "gc/z/zList.hpp" > 31: #include "utilities/globalDefinitions.hpp" > 32: #include "utilities/rbTree.hpp" Sort order. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25112#pullrequestreview-2835230617 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085893256 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085896187 PR Review Comment: https://git.openjdk.org/jdk/pull/25112#discussion_r2085896390 From aboldtch at openjdk.org Tue May 13 06:02:27 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 13 May 2025 06:02:27 GMT Subject: RFR: 8356716: ZGC: Cleanup Uncommit Logic Message-ID: <9T7g6nawhKvvp8dfTlqmGvUtwIqPY9rqiXv3r246mrQ=.2e56f850-71ab-4d88-b5bb-7dd3c3e2b8a3@github.com> [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) required changing the way ZGC handles memory uncommitting (returning physical memory to the OS). Previously ZGC tracked how recently used memory was on a ZPage level. [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) did away with the ZPage abstraction for unused memory. But because of this ZGC does not have a convenient way of tracking the usage of a specific memory range. Instead [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) opted to keep a watermark in the cache of unused mapped memory, to keep track of the amount of memory that was not used within the last ZUncommitDelay, and use this when deciding how much to uncommit. Because this measurement is not as granular as previously, and because uncommitting memory is something we want to do conservatively, as a response to low memory utilization, [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) was written with the intent to spread out the uncommitting over some time interval. The actual implementation in [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) has a few issues which this RFE tries to address: * Missing wait, the uncommitting is not actually spread out, but happens all at once.
* Reactivity, if the process starts using memory that was below the previous watermark, uncommitting should stop. * Structure, the current implementation has a lot of different dependencies and has state spread out over multiple classes. Refactor to keep the logic contained to the ZUncommitter, and provide better-named facilitating functions on the ZPartition and ZMappedCache. And make the lifecycle of ZUncommitter more explicit. * Events, overhaul the JFR uncommit events to be sent (and track the time for) a chunk of uncommits without any waits. An alternative discussed has been to do uncommitting based on GC triggers rather than periodically. So rather than using ZUncommitDelay, we could have our proactive GCs actually trigger and track uncommitting. This might be a future RFE, but it was not attempted here as it would change user facing APIs. [JDK-8329758](https://bugs.openjdk.org/browse/JDK-8329758) will more than likely overhaul the uncommit triggers as well, and the whole concept of ZUncommitDelay and having to tune how to uncommit will go away. ------------- Commit messages: - Use milliseconds instead of seconds - Improve events and statistics - Handle timeout correctly - Cleanups - Remove test's TIMEOUT_FACTOR dependency - Improve remove from min - Maybe better?
- This is more inline with what uncommit does - Speed up as well, but weirdly none-linear - The intent Changes: https://git.openjdk.org/jdk/pull/25198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356716 Stats: 333 lines in 9 files changed: 264 ins; 32 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/25198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25198/head:pull/25198 PR: https://git.openjdk.org/jdk/pull/25198 From epeter at openjdk.org Tue May 13 06:10:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:10:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Mon, 12 May 2025 14:46:22 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Thanks for looking at this PR, Emanuel! > >> It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. 
> > Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity. > >> In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. > > This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax): > > > 0x00007514c47d6aa0: movq 0x10(%rsi), %rax ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe > 0x00007514c47d6aa4: shrq $0xd, %rax ; uncolor, destroys the OOP loaded in %rax > 0x00007514c47d6aa8: ja 0x36 ; jump to barrier stub (slow path) > > (...) > > 0x00007514c47d6abe: trigger uncommon trap (null_check) > > (...) > > barrier stub (slow path): > 0x00007514c47d6ae4: movq 0x10(%rsi), %rax ; re-load OOP that was destroyed by uncoloring > (...) ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*)) > 0x00007514c47d6b09: jmp -0x5d ; go back to main code section > > > Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload. > >> I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. 
But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer ... @robcasloz Thanks for the explanations! I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875161021 From epeter at openjdk.org Tue May 13 06:19:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:19:52 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). If I understand your statements above correctly: The first load and any subsequent loads are all from the **exact same** address. Hence, if any were null-pointer, the first one has to be a null-pointer. Assuming this is correct, it seems that this follows: Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. 
In your example code above, `0x10(%rsi)` is the address, and presumably `rsi` refers to the base of some object, and `0x10` is the offset to a field. The object that `rsi` points to can thus not be moved by the GC, correct? But the object that the field at offset `0x10` points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you? What guarantees that the object associated with `rsi` is not moved by the GC?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875176654

From sjohanss at openjdk.org Tue May 13 07:41:50 2025
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Tue, 13 May 2025 07:41:50 GMT
Subject: RFR: 8350596: [Linux] Increase default MaxRAMPercentage for containerized workloads
In-Reply-To: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com>
References: <3zkVeUEqr_avGG1v8Q0Dp_0_FiZrXLxHJeU4KQ556sg=.77fbebb1-8bcf-40bb-95c0-664120321cbf@github.com>
Message-ID:

On Fri, 9 May 2025 10:04:41 GMT, Severin Gehwolf wrote:

> Thanks for looking at this, Stefan.
>
> > Thinking back to the discussions we had around this at OCW, I remember there were some concerns regarding different types of deployments. I think this really makes sense in the cases where we divide a machine's memory using containers, but what if containers are just used to divide other resources. One use-case that was raised was containerized applications on Linux.
>
> Currently there is only the generic `is_containerized()` API which has been documented in the bug that fixed that: [JDK-8261242](https://bugs.openjdk.org/browse/JDK-8261242?focusedId=14685743&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14685743)
>
> So yes, this would update the RAM percentage for a) unprivileged container (no limits), b) some other container tech which sets the cgroup CPU limit for example.
The JVM currently only looks at memory/cpu limits for privileged containers and takes that into consideration for `is_containerized()`. If there is consensus, we could add an API that returns true if only a memory limit is present. That doesn't exist yet, though. Happy to propose something going into that direction. The infra is already there. > This could be a good direction, we at least need some way to avoid desktop Java apps using too much memory out of the box. There was some talk about using 75% when containerized, but also looking at the machine total, so that if 75% of the container is more than 25% of the machine we fall back to 25% of the machine. For example, for an 8g container on a 16g machine, we would constrain the heap to 4g (25% of machine) rather than 6g (75% of the container). This would of course not be optimal in all situations either, but it would be a fall back to the old defaults for limit-less containers and still in some cases provide a higher default heap for memory configured container deployments. > > I'm not sure if such an application would report true for `is_containerized()`, but it would be nice to have some data around this. > > It would return true for any non-privileged container. I can see that this might be a concern. > Thanks for verifying this. > > Have you done any testing with containerized apps? > > I have done some basic testing so far, but would be happy to do more. What specific testing would you be interested in? I was mostly thinking about limitless containers (desktop apps) to see if we run into the problems of using way too much memory, but given your answer above I guess we would. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25086#issuecomment-2875389061 From sjohanss at openjdk.org Tue May 13 07:46:59 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 13 May 2025 07:46:59 GMT Subject: RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v5] In-Reply-To: References:

Message-ID:

On Fri, 9 May 2025 08:27:20 GMT, Stefan Karlsson wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   StefanK review
>
> Marked as reviewed by stefank (Reviewer).

Thanks for the reviews @stefank and @xmas92

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24814#issuecomment-2875402257

From sjohanss at openjdk.org Tue May 13 07:47:00 2025
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Tue, 13 May 2025 07:47:00 GMT
Subject: Integrated: 8353184: ZGC: Simplify and correct tlab_used() tracking
In-Reply-To:
References:
Message-ID:

On Wed, 23 Apr 2025 07:58:35 GMT, Stefan Johansson wrote:

> Please review this change to improve TLAB handling in ZGC.
>
> **Summary**
> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will result in significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
>
> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:
>
>     bool update_allocation_history = used > 0.5 * capacity;
>
> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimation of what will actually be used.
The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected.
>
> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause.
>
> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead.
>
> This change also fixes it so that the maximum TLAB size returned from ZGC is in words, not bytes, which will mostly help logging, since the actual sizing is still enforced correctly.
>
> **Testing**
> * Functional testing tier1-tier7
> * Performance testing in A...

This pull request has now been integrated.
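
For readers following the summary quoted above, the capacity estimate it describes can be sketched roughly as follows. This is a minimal illustration and not the code from the changeset: the class name `TlabCapacityEstimator`, its methods, and the ring-buffer layout are hypothetical. Only the window of 10 samples and the `used > 0.5 * capacity` condition are taken from the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the HotSpot implementation: estimates the TLAB
// capacity as the mean of the last kWindow observed TLAB usage values.
class TlabCapacityEstimator {
public:
  static constexpr std::size_t kWindow = 10;

  // Record one TLAB usage sample (in bytes), e.g. a value snapshotted
  // at a young-generation mark start pause.
  void record_used(std::size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % kWindow;   // overwrite the oldest sample when full
    if (_count < kWindow) {
      _count++;
    }
    assert(_count <= kWindow);
  }

  // Mean of the recorded samples; 0 until the first sample arrives.
  std::size_t capacity() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }

private:
  std::array<std::size_t, kWindow> _samples{};
  std::size_t _next = 0;
  std::size_t _count = 0;
};

// The sampling condition quoted in the summary, written as a free function.
inline bool should_update_allocation_history(std::size_t used, std::size_t capacity) {
  return static_cast<double>(used) > 0.5 * static_cast<double>(capacity);
}
```

With a capacity derived from recent usage, a cycle that uses 600 bytes against an estimate of 1000 passes the check. Comparing the same usage against the whole heap capacity would rarely pass, which is the problem the summary describes.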
Changeset: 526f543a
Author:    Stefan Johansson
URL:       https://git.openjdk.org/jdk/commit/526f543adfeb90341b3b5b18916c1bb7ef725599
Stats:     227 lines in 12 files changed: 180 ins; 41 del; 6 mod

8353184: ZGC: Simplify and correct tlab_used() tracking

Reviewed-by: stefank, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/24814

From rcastanedalo at openjdk.org Tue May 13 08:40:52 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 13 May 2025 08:40:52 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads
In-Reply-To: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com>
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com>
Message-ID:

On Tue, 13 May 2025 06:16:53 GMT, Emanuel Peter wrote:

> If I understand your statements above correctly:
> The first load and any subsequent loads are all from the exact same address. Hence, if any were null-pointer, the first one has to be a null-pointer.

Right.

> Assuming this is correct, it seems that this follows:
> Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. In your example code above, 0x10(%rsi) is the address, and presumably rsi refers to the base of some object, and 0x10 is the offset to a field. The object that rsi points to can thus not be moved by the GC, correct? But the object that the field at offset 0x10 points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you? What guarantees that the object associated with rsi is not moved by the GC?

The inner workings of ZGC guarantee that "root" addresses such as `%rsi` remain valid ("have a good color" in ZGC speak), but I am afraid I cannot offer a more detailed explanation.
You may find more information in e.g. [1] (even though it is outdated by now as it describes non-generational ZGC), or perhaps some GC engineer may chime into the discussion and offer more detail?

In any case, to convince ourselves of the correctness of this RFE without needing to dive deep into ZGC internals, maybe it is enough to ensure that we preserve the same behavior as in mainline (where `zLoadP` cannot be used for implicit null checks). Here is how the compiled code looks for the above example before and after this change:

    # Before the RFE (explicit null check):
      testq %rsi, %rsi        ; explicit null check on the base address
      je #uncommon_trap block
      movq 0x10(%rsi), %rax   ; main OOP load
      shrq $0xd, %rax         ; uncolor, destroys the OOP loaded in %rax
      ja #slow_barrier_path
    continue:
      (...)
    slow_barrier_path:
      movq 0x10(%rsi), %rax   ; re-load OOP that was destroyed by uncoloring
      (...)                   ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
      jmp #continue

    # After the RFE (implicit null check):
      movq 0x10(%rsi), %rax   ; main OOP load with implicit exception: dispatches to #uncommon_trap block
      shrq $0xd, %rax         ; uncolor, destroys the OOP loaded in %rax
      ja #slow_barrier_path
    continue:
      (...)
    slow_barrier_path:
      movq 0x10(%rsi), %rax   ; re-load OOP that was destroyed by uncoloring
      (...)                   ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
      jmp #continue

As you can see, both cases rely on the same assumptions about the validity of `%rsi` through the execution of the compiled code.

[1] Albert Mingkun Yang and Tobias Wrigstad. Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK. In ACM TOPLAS, 2022.
https://doi.org/10.1145/3538532

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875559501

From rcastanedalo at openjdk.org Tue May 13 08:55:55 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 13 May 2025 08:55:55 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads
In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>

Message-ID:

On Tue, 13 May 2025 06:08:38 GMT, Emanuel Peter wrote:

> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at?

Regarding code, I recommend starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235).

Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875604527

From epeter at openjdk.org Tue May 13 09:59:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 13 May 2025 09:59:57 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads
In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>

Message-ID: On Tue, 13 May 2025 08:53:08 GMT, Roberto Castañeda Lozano